Key Concepts
Understanding these core concepts will help you get the most out of ZenSearch.
Data Sources & Connectors
Connectors
A connector is a configured connection to an external data source. ZenSearch supports 17+ connector types:
- Cloud Storage: S3, Google Drive, SharePoint, Azure Blob
- Collaboration Tools: Confluence, Notion, Slack
- Development Tools: GitHub, Jira
- CRM Systems: Salesforce, HubSpot, SAP
- Databases: PostgreSQL, MySQL, ClickHouse, MS SQL
- Web: Web Crawler
Each connector:
- Authenticates with your data source
- Syncs content on a schedule or via webhooks
- Maintains permissions from the source platform
Collections
A collection is a logical grouping of documents from one or more connectors. Collections help you:
- Organize content by topic, department, or project
- Control which content is searched
- Apply different embedding models
- Manage access permissions
Example setup:
Engineering Collection
├── GitHub (code repositories)
├── Confluence (technical docs)
└── Jira (tickets and issues)
Sales Collection
├── Salesforce (CRM data)
├── Google Drive (presentations)
└── HubSpot (marketing content)
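The example setup above can be modeled as plain data. The following is a minimal sketch, not ZenSearch's actual data model: the `Connector` and `Collection` classes and the `collections_using` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connector:
    """Hypothetical stand-in for a configured connector."""
    type: str          # e.g. "github", "confluence"
    description: str   # what the connector contributes


@dataclass
class Collection:
    """Hypothetical logical grouping of connectors."""
    name: str
    connectors: list[Connector] = field(default_factory=list)


engineering = Collection("Engineering", [
    Connector("github", "code repositories"),
    Connector("confluence", "technical docs"),
    Connector("jira", "tickets and issues"),
])
sales = Collection("Sales", [
    Connector("salesforce", "CRM data"),
    Connector("google_drive", "presentations"),
    Connector("hubspot", "marketing content"),
])


def collections_using(connector_type: str, collections: list[Collection]) -> list[str]:
    """Return the names of collections that include a given connector type."""
    return [
        c.name for c in collections
        if any(conn.type == connector_type for conn in c.connectors)
    ]


jira_collections = collections_using("jira", [engineering, sales])
```

Because a collection only references connectors, the same connector type can appear in more than one collection without duplicating its configuration.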
Documents & Semantic Units
Documents
A document represents a single piece of content from a data source, such as a file, page, message, or record. Documents are:
- Parsed to extract text and metadata
- Classified by type and content
- Indexed for retrieval
Semantic Units (SUs)
ZenSearch breaks documents into Semantic Units: meaningful chunks of content optimized for AI retrieval. This process:
- Segments content into logical sections
- Preserves context and relationships
- Generates embeddings for semantic search
- Maintains links to source documents
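The segmentation idea can be illustrated with a deliberately simple sketch. It assumes that blank-line-separated paragraphs are the logical sections, which is only an approximation. The `SemanticUnit` class and `split_into_units` function are hypothetical names, and embedding generation is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticUnit:
    document_id: str   # link back to the source document
    position: int      # order of the unit within the document
    text: str


def split_into_units(document_id: str, text: str) -> list[SemanticUnit]:
    """Split text into units at blank lines (an assumed segmentation rule)."""
    sections = [part.strip() for part in text.split("\n\n") if part.strip()]
    return [
        SemanticUnit(document_id, position, section)
        for position, section in enumerate(sections)
    ]


units = split_into_units("doc-1", "Intro\n\nDetails here.\n\n\n\nEnd")
```

Each unit keeps its `document_id` and `position`, so a retrieved chunk can always be traced back to its place in the source document.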
Search & Retrieval
Hybrid Search
ZenSearch uses hybrid search, which combines:
- Dense embeddings: Semantic understanding of meaning
- Sparse embeddings: Keyword matching for precision
- Fusion algorithms: Merging the dense and sparse result lists into a single ranking
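To make the fusion step concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one widely used way to merge ranked lists. This page does not say which fusion algorithm ZenSearch uses, so RRF is an assumption chosen for illustration, as are the function name and the constant `k = 60`.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]    # hypothetical semantic ranking
sparse_hits = ["b", "d", "a"]   # hypothetical keyword ranking
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

In this example, `b` and `a` appear in both lists and therefore outrank `d` and `c`, which each appear in only one.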
Search Modes
| Mode | Description | Best For |
|---|---|---|
| Chat | Conversational AI with streaming responses | Questions, research, exploration |
| Search | Traditional search results with faceted filtering | Finding specific documents |
Faceted Search
Filter results by:
- Topics/Categories: Auto-extracted document topics
- Departments: Organizational categories
- Languages: Document language
- Date Ranges: When content was created/modified
- Sentiment: Positive, neutral, or negative content
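Conceptually, each facet narrows the result set independently. The sketch below shows this with an in-memory filter. The `SearchResult` fields and the `apply_facets` function are hypothetical and do not represent the ZenSearch API, and the topic facet is left out for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SearchResult:
    title: str
    department: str
    language: str
    sentiment: str   # "positive", "neutral", or "negative"
    modified: date


def apply_facets(
    results: list[SearchResult],
    department: Optional[str] = None,
    language: Optional[str] = None,
    sentiment: Optional[str] = None,
    modified_from: Optional[date] = None,
    modified_to: Optional[date] = None,
) -> list[SearchResult]:
    """Keep results that match every facet that was supplied."""
    def matches(r: SearchResult) -> bool:
        return (
            (department is None or r.department == department)
            and (language is None or r.language == language)
            and (sentiment is None or r.sentiment == sentiment)
            and (modified_from is None or r.modified >= modified_from)
            and (modified_to is None or r.modified <= modified_to)
        )
    return [r for r in results if matches(r)]


results = [
    SearchResult("Q3 roadmap", "Engineering", "en", "neutral", date(2024, 7, 1)),
    SearchResult("Release retro", "Engineering", "en", "positive", date(2024, 9, 15)),
    SearchResult("Pipeline review", "Sales", "de", "neutral", date(2024, 8, 20)),
]
recent_engineering = apply_facets(
    results, department="Engineering", modified_from=date(2024, 8, 1)
)
```

Facets that are left unset impose no restriction, so adding a facet can only shrink the result set.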
AI Agents
What are Agents?
Agents are AI-powered assistants that can:
- Execute multi-step research tasks
- Use tools to search, query, and analyze
- Maintain conversation context
- Provide comprehensive answers
Agent Tools
Built-in tools available to agents:
| Tool | Description |
|---|---|
| search_documents | Search across collections |
| get_document | Retrieve full document content |
| summarize_document | Generate document summaries |
| search_database_schema | Discover database structure |
| query_database | Execute read-only SQL queries |
| get_table_info | Get table columns and types |
| search_knowledge_graph | Find entity relationships |
| calculate | Perform calculations |
Agent Modes
- Auto: Uses the agent automatically for complex queries
- Research: Always uses the agent, with planning
- Off: Direct chat without agent capabilities
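The three modes amount to a small routing decision, sketched below. The `AgentMode` enum and `uses_agent` function are hypothetical names. How ZenSearch decides that a query is complex is not described here, so the sketch takes that judgment as an input.

```python
from enum import Enum


class AgentMode(Enum):
    AUTO = "auto"
    RESEARCH = "research"
    OFF = "off"


def uses_agent(mode: AgentMode, query_is_complex: bool) -> bool:
    """Decide whether a query is routed to the agent.

    `query_is_complex` is supplied by the caller; the complexity
    heuristic itself is outside the scope of this sketch.
    """
    if mode is AgentMode.RESEARCH:
        return True            # always use the agent
    if mode is AgentMode.OFF:
        return False           # direct chat only
    return query_is_complex    # AUTO: agent only for complex queries
```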
Permissions & Access Control
Team Roles
| Role | Capabilities |
|---|---|
| Owner | Full control, delete team, transfer ownership |
| Admin | Manage members, connectors, collections |
| Editor | Create/edit connectors, run sync jobs |
| Viewer | Read-only, search and chat |
Document-Level Permissions
ZenSearch syncs permissions from source platforms:
- User permissions: Individual access rights
- Group permissions: Team or group access
- Domain permissions: Organization-wide access
- Public access: Anyone can view
Permissions are enforced at search time, so users see only the content they are authorized to access.
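Search-time enforcement can be pictured as a filter applied to candidate results before they are returned. The sketch below checks the four permission kinds listed above. The `User` and `DocumentAcl` classes and both functions are hypothetical, and deriving the domain from the email address is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    email: str
    groups: frozenset[str] = frozenset()


@dataclass(frozen=True)
class DocumentAcl:
    users: frozenset[str] = frozenset()     # individual access rights
    groups: frozenset[str] = frozenset()    # team or group access
    domains: frozenset[str] = frozenset()   # organization-wide access
    public: bool = False                    # anyone can view


def can_view(user: User, acl: DocumentAcl) -> bool:
    """Return True if any permission kind grants the user access."""
    if acl.public or user.email in acl.users:
        return True
    if user.groups & acl.groups:
        return True
    # Assumption: the user's domain is the part after "@" in the email.
    domain = user.email.rsplit("@", 1)[-1]
    return domain in acl.domains


def visible_titles(user: User, index: list[tuple[str, DocumentAcl]]) -> list[str]:
    """Filter (title, ACL) pairs down to the titles the user may see."""
    return [title for title, acl in index if can_view(user, acl)]


alice = User("alice@example.com", frozenset({"engineering"}))
index = [
    ("Public handbook", DocumentAcl(public=True)),
    ("Eng runbook", DocumentAcl(groups=frozenset({"engineering"}))),
    ("Sales forecast", DocumentAcl(groups=frozenset({"sales"}))),
    ("Company memo", DocumentAcl(domains=frozenset({"example.com"}))),
    ("Bob's notes", DocumentAcl(users=frozenset({"bob@example.com"}))),
]
visible = visible_titles(alice, index)
```

A document with an empty ACL is visible to no one in this sketch, which keeps the default on the restrictive side.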
Processing Pipeline
When you connect a data source, content flows through:
Collection → Parsing → Structure Analysis → Projection → Vectorization → Classification
- Collection: Fetches content from source
- Parsing: Extracts text and metadata
- Structure Analysis: Identifies document structure
- Projection: Creates semantic units
- Vectorization: Generates embeddings
- Classification: Categorizes content
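The stage order above can be expressed as a simple sequential runner. This is a toy sketch only: the `run_pipeline` function, the dictionary-based item, and the handlers are hypothetical and stand in for far more involved processing.

```python
from typing import Callable, Dict

# Stage names in the order given above.
STAGES = (
    "Collection",
    "Parsing",
    "Structure Analysis",
    "Projection",
    "Vectorization",
    "Classification",
)

Handler = Callable[[dict], dict]


def run_pipeline(item: dict, handlers: Dict[str, Handler]) -> dict:
    """Pass an item through every stage in order, recording progress."""
    for stage in STAGES:
        item = handlers[stage](item)
        item = {**item, "completed": [*item.get("completed", []), stage]}
    return item


# Toy handlers: most stages pass the item through unchanged.
handlers: Dict[str, Handler] = {stage: (lambda item: item) for stage in STAGES}
handlers["Parsing"] = lambda item: {**item, "text": item["raw"].strip()}
handlers["Projection"] = lambda item: {**item, "units": item["text"].split("\n\n")}

result = run_pipeline({"raw": "  First section\n\nSecond section  "}, handlers)
```

Because each stage consumes the previous stage's output, Projection can rely on the `text` field that Parsing produced.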
Guardrails & Safety
ZenSearch includes built-in safety features:
Input Guardrails
- Content moderation
- Prompt injection detection
- PII detection
- Length validation
Output Guardrails
- Hallucination detection
- Toxicity filtering
- Relevance checking
Next Steps
Now that you understand the key concepts:
- Your First Search: Practice searching effectively
- Core Features: Explore the full feature set
- Connect More Sources: Add additional data sources