Self-Hosting Guide
Deploy ZenSearch on your own infrastructure with full control over your data, AI models, and network configuration. The free Developer Edition covers evaluation and small teams; Enterprise On-Premise deployment covers production workloads.
The Developer Edition is a free, self-contained Docker Compose deployment — no license key required. Jump to quickstart.
For production deployments with Kubernetes, SSO, and dedicated support, contact [email protected].
Overview
ZenSearch's on-premise deployment gives you:
- Full data sovereignty — your data never leaves your network
- Air-gapped support — deploy without internet connectivity
- Bring your own LLM — use OpenAI, Anthropic, Cohere, Ollama, or any OpenAI-compatible endpoint
- Infrastructure control — deploy on AWS, GCP, Azure, or bare metal
- Custom networking — configure VPCs, firewalls, and service mesh as needed
Developer Edition Quickstart
The Developer Edition is a free, self-contained Docker Compose deployment for evaluation and development.
Prerequisites
- Docker Desktop (or Docker Engine + Docker Compose v2)
- 8 GB RAM available for Docker
- An OpenAI API key
Install
curl -fsSL https://releases.zensearch.ai/install.sh | bash
The installer downloads the latest release, prompts for your OpenAI API key, and starts all services.
After a few minutes: Web UI at http://localhost:5173, API health at http://localhost:8080/health.
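If you would rather script the wait than watch the terminal, a small polling helper works. This is a sketch; the endpoint and port are the quickstart defaults above:

```shell
#!/usr/bin/env bash
# Poll a health endpoint until it responds, or give up after a timeout (seconds).
wait_healthy() {
  local url="$1" timeout="${2:-120}" elapsed=0
  until curl -fsS "$url" >/dev/null 2>&1; do
    sleep 2
    elapsed=$((elapsed + 2))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "gave up after ${timeout}s" >&2
      return 1
    fi
  done
  echo "healthy: $url"
}

# Usage: wait_healthy http://localhost:8080/health
```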
Manual Install
If you prefer not to pipe to bash:
curl -fsSL https://releases.zensearch.ai/developer-edition/latest \
-o zensearch-dev.tar.gz
tar xzf zensearch-dev.tar.gz
cd zensearch-dev-edition-*
cp .env.lite.example .env
# Edit .env and set OPENAI_API_KEY
./start.sh
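A minimal .env needs only the variables documented on this page; everything else keeps its default (the key value below is a placeholder):

```shell
# .env - minimal Developer Edition configuration
OPENAI_API_KEY=sk-your-key-here   # required; the only mandatory setting
AUTH_MODE=none                    # default; see Enabling Authentication
PARSER_MEMORY_LIMIT=4G            # default; raise for large PDFs on CPU
```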
Management
./start.sh --down # Stop all services
./start.sh --update # Pull latest images and restart
./diagnose.sh # Health check and diagnostics
What's Included
| Component | Description |
|---|---|
| Core API | REST API, search, chat, agents |
| Model Gateway | AI model proxy |
| Web UI | React frontend |
| Parser | Document parsing |
| Structure Analyzer | Document structure extraction |
| Projector | Projection generation |
| Vectorizer | Embedding generation |
| PostgreSQL | Database |
| Redis | Cache |
| Qdrant | Vector search |
| MinIO | Object storage |
| NATS | Message broker |
| S3 Collector | Amazon S3 / S3-compatible storage connector |
| Web Crawler | Website crawling with headless Chrome |
GPU Acceleration
If you have an NVIDIA GPU with the NVIDIA Container Toolkit installed, the installer automatically detects it and uses GPU-accelerated document parsing. No manual configuration needed.
Configuration
Resource limits and other settings can be tuned in your .env file:
| Variable | Default | Description |
|---|---|---|
| PARSER_MEMORY_LIMIT | 4G | Memory limit for the document parser. Increase for large PDFs on CPU |
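For example, to give the parser more headroom when processing large PDFs on CPU:

```shell
# .env
PARSER_MEMORY_LIMIT=8G
```

Restart with ./start.sh --down && ./start.sh for the change to take effect.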
Not Included (Enterprise Features)
- Reranker, sparse embedder
- Additional data source connectors (Confluence, Slack, GitHub, Jira, Notion, Google Drive, SharePoint, Azure Blob, Salesforce, SAP, HubSpot)
- Monitoring stack (Prometheus, Grafana)
- SAML authentication (OIDC is supported — see Enabling Authentication)
- Kubernetes / Helm chart deployment
- Dedicated support
Enabling Authentication
By default, the Developer Edition runs with AUTH_MODE=none — no login required, all users share a single dev identity. This is convenient for evaluation but not suitable for shared or network-accessible deployments.
To enable authentication, configure any OIDC-compatible identity provider (Keycloak, Auth0, Okta, Google, Azure AD, etc.):
- Edit .env in your ZenSearch directory:
# Change auth mode
AUTH_MODE=oidc
# OIDC provider settings (example: Keycloak)
OIDC_ISSUER_URL=https://your-keycloak.example.com/realms/zensearch
OIDC_CLIENT_ID=zensearch-api
OIDC_CLIENT_SECRET=your-client-secret
- Restart:
./start.sh --down && ./start.sh
The Web UI will now show a login screen. Users are automatically provisioned on first login with their OIDC identity (email, name, roles).
Provider examples:
| Provider | Issuer URL |
|---|---|
| Keycloak | https://keycloak.example.com/realms/your-realm |
| Auth0 | https://your-tenant.auth0.com/ |
| Okta | https://your-org.okta.com/oauth2/default |
| Google | https://accounts.google.com |
| Azure AD | https://login.microsoftonline.com/{tenant-id}/v2.0 |
When configuring your OIDC provider, set the redirect URI to http://localhost:5173/auth/callback (or your custom domain). The client must support the Authorization Code flow with PKCE.
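Before restarting, it can save a debugging round-trip to confirm the issuer URL is correct: every OIDC-compliant provider serves a discovery document at a well-known path relative to the issuer. A small helper (a sketch; note that some issuers, such as Auth0's, end in a trailing slash that must not be doubled):

```shell
#!/usr/bin/env bash
# Build the OIDC discovery URL from an issuer, tolerating a trailing slash.
discovery_url() {
  local issuer="${1%/}"   # strip one trailing slash if present
  printf '%s/.well-known/openid-configuration' "$issuer"
}

# Then check that the provider answers, e.g.:
#   curl -fsS "$(discovery_url "$OIDC_ISSUER_URL")"
```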
Architecture
ZenSearch deploys as a set of containerized services organized into three layers:
- Application Layer — The core platform services that handle search, chat, agents, document processing, AI model routing, and the web interface
- Infrastructure Layer — Databases, caching, object storage, and messaging used by the application services
- Monitoring Layer (optional) — Metrics, dashboards, log aggregation, and alerting
Data Source Connectors
Connectors are deployed selectively — only enable the ones for data sources your organization uses. Each connector runs independently and can be added or removed without affecting the rest of the platform.
ZenSearch supports 13 connector types: S3, GitHub, Confluence, Jira, Slack, Notion, Google Drive, SharePoint, Azure Blob, Web Crawler, Salesforce, SAP, and HubSpot.
Prerequisites
Hardware Requirements
Minimum (small team, < 10,000 documents):
- 8 CPU cores
- 16 GB RAM
- 100 GB SSD storage
- No GPU required (uses cloud LLM APIs)
Recommended (medium team, 10,000–100,000 documents):
- 16 CPU cores
- 32 GB RAM
- 500 GB SSD storage
- Optional: NVIDIA GPU for local document parsing
Large scale (100,000+ documents):
- 32+ CPU cores
- 64+ GB RAM
- 1 TB+ SSD storage
- NVIDIA GPU recommended for local parsing
- Consider running infrastructure on dedicated nodes
Software Requirements
- Docker 24.0+ with Docker Compose v2
- OpenSSL (for generating encryption keys and TLS certificates)
For Kubernetes deployments:
- Kubernetes 1.28+
- Helm 3.x
- kubectl configured for your cluster
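The OpenSSL requirement is for generating secrets, for example a random key for encrypting stored credentials (which configuration variable consumes it depends on your deployment guide):

```shell
# 32 random bytes, hex-encoded (64 characters), suitable as an encryption key
openssl rand -hex 32
```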
Deployment Options
Docker Compose
Single-machine deployment for evaluation and small-to-medium teams. The platform starts with a single command and includes all application services, infrastructure, and your selected connectors.
Best for: Teams of up to ~50 users, evaluation, development, and staging environments.
Kubernetes
Production deployment with horizontal scaling, health checks, rolling updates, and high availability. Helm charts are provided.
Best for: Large teams, production workloads, organizations with existing Kubernetes infrastructure.
Air-Gapped Deployment
For environments with no internet access:
- All container images can be pre-loaded from an internet-connected staging machine
- Use a self-hosted LLM (Ollama, vLLM, or any OpenAI-compatible server) instead of cloud providers
- Pre-download ML models for local inference
- No internet access required after initial setup
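Pre-loading images is plain docker save / docker load; a checksum guards against a corrupt transfer. A sketch (file names are illustrative; use the image list from your release bundle):

```shell
#!/usr/bin/env bash
# On the internet-connected staging machine:
#   docker save -o zensearch-images.tar $(cat images.txt)
#   sha256sum zensearch-images.tar > zensearch-images.tar.sha256
#
# On the air-gapped host, verify the transfer before loading:
verify_and_load() {
  sha256sum -c "$1.sha256" || return 1   # refuse a corrupt or tampered transfer
  docker load -i "$1"
}
```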
Configuration
AI Models
ZenSearch supports multiple AI providers. You can configure which provider and models to use for chat, agents, and embeddings:
- Cloud providers — OpenAI, Anthropic, Cohere, Groq
- Self-hosted models — Ollama, vLLM, or any OpenAI-compatible API endpoint
- Mix and match — Use cloud models for some tasks and local models for others
Embedding models can be configured separately from chat models to optimize cost and performance.
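As an illustration only (the variable names below are hypothetical placeholders, not ZenSearch's actual keys; consult your deployment guide), the mix-and-match setup amounts to pointing chat and embeddings at different providers:

```shell
# Hypothetical .env keys, illustrative only
CHAT_PROVIDER=ollama
CHAT_BASE_URL=http://ollama:11434/v1   # any OpenAI-compatible endpoint works
EMBEDDING_PROVIDER=openai              # embeddings configured independently of chat
```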
Authentication
ZenSearch integrates with your existing identity provider:
- OIDC — Keycloak, Auth0, Okta, Azure AD, and other OIDC-compliant providers
- SAML — Enterprise SSO
- Clerk — Managed authentication service
Connectors
Deploy only the connectors your organization needs. Each connector is configured with credentials for the target data source and can be enabled, paused, or removed at any time through the dashboard.
Guardrails
Guardrails are configured per-team through the dashboard. Features include:
- Prompt injection detection
- PII detection and filtering
- Hallucination detection (lexical, semantic, hybrid)
- Toxicity filtering
- Content moderation
See the Guardrails documentation for configuration details.
Security
Encryption
- All stored credentials (API keys, OAuth tokens) are encrypted at rest
- TLS is supported for all inter-service communication in production
- Database connections support SSL/TLS
Network Security
- AI model routing is internal-only and never exposed externally
- Use a reverse proxy (Nginx, Traefik, Caddy) to terminate TLS for the web UI and API
- Configure CORS to only allow your production domain
- Internal services communicate on an isolated network
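A minimal reverse-proxy sketch, assuming the quickstart ports (Web UI on 5173, API on 8080) and a hypothetical /api/ path split; adapt the routing to your deployment:

```nginx
# /etc/nginx/conf.d/zensearch.conf (sketch)
server {
    listen 443 ssl;
    server_name zensearch.example.com;

    ssl_certificate     /etc/ssl/certs/zensearch.crt;
    ssl_certificate_key /etc/ssl/private/zensearch.key;

    # Hypothetical path split; check your deployment guide for actual routes
    location /api/ {
        proxy_pass http://127.0.0.1:8080/;   # Core API
        proxy_set_header Host $host;
    }
    location / {
        proxy_pass http://127.0.0.1:5173;    # Web UI
        proxy_set_header Host $host;
    }
}
```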
Access Control
- Role-based access control (Owner, Admin, Editor, Viewer)
- Document-level permissions synced from source platforms
- Search-time permission enforcement — users only see content they're authorized to access
GPU Support
The Developer Edition installer automatically detects NVIDIA GPUs and uses GPU-accelerated document parsing when available. This significantly speeds up document processing for large volumes.
Requirements
- NVIDIA GPU with CUDA support and 4 GB+ VRAM (8 GB recommended)
- NVIDIA drivers installed (nvidia-smi should work)
- NVIDIA Container Toolkit configured for Docker
Installing NVIDIA Container Toolkit
If the installer shows "NVIDIA GPU found but Docker NVIDIA runtime not configured", install the toolkit:
Ubuntu / Debian:
# Add NVIDIA container toolkit repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install and configure
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
RHEL / CentOS / Fedora:
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
| sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify GPU Access
# Should show your GPU info
docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi
Re-run the Installer
After installing the toolkit, re-run the installer. It will detect the GPU and automatically use the GPU-accelerated parser image:
./start.sh --down
rm .env
./start.sh
How It Works
When a compatible GPU is detected, the installer:
- Writes PARSER_GPU_ENABLED=true to .env
- Applies the docker-compose.lite.gpu.yml overlay, which switches to the parser-gpu image
- Adds an NVIDIA device reservation and increases the parser memory limit to 4 GB
- Enables a persistent model cache volume (models are downloaded on first run)
Upgrading
ZenSearch releases are delivered as updated container images. The upgrade process:
- Pull the latest images
- Restart services (database migrations run automatically on startup)
- Verify health via the dashboard
Zero-downtime upgrades are supported on Kubernetes deployments.
Enterprise Getting Started
Enterprise customers receive:
- License key for on-premise deployment
- Private deployment guide with step-by-step instructions
- Container registry access for all platform images
- Dedicated support from our engineering team
Contact [email protected] to discuss your on-premise deployment requirements.