Self-Hosting Guide
Deploy ZenSearch on your own infrastructure with full control over your data, AI models, and network configuration. The free Developer Edition is available for evaluation and small teams. Enterprise On-Premise deployment is available for production workloads.
:::tip Developer Edition
The Developer Edition is a free, self-contained Docker Compose deployment — no license key required. Jump to the quickstart below.

For production deployments with Kubernetes, SSO, and dedicated support, contact [email protected].
:::
Overview
ZenSearch's on-premise deployment gives you:
- Full data sovereignty — your data never leaves your network
- Air-gapped support — deploy without internet connectivity
- Bring your own LLM — use OpenAI, Anthropic, Cohere, Ollama, or any OpenAI-compatible endpoint
- Infrastructure control — deploy on AWS, GCP, Azure, or bare metal
- Custom networking — configure VPCs, firewalls, and service mesh as needed
Developer Edition Quickstart
The Developer Edition is a free, self-contained Docker Compose deployment for evaluation and development.
Prerequisites
- Docker Desktop (or Docker Engine + Docker Compose v2)
- 8 GB RAM minimum, 16 GB+ recommended
- linux/amd64 or linux/arm64 host — Apple Silicon (macOS Docker Desktop), AWS Graviton, Qualcomm Snapdragon X via WSL2, and Raspberry Pi 4/5 with a 64-bit OS are all supported natively. 32-bit ARM (armv7) is not supported.
- No API key required — the installer can auto-install Ollama and pull local models sized for your hardware. Bringing your own LLM (OpenAI, Anthropic, Groq, an existing Ollama install, LM Studio, or any OpenAI-compatible endpoint) is also supported.
Install
curl -fsSL https://releases.zensearch.ai/install.sh | bash
The installer downloads the latest release, prompts you to pick an AI provider, and starts all services. If you pick the default "Local Setup" option it auto-installs Ollama and pulls chat + embedding models sized for your available RAM (see Recommended Ollama models).
After a few minutes: Web UI at http://localhost:35173, API health at http://localhost:38080/health.
The Developer Edition binds the Web UI and API to 127.0.0.1 on high-numbered host ports (35173 and 38080) to avoid collisions with other dev tools (React dev servers on 5173, Rails on 3000, etc.). Override WEB_PORT / API_PORT in your .env if you want different ports. Set LITE_BIND_ADDR=0.0.0.0 only on a trusted network after enabling authentication.
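For example, to move the Web UI and API to other ports, add overrides to .env and restart (a sketch — the port values below are arbitrary examples):

```bash
# .env — example port overrides
WEB_PORT=45173   # Web UI (default 35173)
API_PORT=48080   # API    (default 38080)
# LITE_BIND_ADDR=0.0.0.0   # only after enabling authentication (see below)
```

Then restart with ./start.sh --down && ./start.sh.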
Recommended Ollama models (Local Setup)
The installer detects total RAM and GPU VRAM (via nvidia-smi or rocm-smi, independent of Docker NVIDIA toolkit) and picks a chat + embedding pair that leaves real headroom for the OS, Docker, and the ZenSearch stack. All chat picks come from the qwen3.5 family — the April 2026 default with a full size ladder (0.8b → 122b) and confirmed tools + thinking + vision support.
When a usable GPU is present, weights and KV cache live in VRAM, so a modest-RAM machine with a 16 GB GPU can run qwen3.5:9b @ 32K context comfortably. Without a GPU, the picker falls back to a RAM-only ladder — CPU inference is slow regardless of model size, so we pick what fits without forcing the user into swap.
GPU-first ladder (when ≥ 8 GB VRAM detected):
| GPU VRAM | Chat | Context | Embedding |
|---|---|---|---|
| ≥ 48 GB | qwen3.5:35b (22 GB) | 32K | mxbai-embed-large (670 MB) |
| ≥ 24 GB | qwen3.5:27b (17 GB) | 16K | mxbai-embed-large (670 MB) |
| ≥ 16 GB | qwen3.5:9b (5.5 GB) | 32K | mxbai-embed-large (670 MB) |
| ≥ 12 GB | qwen3.5:9b (5.5 GB) | 16K | mxbai-embed-large (670 MB) |
| ≥ 8 GB | qwen3.5:4b (2.5 GB) | 16K | nomic-embed-text (274 MB) |
RAM-only ladder (no GPU / Apple Silicon — unified memory):
| Total RAM | Chat | Context | Embedding |
|---|---|---|---|
| ≥ 64 GB | qwen3.5:27b (17 GB) | 16K | mxbai-embed-large (670 MB) |
| 32 – 64 GB | qwen3.5:9b (5.5 GB) | 16K | mxbai-embed-large (670 MB) |
| 16 – 32 GB | qwen3.5:4b (2.5 GB) | 16K | nomic-embed-text (274 MB) |
| 8 – 16 GB | qwen3.5:4b (2.5 GB) | 8K | nomic-embed-text (274 MB) |
| < 8 GB | qwen3.5:2b (1.4 GB) | 8K | granite-embedding:30m (63 MB) |
Thresholds are conservative on purpose. The ZenSearch stack (Postgres + Qdrant + RustFS + NATS + Redis + core-api + model-gw + vectorizer + parser + structure-analyzer + projector) plus Docker Desktop typically consumes 11 – 19 GB of memory before any model loads. The VRAM thresholds are also slightly below nominal GiB values (e.g. 16,000 MiB for the "16 GB" tier rather than 16,384) because modern NVIDIA drivers reserve 1–2% of VRAM — an RTX 5070 Ti advertised as 16 GB reports ~16,303 MiB, and an exact 2^N threshold would silently disqualify it.
You can override the picks at install time:
LLM_PROVIDER=ollama \
LLM_CHAT_MODEL=qwen3.5:9b \
LLM_EMBED_MODEL=nomic-embed-text \
./install.sh --yes
Or post-install by editing LLM_CHAT_MODEL / LLM_EMBED_MODEL in .env and restarting.
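A hedged sketch of the post-install path — pull the new weights into Ollama first (model names below are examples from the ladder above), then point .env at them and restart:

```bash
# Pull the replacement models into Ollama if they aren't present yet
ollama pull qwen3.5:9b
ollama pull mxbai-embed-large

# In .env:
#   LLM_CHAT_MODEL=qwen3.5:9b
#   LLM_EMBED_MODEL=mxbai-embed-large

# Restart so the stack picks up the change
./start.sh --down && ./start.sh
```

Note that with the Ollama Local Setup, LLM_CHAT_MODEL normally points at the zensearch-chat wrapping tag described below; if you switch base models, re-run the installer (or recreate the tag) so the bumped num_ctx follows the new model.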
Custom zensearch-chat Ollama tag
Ollama's default num_ctx is 4096 tokens regardless of the model's actual capability — too small for the agent path, where the system prompt + tool definitions + conversation history routinely exceed 4K within a few turns. Without intervention, the agent silently truncates older context.
The installer creates a custom Ollama tag named zensearch-chat that wraps the picked base model with a tier-appropriate num_ctx (8K / 16K / 32K). LLM_CHAT_MODEL and LLM_AGENT_MODEL in .env are written to point at this tag, not the base model. Running ollama run qwen3.5:9b directly outside ZenSearch still gets the 4K default — only the wrapping tag has the bumped context. This avoids inflating the KV cache for unrelated models the user may have pulled.
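To confirm what the installer produced (a quick check, assuming a local Ollama), ollama show can print the generated Modelfile:

```bash
# Inspect the wrapping tag — expect a FROM line naming the picked base model
# and a PARAMETER num_ctx matching your tier (8192 / 16384 / 32768)
ollama show zensearch-chat --modelfile
```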
Re-running the installer detects an existing zensearch-chat tag and:
- Skips recreation when both the FROM base model and num_ctx already match (idempotent — the common case for a --update run).
- Warns and recreates when either differs. A common trigger: a GPU upgrade lands you in a different VRAM tier whose CONTEXT_TOKENS happens to match but whose base model changed (e.g. 8 GB → 12 GB both pick 16K, but the chat model changes from qwen3.5:4b to qwen3.5:9b).
- Skips entirely when LLM_BASE_URL points at a remote Ollama host (anything other than host.docker.internal / localhost / 127.0.0.1). A local ollama create would run against the local CLI and put the tag on the wrong server. The installer prints a manual recipe to run on the remote host:
# On the remote Ollama host
printf 'FROM qwen3.5:9b\nPARAMETER num_ctx 32768\n' | ollama create zensearch-chat -f -
# Then in your ZenSearch .env
LLM_CHAT_MODEL=zensearch-chat
LLM_AGENT_MODEL=zensearch-chat
Chat performance flags
The installer classifies your AI backend into one of three tiers and writes sensible performance defaults to .env:
| Backend class | Triggered by | CHAT_QUERY_REWRITE_ENABLED | CHAT_FOLLOWUP_SUGGESTIONS_ENABLED | CHAT_CLASSIFICATION_ENABLED |
|---|---|---|---|---|
| hosted | LLM_PROVIDER = openai / anthropic / groq / openrouter | true | true | true |
| fast_local | NVIDIA GPU with nvidia-smi OR Apple Silicon with ≥ 32 GB unified memory | true | true | true |
| slow_local | Apple Silicon < 32 GB, CPU-only Linux, Ollama without GPU | false | false | false |
Each flag gates an optional LLM call on the chat critical path:
- CHAT_QUERY_REWRITE_ENABLED — rewrites conversational follow-ups ("and what about security?") into self-contained queries via a cheap LLM call before retrieval. On hosted backends this takes ~5 s and meaningfully improves follow-up answer quality. On memory-constrained Ollama it takes 60–180 s of silence before the synthesis stream even starts.
- CHAT_FOLLOWUP_SUGGESTIONS_ENABLED — after the main answer streams, fires a second LLM call (5 s timeout) to generate three "ask next" chips for the UI. Pure UX polish; costs a 5–30 s tail on slow backends.
- CHAT_CLASSIFICATION_ENABLED — runs the background classification worker that extracts department/category/topic metadata from every indexed document. On single-slot local inference, classification batches compete with user chat for the one Ollama slot. When disabled, search still works; only facet filtering gets less granular.
All three flags are independently toggleable. To re-enable a feature after install, edit the value in .env and restart with ./scripts/install.sh --update.
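For example, to turn the follow-up chips back on for a slow_local backend while leaving the heavier calls disabled (a sketch — flag names from the table above):

```bash
# .env
CHAT_FOLLOWUP_SUGGESTIONS_ENABLED=true   # cheap UX win, worth trying first
CHAT_QUERY_REWRITE_ENABLED=false         # keep off unless turn latency is acceptable
CHAT_CLASSIFICATION_ENABLED=false        # keep off on single-slot local inference
```

Then restart with ./scripts/install.sh --update.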
Apple Silicon MLX
Ollama 0.19+ ships an MLX backend that is ~2× faster than its Metal backend on Apple Silicon, but it only auto-activates on Macs with ≥ 32 GB unified memory. On smaller Macs (16 GB, 24 GB) Ollama falls back to the slower Metal path, and qwen3.5:4b chat completions routinely take 60–180s per turn. There is no environment variable to force-enable the MLX backend below the 32 GB threshold.
For best performance on < 32 GB Apple Silicon, run MLX directly instead of Ollama:
# Install mlx-lm or mlx-openai-server (Python)
uv tool install mlx-openai-server
# Pull a pre-quantized MLX weight (different format from Ollama GGUF)
# Community weights: https://huggingface.co/mlx-community
mlx-openai-server \
--model mlx-community/Qwen3-4B-Instruct-4bit \
--port 8080 \
--max-tokens 8192 \
--context-size 32768
Then configure ZenSearch to talk to the MLX endpoint:
# In .env
LLM_PROVIDER=custom
LLM_BASE_URL=http://host.docker.internal:8080/v1
LLM_CHAT_MODEL=Qwen3-4B-Instruct-4bit
# Keep Ollama (or another embedding server) for embeddings — MLX server
# embedding support varies by implementation; validate before relying on it.
LLM_EMBED_PROVIDER=ollama
LLM_EMBED_BASE_URL=http://host.docker.internal:11434
LLM_EMBED_MODEL=nomic-embed-text
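Before restarting ZenSearch, it's worth sanity-checking the MLX endpoint from the host (inside Docker the same server is reached via host.docker.internal). A minimal OpenAI-compatible request — the accepted model name can differ between MLX servers, so adjust it to whatever your server reports:

```bash
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mlx-community/Qwen3-4B-Instruct-4bit",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 16
      }'
```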
Recommended MLX servers (all expose OpenAI-compatible /v1/chat/completions):
- mlx-openai-server — qwen3 tool-call parser included; works with ZenSearch agent mode
- vllm-mlx — continuous batching, claims 400+ tok/s on Apple Silicon
- mlx_lm.server — the official reference implementation; simpler but fewer tool-call parsers
Tool-calling reliability on MLX follows the same model-family rules as Ollama: qwen3 / qwen3.5 work, gemma3 is broken (upstream parser bugs — mlx-lm#1096, ollama#14493). gemma4 may have fixed the template bugs but is unverified against the ZenSearch agent tool set, so qwen3.5 stays the safe default. Stick with qwen variants for agent mode regardless of backend.
ZenSearch is an enterprise platform. Running it on sub-32 GB Apple Silicon is supported for evaluation but is not the target deployment. Production workloads should use either a hosted LLM provider, an NVIDIA GPU host, or Apple Silicon with ≥ 32 GB unified memory (where Ollama's MLX backend auto-activates).
Manual Install
If you prefer not to pipe to bash:
curl -fsSL https://releases.zensearch.ai/developer-edition/latest \
-o zensearch-dev.tar.gz
tar xzf zensearch-dev.tar.gz
cd zensearch-dev-edition-*
cp .env.lite.example .env
# Edit .env — set LLM_PROVIDER and either LLM_API_KEY (for hosted
# providers like openai/anthropic/groq) or LLM_BASE_URL (for ollama/
# lmstudio/custom). See .env.lite.example for the full option list.
./start.sh
Management
./start.sh --down # Stop all services
./start.sh --update # Pull latest images and restart
./diagnose.sh # Health check and diagnostics
What's Included
| Component | Description |
|---|---|
| Core API | REST API, search, chat, agents |
| Model Gateway | AI model proxy |
| Web UI | React frontend |
| Parser (Lite) | Lightweight document parsing (PDF, DOCX, PPTX, XLSX — no OCR) |
| Structure Analyzer (Base) | Document structure extraction |
| Projector | Projection generation |
| Vectorizer | Embedding generation |
| PostgreSQL | Database |
| Redis | Cache |
| Qdrant | Vector search |
| RustFS | Object storage |
| NATS | Message broker |
| S3 Collector | Amazon S3 / S3-compatible storage connector |
| Web Crawler | Website crawling with headless Chrome |
GPU Acceleration
If you have an NVIDIA GPU with the NVIDIA Container Toolkit installed, the installer automatically detects it and uses GPU-accelerated document parsing. No manual configuration needed.
Configuration
Resource limits and other settings can be tuned in your .env file:
| Variable | Default | Description |
|---|---|---|
| PARSER_MEMORY_LIMIT | 4G | Memory limit for the document parser. Increase for large PDFs on CPU |
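For example, to give the parser more headroom for large CPU-parsed PDFs (the value below is illustrative):

```bash
# .env
PARSER_MEMORY_LIMIT=8G
```

Restart with ./start.sh --down && ./start.sh for the change to take effect.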
Not Included (Enterprise Features)
- Full parser with OCR/GPU support (lite edition uses lightweight CPU parsing — see Parser backends)
- Structure-Analyzer [full] extra (sentence-transformers + tree-sitter for advanced semantic extraction)
- Reranker, sparse embedder
- Additional data source connectors (Confluence, Slack, GitHub, Jira, Notion, Google Drive, SharePoint, Azure Blob, Salesforce, SAP, HubSpot)
- Monitoring stack (Prometheus, Grafana)
- SAML authentication (OIDC is supported in lite — see Authentication)
- Kubernetes / Helm chart deployment
- Dedicated support
The full edition's ML services (full parser with PyTorch/Docling, reranker, sparse embedder, and the structure-analyzer[full] extra) are published as linux/amd64 images only — they depend on PyTorch wheels that aren't available for arm64. If you need to run the enterprise features, use an amd64 host. The Developer/lite edition runs on both amd64 and arm64.
Enabling Authentication
By default, the Developer Edition runs with AUTH_MODE=none — no login required, all users share a single dev identity. The compose file binds the Web UI and API to 127.0.0.1 by default; enable authentication before changing LITE_BIND_ADDR for a shared or network-accessible deployment.
To enable authentication, configure any OIDC-compatible identity provider (Keycloak, Auth0, Okta, Google, Azure AD, etc.):
- Edit .env in your ZenSearch directory:
# Change auth mode
AUTH_MODE=oidc
# OIDC provider settings (example: Keycloak)
OIDC_ISSUER_URL=https://your-keycloak.example.com/realms/zensearch
OIDC_CLIENT_ID=zensearch-api
OIDC_CLIENT_SECRET=your-client-secret
- Restart:
./start.sh --down && ./start.sh
The Web UI will now show a login screen. Users are automatically provisioned on first login with their OIDC identity (email, name, roles).
Provider examples:
| Provider | Issuer URL |
|---|---|
| Keycloak | https://keycloak.example.com/realms/your-realm |
| Auth0 | https://your-tenant.auth0.com/ |
| Okta | https://your-org.okta.com/oauth2/default |
| Google | https://accounts.google.com |
| Azure AD | https://login.microsoftonline.com/YOUR_TENANT_ID/v2.0 |
When configuring your OIDC provider, set the redirect URI to http://localhost:35173/auth/callback (or your custom domain — substitute WEB_PORT if you overrode it). The client must support the Authorization Code flow with PKCE.
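A quick way to validate the issuer URL before restarting: every OIDC-compliant provider serves a discovery document at the well-known path below (issuer from the table above; the example uses the Keycloak entry):

```bash
# Expect a JSON document containing authorization_endpoint, token_endpoint, jwks_uri
curl -s https://your-keycloak.example.com/realms/zensearch/.well-known/openid-configuration
```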
Parser backends
ZenSearch ships with three parser backends. The right choice depends on the document mix you'll be ingesting and what hardware you can dedicate to parsing.
| Backend | Image size | OCR | Image extraction | GPU | Best for |
|---|---|---|---|---|---|
| Lite (PARSER_PARSER_BACKEND=lite) | ~200 MB | ✗ Scanned PDFs are flagged but not OCR'd | ✗ | ✗ | Developer Edition, lightweight CPU-only deployments, evaluation, air-gapped without GPU |
| Local (PARSER_PARSER_BACKEND=local) | ~5 GB | ✓ via Docling | ✓ | Optional (NVIDIA) | Production self-hosted with reasonable hardware; full feature set on your own infra |
| Modal (PARSER_PARSER_BACKEND=modal) | n/a (serverless) | ✓ | ✓ | ✓ (GPU) | Cloud deployments and hybrid setups where you don't want to operate the parsing infra |
Document-type support across backends:
| Format | Lite | Local | Modal |
|---|---|---|---|
| PDF (text-based) | ✓ pymupdf4llm | ✓ Docling | ✓ Docling |
| PDF (scanned) | ✗ flagged needs_ocr | ✓ OCR | ✓ OCR |
| DOCX | ✓ python-docx | ✓ Docling | ✓ Docling |
| PPTX | ✓ python-pptx | ✓ Docling | ✓ Docling |
| XLSX | ✓ openpyxl | ✓ Docling | ✓ Docling |
| Plain text / Markdown / HTML | ✓ shared fast path | ✓ shared fast path | ✓ shared fast path |
| Images / image-heavy PDFs | text only | ✓ vision-described | ✓ vision-described |
Picking a backend
- Just evaluating? Stay on Lite. The Developer Edition installer defaults to it.
- Self-hosted production with sub-100k documents? Local is the typical choice. Add an NVIDIA GPU if you have a lot of scanned PDFs or image-heavy slide decks.
- Hybrid or cloud-leaning deployment? Modal lets you keep the parsing burst capacity off your own hardware while keeping the rest of the stack on-prem.
To switch backend, set PARSER_PARSER_BACKEND in your .env and restart the parser container. No data migration required — the backend choice only affects future document parsing.
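For example, moving to the full local parser (a sketch — the local and modal backends require Enterprise images, see "Not Included" above):

```bash
# .env
PARSER_PARSER_BACKEND=local

# Restart so the parser picks up the new backend (restarting only the parser
# container also works if you prefer)
./start.sh --down && ./start.sh
```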
Vision model for image description
Image content search (see Multi-Modal Search) requires a vision-capable chat model on the backend. The default zen-mini mapping in the cloud offering uses one. Self-hosters who pick a chat provider without vision support (e.g. Groq) should set PARSER_IMAGE_DESCRIPTION_ENABLED=false to avoid noisy errors, or point the image describer at a vision-capable provider via the Model Gateway.
Architecture
ZenSearch deploys as a set of containerized services organized into three layers:
- Application Layer — The core platform services that handle search, chat, agents, document processing, AI model routing, and the web interface
- Infrastructure Layer — Databases, caching, object storage, and messaging used by the application services
- Monitoring Layer (optional) — Metrics, dashboards, log aggregation, and alerting
Data Source Connectors
Connectors are deployed selectively — only enable the ones for data sources your organization uses. Each connector runs independently and can be added or removed without affecting the rest of the platform.
ZenSearch supports 13 connector types: S3, GitHub, Confluence, Jira, Slack, Notion, Google Drive, SharePoint, Azure Blob, Web Crawler, Salesforce, SAP, and HubSpot.
Prerequisites
Hardware Requirements
Minimum (small team, < 10,000 documents):
- 8 CPU cores
- 16 GB RAM
- 100 GB SSD storage
- No GPU required (uses cloud LLM APIs)
Recommended (medium team, 10,000–100,000 documents):
- 16 CPU cores
- 32 GB RAM
- 500 GB SSD storage
- Optional: NVIDIA GPU for local document parsing
Large scale (100,000+ documents):
- 32+ CPU cores
- 64+ GB RAM
- 1 TB+ SSD storage
- NVIDIA GPU recommended for local parsing
- Consider running infrastructure on dedicated nodes
Software Requirements
- Docker 24.0+ with Docker Compose v2
- OpenSSL (for generating encryption keys and TLS certificates)
For Kubernetes deployments:
- Kubernetes 1.28+
- Helm 3.x
- kubectl configured for your cluster
Deployment Options
Docker Compose
Single-machine deployment for evaluation and small-to-medium teams. The platform starts with a single command and includes all application services, infrastructure, and your selected connectors.
Best for: Teams of up to ~50 users, evaluation, development, and staging environments.
Kubernetes
Production deployment with horizontal scaling, health checks, rolling updates, and high availability. Helm charts are provided.
Best for: Large teams, production workloads, organizations with existing Kubernetes infrastructure.
Air-Gapped Deployment
For environments with no internet access:
- All container images can be pre-loaded from an internet-connected staging machine (see the sketch after this list)
- Use a self-hosted LLM (Ollama, vLLM, or any OpenAI-compatible server) instead of cloud providers
- Pre-download ML models for local inference
- No internet access required after initial setup
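Pre-loading images typically uses docker save / docker load (a sketch — image names and tags are placeholders; use the image list that ships with your release bundle):

```bash
# On the internet-connected staging machine
docker pull <registry>/<image>:<tag>
docker save <registry>/<image>:<tag> -o zensearch-images.tar

# Transfer zensearch-images.tar to the air-gapped host, then:
docker load -i zensearch-images.tar
```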
Configuration
AI Models
ZenSearch supports multiple AI providers. You can configure which provider and models to use for chat, agents, and embeddings:
- Cloud providers — OpenAI, Anthropic, Cohere, Groq
- Self-hosted models — Ollama, vLLM, or any OpenAI-compatible API endpoint
- Mix and match — Use cloud models for some tasks and local models for others
Embedding models can be configured separately from chat models to optimize cost and performance.
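A sketch of a mixed setup in .env — hosted chat with local embeddings (variable names from the examples earlier in this guide; key and model values are placeholders):

```bash
# Hosted chat provider
LLM_PROVIDER=openai
LLM_API_KEY=<your-api-key>

# Local embeddings via Ollama
LLM_EMBED_PROVIDER=ollama
LLM_EMBED_BASE_URL=http://host.docker.internal:11434
LLM_EMBED_MODEL=nomic-embed-text
```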
Authentication
ZenSearch integrates with your existing identity provider:
- OIDC — Keycloak, Auth0, Okta, Azure AD, and other OIDC-compliant providers
- SAML — Enterprise SSO
- Clerk — Managed authentication service
Connectors
Deploy only the connectors your organization needs. Each connector is configured with credentials for the target data source and can be enabled, paused, or removed at any time through the dashboard.
Guardrails
Guardrails are configured per-team through the dashboard. Features include:
- Prompt injection detection
- PII detection and filtering
- Hallucination detection (lexical, semantic, hybrid)
- Toxicity filtering
- Content moderation
See the Guardrails documentation for configuration details.
Observability — Distributed Tracing
ZenSearch services are instrumented with OpenTelemetry and can export traces to any OTLP-compatible backend (Grafana Tempo, Jaeger, Honeycomb, Datadog, etc.). Enabling tracing gives you end-to-end visibility into a request as it flows through core-api, the Model Gateway, agents, and downstream providers — including per-span timing, model calls, tool invocations, and errors.
Enable on the Core API and Model Gateway:
OTEL_ENABLED=true
OTEL_EXPORTER_ENDPOINT=tempo:4318 # or your OTLP HTTP collector
OTEL_EXPORTER_TYPE=otlp # use "stdout" for local development
Both services auto-set their service.name attribute, so traces are pre-grouped in your backend. Use docker-compose.monitoring.prod.yml for a ready-made Prometheus + Grafana + Tempo stack.
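A hedged sketch of bringing the bundled monitoring stack up alongside the platform (the base compose file name may differ in your release bundle):

```bash
docker compose \
  -f docker-compose.yml \
  -f docker-compose.monitoring.prod.yml \
  up -d
```

With OTEL_EXPORTER_ENDPOINT=tempo:4318 as above, traces land in Tempo and can be browsed from Grafana.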
Security
Encryption
- All stored credentials (API keys, OAuth tokens) are encrypted at rest
- TLS is supported for all inter-service communication in production
- Database connections support SSL/TLS
Network Security
- AI model routing is internal-only and never exposed externally
- Use a reverse proxy (Nginx, Traefik, Caddy) to terminate TLS for the web UI and API
- Configure CORS to only allow your production domain
- Internal services communicate on an isolated network
Access Control
- Role-based access control (Owner, Admin, Editor, Viewer)
- Document-level permissions synced from source platforms
- Search-time permission enforcement — users only see content they're authorized to access
GPU Support
The Developer Edition installer automatically detects NVIDIA GPUs and uses GPU-accelerated document parsing when available. This significantly speeds up document processing for large volumes.
Requirements
- NVIDIA GPU with CUDA support and 4 GB+ VRAM (8 GB recommended)
- NVIDIA drivers installed (nvidia-smi should work)
- NVIDIA Container Toolkit configured for Docker
Installing NVIDIA Container Toolkit
If the installer shows "NVIDIA GPU found but Docker NVIDIA runtime not configured", install the toolkit:
Ubuntu / Debian:
# Add NVIDIA container toolkit repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install and configure
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
RHEL / CentOS / Fedora:
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
| sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify GPU Access
# Should show your GPU info
docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi
Re-run the Installer
After installing the toolkit, re-run the installer. It will detect the GPU and automatically use the GPU-accelerated parser image:
./start.sh --down
rm .env
./start.sh
How It Works
When a compatible GPU is detected, the installer:
- Writes PARSER_GPU_ENABLED=true to .env
- Applies the docker-compose.lite.gpu.yml overlay, which switches to the parser-gpu image
- Adds an NVIDIA device reservation and increases parser memory to 4 GB
- Enables a persistent model cache volume (models are downloaded on first run)
Upgrading
ZenSearch releases are delivered as updated container images. The upgrade process:
- Pull the latest images
- Restart services — database migrations run automatically on startup
- Verify health via the dashboard
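On the Docker Compose Developer Edition, these steps collapse into the single update command from the Management section:

```bash
./start.sh --update   # pulls the latest images and restarts; migrations run on startup
```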
Zero-downtime upgrades are supported on Kubernetes deployments.
Enterprise Getting Started
Enterprise customers receive:
- License key for on-premise deployment
- Private deployment guide with step-by-step instructions
- Container registry access for all platform images
- Dedicated support from our engineering team
Contact [email protected] to discuss your on-premise deployment requirements.