
Azure Blob Storage Connector

Connect to Azure Blob Storage to index documents from your containers. ZenSearch parses files, extracts text, and makes them searchable through AI-powered search.

Overview

The Azure Blob connector allows you to:

  • Index documents from blob containers (PDF, DOCX, XLSX, TXT, code files, and more)
  • Filter by prefix/path for targeted indexing
  • Support multiple authentication methods (connection string, SAS, account key, managed identity)
  • Include blob metadata as searchable document properties
  • Process images with OCR for text extraction

Prerequisites

  • Azure Storage account
  • A container with documents to index
  • Access credentials (connection string, account key, SAS token, or managed identity)

Authentication

Connection String

  1. Go to Azure Portal → Storage Account → Access keys
  2. Copy the Connection string (either key1 or key2)
  3. Enter it in ZenSearch
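Before pasting, it can help to sanity-check the string locally. A connection string copied from the Access keys page is a semicolon-separated list of Key=Value pairs (DefaultEndpointsProtocol, AccountName, AccountKey, EndpointSuffix). The sketch below uses only the Python standard library to split it and report missing fields. The function names are hypothetical and not part of ZenSearch.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure Storage connection string into a {key: value} dict."""
    parts = {}
    # Strip surrounding whitespace first: a trailing space or newline is a
    # common copy/paste mistake.
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only, because base64 account keys can end
        # with '=' padding characters.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def missing_fields(conn_str: str) -> list:
    """Return the key-based fields that are absent or empty (empty list = OK)."""
    parts = parse_connection_string(conn_str)
    return [f for f in ("AccountName", "AccountKey") if not parts.get(f)]
```

For example, `missing_fields("AccountName=myaccount")` returns `["AccountKey"]`, which points to a string that was cut off while copying.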

Account Key

  1. Go to Azure Portal → Storage Account → Access keys
  2. Copy the Account Key (key1 or key2)
  3. Enter the account name and key in ZenSearch
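Two common input mistakes can be caught before testing the connection. Azure storage account names must be 3 to 24 characters of lowercase letters and digits, and account keys are base64 strings. Here is a minimal standard-library check. The function name is hypothetical.

```python
import base64
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def check_account_key_inputs(account_name: str, account_key: str) -> list:
    """Return a list of problems with the account name/key pair (empty = OK)."""
    problems = []
    if not ACCOUNT_NAME_RE.fullmatch(account_name.strip()):
        problems.append("account name must be 3-24 lowercase letters or digits")
    key = account_key.strip()
    try:
        # validate=True rejects characters outside the base64 alphabet;
        # bad padding raises binascii.Error, which is a ValueError subclass.
        base64.b64decode(key, validate=True)
        valid_key = bool(key)  # an empty string decodes fine but is not a key
    except ValueError:
        valid_key = False
    if not valid_key:
        problems.append("account key is not valid base64")
    return problems
```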

SAS Token

Shared Access Signature tokens provide time-limited, scoped access:

  1. Go to Azure Portal → Storage Account → Shared access signature
  2. Configure permissions:
    • Allowed services: Blob
    • Allowed resource types: Container, Object
    • Allowed permissions: Read, List
  3. Set an appropriate expiry date
  4. Generate and copy the SAS token
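The settings above appear as query parameters in the generated token: ss (services), srt (resource types), sp (permissions), and se (expiry, in UTC). The sketch below checks a pasted account SAS for those values. It only inspects the token string and does not verify the signature, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs


def check_account_sas(token: str, now: Optional[datetime] = None) -> list:
    """Check an account SAS token for Blob service, Container+Object, Read+List, expiry."""
    now = now or datetime.now(timezone.utc)
    # The portal may show the token with a leading '?'; drop it and whitespace.
    params = {k: v[0] for k, v in parse_qs(token.strip().lstrip("?")).items()}
    problems = []
    if "b" not in params.get("ss", ""):
        problems.append("allowed services must include Blob (ss=b)")
    if not set("co") <= set(params.get("srt", "")):
        problems.append("resource types must include Container and Object (srt=co)")
    if not set("rl") <= set(params.get("sp", "")):
        problems.append("permissions must include Read and List (sp=rl)")
    expiry = params.get("se")
    if expiry is None:
        problems.append("token has no expiry (se)")
    else:
        # fromisoformat only accepts a trailing 'Z' on Python 3.11+, so normalize it.
        se = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if se.tzinfo is None:
            se = se.replace(tzinfo=timezone.utc)  # SAS times are UTC
        if se <= now:
            problems.append(f"token expired at {expiry}")
    return problems
```

This check applies to account-level tokens. A container-level (service) SAS uses different parameters (for example sr=c) and would need a different check.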

Managed Identity

For ZenSearch deployments running on Azure (VMs, App Service, AKS), managed identity provides passwordless authentication:

  1. Enable Managed Identity on your Azure resource (VM, App Service, Container Instance, etc.)
  2. Assign the Storage Blob Data Reader role to the identity on the target storage account:
    az role assignment create \
        --assignee <managed-identity-object-id> \
        --role "Storage Blob Data Reader" \
        --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>
  3. Set auth_method to managed_identity in ZenSearch; no credentials are required
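The --scope value is an Azure Resource Manager resource ID. Azure RBAC also accepts a container-level scope: the account ID followed by /blobServices/default/containers/<container>. This narrows the Storage Blob Data Reader grant to the one container being indexed. Below is a small helper that builds either form. The function name is hypothetical. If you use the narrower scope, confirm that the connector's Test step still succeeds.

```python
from typing import Optional


def storage_role_scope(subscription_id: str, resource_group: str,
                       account: str, container: Optional[str] = None) -> str:
    """Build an Azure RBAC scope string for a storage account or a single container."""
    scope = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )
    if container:
        # Container-level scope: the role assignment applies to this container only.
        scope += f"/blobServices/default/containers/{container}"
    return scope
```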

Configuration Reference

Setting | Type | Required | Description
--- | --- | --- | ---
Storage Account | string | Yes | Azure Storage account name
Container Name | string | Yes | Blob container name
Auth Method | string | Yes | Authentication method: connection_string, account_key, sas_token, or managed_identity
Connection String | string | No* | Full connection string (*required for connection_string auth)
Account Key | string | No* | Account key (*required for account_key auth)
SAS Token | string | No* | SAS token (*required for sas_token auth)
Prefix | string | No | Path prefix filter (e.g., documents/2025/)
Endpoint Suffix | string | No | Custom endpoint suffix (default: blob.core.windows.net). Use blob.core.chinacloudapi.cn for Azure China.
Include Images | boolean | No | Process images with OCR for text extraction
Include Metadata | boolean | No | Include blob metadata as document properties
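The conditional requirements in this table can be expressed as a small validation function: exactly one credential field, selected by Auth Method. This is only a sketch. The snake_case keys other than auth_method, endpoint_suffix, and include_images are hypothetical stand-ins for the settings above, not ZenSearch's actual configuration schema.

```python
from typing import Any, Dict, List

# Credential field each auth method needs (hypothetical key names); managed identity needs none.
CREDENTIAL_FOR_METHOD = {
    "connection_string": "connection_string",
    "account_key": "account_key",
    "sas_token": "sas_token",
    "managed_identity": None,
}


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable errors for a connector config dict (empty = valid)."""
    errors = []
    for field in ("storage_account", "container_name", "auth_method"):
        if not cfg.get(field):
            errors.append(f"{field} is required")
    method = cfg.get("auth_method")
    if method and method not in CREDENTIAL_FOR_METHOD:
        errors.append(f"unknown auth_method: {method}")
    needed = CREDENTIAL_FOR_METHOD.get(method)
    if needed and not cfg.get(needed):
        errors.append(f"{needed} is required for {method} auth")
    return errors
```

For example, a config with auth_method set to sas_token but no sas_token value yields `["sas_token is required for sas_token auth"]`.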

Setup Steps

  1. Add Connector: Navigate to Knowledge → Add Data Source → Azure Blob Storage
  2. Enter Account Details: Storage account name and container name
  3. Select Auth Method: Choose your authentication approach
  4. Provide Credentials: Enter the required credentials for your auth method
  5. Set Prefix (optional): Filter to a specific path within the container
  6. Test & Create: Verify the connection and save

Supported File Types

The Azure Blob connector processes the same file types as the S3 connector:

Category | Formats
--- | ---
Documents | PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, ODT, ODS, ODP
Text | TXT, MD, RST, CSV, TSV, LOG
Code | All major languages (Python, Go, JavaScript, TypeScript, Java, etc.)
Markup | HTML, XML, JSON, YAML, TOML
Images | PNG, JPG, TIFF (when OCR is enabled via include_images)

Maximum file size: 500 MB per file.
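The table and size limit together work like a per-blob filter. The sketch below approximates that filter. The extension sets are taken from the table only, so the connector's own list (especially for code files) may differ. Reading "500 MB" as 500 MiB is an assumption. The function name is hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: "500 MB" read as 500 MiB; the exact byte limit is not stated.
MAX_BYTES = 500 * 1024 * 1024

# Approximation of the table above, not the connector's authoritative list.
TEXT_LIKE = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt", ".odt", ".ods", ".odp",
    ".txt", ".md", ".rst", ".csv", ".tsv", ".log",
    ".html", ".xml", ".json", ".yaml", ".toml",
    ".py", ".go", ".js", ".ts", ".java",
}
IMAGES = {".png", ".jpg", ".tiff"}


def should_index(blob_name: str, size_bytes: int, include_images: bool = False) -> bool:
    """Approximate whether a blob would be picked up for indexing."""
    if size_bytes > MAX_BYTES:
        return False  # files over the size limit are skipped
    ext = PurePosixPath(blob_name).suffix.lower()
    if ext in IMAGES:
        return include_images  # images are only processed when OCR is enabled
    return ext in TEXT_LIKE
```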

Best Practices

  1. Use SAS tokens with minimal permissions: Grant only Read and List permissions. To limit a token to a single container, generate it from the container's Shared access tokens page rather than the account-level Shared access signature page
  2. Set reasonable expiry dates: SAS tokens should expire and be rotated regularly
  3. Use managed identity in production: It eliminates credential management entirely for Azure-hosted deployments
  4. Filter by prefix for large containers: If your container has thousands of blobs, use prefix to target specific directories
  5. Enable metadata: Blob metadata (custom key-value pairs set on blobs) can provide additional context for search results
  6. Organize blobs by topic: Create separate ZenSearch collections for different blob prefixes (e.g., legal/, engineering/, hr/)
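In the Blob API, a "directory" is only a shared name prefix, and matching is a case-sensitive string comparison. This makes the trailing slash significant: a prefix of legal matches legal-archive/ as well as legal/. The example below assumes the connector applies the prefix setting the same way as the Blob Storage List Blobs prefix filter. The helper is hypothetical.

```python
from typing import Iterable, List


def select_by_prefix(blob_names: Iterable[str], prefix: str) -> List[str]:
    """Keep blobs whose full name starts with prefix (case-sensitive, no path logic)."""
    return [name for name in blob_names if name.startswith(prefix)]


names = ["legal/contracts/a.pdf", "legal-archive/b.pdf", "Legal/c.pdf", "hr/d.docx"]
print(select_by_prefix(names, "legal"))   # ['legal/contracts/a.pdf', 'legal-archive/b.pdf']
print(select_by_prefix(names, "legal/"))  # ['legal/contracts/a.pdf']
```

Ending each prefix with a slash keeps collections such as legal/ and engineering/ from picking up similarly named siblings.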

Sovereign Cloud Support

For Azure sovereign clouds, set the endpoint_suffix to the appropriate value:

Cloud | Endpoint Suffix
--- | ---
Azure Global | blob.core.windows.net (default)
Azure China | blob.core.chinacloudapi.cn
Azure Government | blob.core.usgovcloudapi.net
Azure Germany | blob.core.cloudapi.de (legacy; Microsoft Cloud Deutschland was closed in 2021)
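The example below assumes ZenSearch combines the account name and endpoint_suffix into the standard https://<account>.<suffix> blob endpoint. The dictionary keys are hypothetical labels. Note that the EndpointSuffix field inside an Azure connection string omits the blob. label (for example core.windows.net). If you copy it across, add blob. to match the values in this table.

```python
# Blob endpoint suffixes from the table above, keyed by a hypothetical cloud label.
BLOB_ENDPOINT_SUFFIX = {
    "global": "blob.core.windows.net",
    "china": "blob.core.chinacloudapi.cn",
    "government": "blob.core.usgovcloudapi.net",
    "germany": "blob.core.cloudapi.de",
}


def blob_account_url(account: str, cloud: str = "global") -> str:
    """Build the blob service URL https://<account>.<endpoint_suffix>."""
    return f"https://{account}.{BLOB_ENDPOINT_SUFFIX[cloud]}"
```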

Troubleshooting

Authentication failed

  • Connection string: Verify the full connection string is copied correctly (no trailing whitespace)
  • Account key: Ensure both the account name and key are provided
  • SAS token: Check the token has not expired and includes Read and List permissions
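  • Managed identity: Verify the role assignment is on the correct storage account and uses a data-plane role such as Storage Blob Data Reader (Owner or Contributor alone does not grant blob read access through the identity). New role assignments can take a few minutes to propagate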

Container not found

  • Verify the container name is spelled correctly; container names are always lowercase, so check for stray capital letters
  • Check that the container exists in the specified storage account
  • Ensure the credentials have access to the container
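Azure container names follow fixed rules: 3 to 63 characters, only lowercase letters, digits, and hyphens, starting with a letter or digit, and no consecutive or trailing hyphens. A name that breaks these rules can never match an existing container, so a local check rules out typos quickly. Here is a sketch with a hypothetical function name. Reserved system containers such as $web follow different rules and are not covered.

```python
import re

# 3-63 chars: first a letter or digit, then letters, digits, or hyphens, where
# every hyphen must be followed by a letter or digit (no "--", no trailing "-").
CONTAINER_NAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def is_valid_container_name(name: str) -> bool:
    """Return True if name satisfies Azure's container naming rules."""
    return CONTAINER_NAME_RE.fullmatch(name) is not None
```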

Files not indexed

  • Check that the file types are supported (see table above)
  • Verify the prefix filter is not too restrictive
  • Files larger than 500 MB are skipped
  • Ensure the blobs are not in the Archive access tier; archived blobs are offline and must be rehydrated to an online tier (Hot, Cool, or Cold) before they can be read
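When a container mixes tiers, you can list the blobs and check each one's access tier to find which were skipped. In the Python SDK (azure-storage-blob), ContainerClient.list_blobs(name_starts_with=...) yields BlobProperties objects with name and blob_tier attributes. The sketch below takes plain (name, tier) pairs instead, so it runs without Azure access. The helper name is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def archived_blobs(blobs: Iterable[Tuple[str, Optional[str]]], prefix: str = "") -> List[str]:
    """Return names under prefix whose access tier is Archive.

    Archived blobs are offline and must be rehydrated to Hot, Cool, or Cold
    before their contents can be read. A tier of None (for example in premium
    accounts, which have no access tiers) is treated as readable.
    """
    return [
        name
        for name, tier in blobs
        if name.startswith(prefix) and tier == "Archive"
    ]
```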

Slow sync performance

  • Large containers with many small files may take time; use prefix to scope the sync
  • Network latency between ZenSearch and Azure may affect throughput
  • Consider using a storage account in the same region as your ZenSearch deployment