Amazon S3 Connector

Connect to Amazon S3 or any S3-compatible storage (MinIO, DigitalOcean Spaces, Backblaze B2) to index documents stored in your buckets.

Overview

The S3 connector allows you to:

  • Index documents from S3 buckets
  • Filter by prefix (folder path)
  • Support S3-compatible storage providers
  • Sync on schedule or via S3 event notifications

Prerequisites

Before connecting, ensure you have:

  • An S3 bucket with documents
  • AWS credentials with read access
  • (Optional) S3 event notifications configured for real-time sync

Authentication Methods

IAM User Access Keys

Create an IAM user with S3 read permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
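
If you manage several buckets, the policy above can be generated rather than hand-edited. The following is a minimal sketch; the function name `build_read_policy` is hypothetical and not part of ZenSearch or any AWS SDK.

```python
import json


def build_read_policy(bucket: str) -> dict:
    """Return a read-only IAM policy document for a single bucket.

    s3:ListBucket is evaluated against the bucket ARN, while s3:GetObject is
    evaluated against the object ARNs, so both resources are listed.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


def policy_json(bucket: str) -> str:
    """Serialize the policy so it can be pasted into the IAM console."""
    return json.dumps(build_read_policy(bucket), indent=2)
```

Calling `policy_json("my-documents-bucket")` produces a document with the same shape as the policy shown above.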

IAM Roles

If ZenSearch runs on AWS, use an IAM role instead of access keys:

  1. Create an IAM role with S3 read permissions
  2. Attach the role to your EC2 instance or ECS task
  3. ZenSearch will use the instance credentials automatically
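
A role created in step 1 also needs a trust policy that lets the compute service assume it. The sketch below builds that document for the two cases named in step 2; the function name and the `"ec2"` / `"ecs-task"` labels are hypothetical, while the service principals are the standard AWS ones.

```python
# Service principals that are allowed to assume the role.
SERVICE_PRINCIPALS = {
    "ec2": "ec2.amazonaws.com",             # role attached via an instance profile
    "ecs-task": "ecs-tasks.amazonaws.com",  # role used as an ECS task role
}


def build_trust_policy(target: str) -> dict:
    """Return the trust (assume-role) policy for an EC2 instance or ECS task."""
    if target not in SERVICE_PRINCIPALS:
        raise ValueError(f"unknown target: {target!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": SERVICE_PRINCIPALS[target]},
                "Action": "sts:AssumeRole",
            }
        ],
    }
```

The S3 read permissions are attached to the same role as a separate permissions policy.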

Configuration

Required Settings

| Setting | Description | Example |
| --- | --- | --- |
| Bucket Name | S3 bucket to connect | my-documents-bucket |
| Region | AWS region | us-east-1 |
| Access Key ID | IAM user access key | AKIAIOSFODNN7EXAMPLE |
| Secret Access Key | IAM user secret key | wJalrXUtnFEMI/K7MDENG/... |

Optional Settings

| Setting | Description | Default |
| --- | --- | --- |
| Prefix | Path prefix to filter | / (root) |
| Endpoint | Custom S3 endpoint | AWS default |
| Path Style | Use path-style URLs | false |
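
To make the required and optional settings concrete, here is a minimal sketch that models them as a Python dataclass with the documented defaults. The class, its field names, and the `validate` method are hypothetical; ZenSearch's internal representation may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class S3ConnectorConfig:
    # Required settings
    bucket_name: str
    region: str
    access_key_id: str
    secret_access_key: str
    # Optional settings with the documented defaults
    prefix: str = "/"                # "/" means the bucket root
    endpoint: Optional[str] = None   # None means the AWS default endpoint
    path_style: bool = False

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        for name in ("bucket_name", "region", "access_key_id", "secret_access_key"):
            if not getattr(self, name).strip():
                problems.append(f"{name} is required")
        if self.endpoint is not None and not self.endpoint.startswith(("http://", "https://")):
            problems.append("endpoint must start with http:// or https://")
        return problems
```

A configuration that only sets the four required values keeps `prefix="/"`, `endpoint=None`, and `path_style=False`.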

S3-Compatible Storage

For MinIO, DigitalOcean Spaces, or other S3-compatible services:

| Provider | Endpoint Example |
| --- | --- |
| MinIO | http://minio.local:9000 |
| DigitalOcean | https://nyc3.digitaloceanspaces.com |
| Backblaze B2 | https://s3.us-west-000.backblazeb2.com |

Enable Path Style for MinIO and for other providers that expect the bucket name in the URL path rather than in the hostname.
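
The difference between the two URL styles is easiest to see side by side. The sketch below builds an object URL for a custom endpoint in either style; `object_url` is a hypothetical helper for illustration, not a ZenSearch or SDK function.

```python
from urllib.parse import quote, urlsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build the URL of an object for a given S3-compatible endpoint."""
    parts = urlsplit(endpoint)
    encoded_key = quote(key, safe="/")  # keep "/" so the key hierarchy stays readable
    if path_style:
        # Path style: the bucket is the first segment of the URL path.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{encoded_key}"
    # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{encoded_key}"
```

With `path_style=True`, a MinIO endpoint such as `http://minio.local:9000` needs no per-bucket DNS entry, which is why local deployments usually require it.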

Setup Steps

1. Navigate to Data Sources

Go to Knowledge > Add Data Source > Amazon S3

2. Enter Credentials

Bucket Name: my-documents-bucket
Region: us-east-1
Access Key ID: AKIAIOSFODNN7EXAMPLE
Secret Access Key: ••••••••••••••••

3. Configure Prefix (Optional)

To sync only specific folders:

Prefix: /documents/public/

This syncs only files under s3://my-documents-bucket/documents/public/.

4. Select Collection

Choose an existing collection or create a new one.

5. Test Connection

Click Test Connection to verify:

  • Credentials are valid
  • Bucket exists and is accessible
  • Prefix path exists (if specified)

6. Create Connector

Click Create to save and start the initial sync.

Supported File Types

The S3 connector processes:

| Type | Extensions |
| --- | --- |
| Documents | .pdf, .docx, .doc, .txt, .rtf |
| Spreadsheets | .xlsx, .xls, .csv |
| Presentations | .pptx, .ppt |
| Markdown | .md, .markdown |
| Code | .py, .js, .ts, .go, .java, etc. |
| Images | .png, .jpg, .jpeg (with OCR) |
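
To check ahead of time which objects fall into a supported type, the table can be expressed as a lookup. This is a sketch under two assumptions: matching is case-insensitive, and the Code category contains only the extensions listed explicitly (the table's "etc." is left unfilled). The names `SUPPORTED_TYPES` and `classify` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

SUPPORTED_TYPES = {
    "Documents": {".pdf", ".docx", ".doc", ".txt", ".rtf"},
    "Spreadsheets": {".xlsx", ".xls", ".csv"},
    "Presentations": {".pptx", ".ppt"},
    "Markdown": {".md", ".markdown"},
    "Code": {".py", ".js", ".ts", ".go", ".java"},  # only the extensions named in the table
    "Images": {".png", ".jpg", ".jpeg"},
}


def classify(key: str) -> Optional[str]:
    """Return the file type for an object key, or None if it is not listed."""
    extension = PurePosixPath(key).suffix.lower()
    for file_type, extensions in SUPPORTED_TYPES.items():
        if extension in extensions:
            return file_type
    return None
```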

Webhook Setup (Real-time Sync)

For real-time updates, configure S3 event notifications:

1. Create SNS Topic or SQS Queue

Set up a destination for S3 events.

2. Configure Bucket Notifications

In AWS Console or via CLI:

aws s3api put-bucket-notification-configuration \
  --bucket my-bucket \
  --notification-configuration '{
    "TopicConfigurations": [{
      "TopicArn": "arn:aws:sns:us-east-1:123456789012:s3-events",
      "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
    }]
  }'

3. Configure ZenSearch Webhook

Contact support to configure the webhook endpoint.
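
For orientation, the sketch below shows how a receiver might unpack an S3 event that arrives through SNS. The SNS envelope carries the S3 event as a JSON string in its `Message` field, and object keys in the event are URL-encoded. How ZenSearch's own webhook processes these events is not documented here; `parse_s3_events` and the `"index"` / `"delete"` labels are hypothetical.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def parse_s3_events(sns_body: str) -> List[Tuple[str, str, str]]:
    """Turn an SNS notification body into (action, bucket, key) tuples."""
    envelope = json.loads(sns_body)
    s3_event = json.loads(envelope["Message"])  # the S3 event is nested JSON text
    changes = []
    # Test events sent by S3 have no "Records" key, so default to an empty list.
    for record in s3_event.get("Records", []):
        event_name = record["eventName"]  # e.g. "ObjectCreated:Put"
        if event_name.startswith("ObjectCreated:"):
            action = "index"
        elif event_name.startswith("ObjectRemoved:"):
            action = "delete"
        else:
            continue  # ignore event types that were not subscribed to
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # "+" encodes a space
        changes.append((action, bucket, key))
    return changes
```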

Filtering Content

By Prefix

Prefix: /reports/2024/

Syncs: s3://bucket/reports/2024/q1.pdf, s3://bucket/reports/2024/q2.pdf
Skips: s3://bucket/reports/2023/q4.pdf
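
Prefix filtering is a plain string-prefix match on the object key. The sketch below reproduces the example above. It assumes the connector strips the leading slash from the configured prefix, because the key in s3://bucket/reports/2024/q1.pdf is `reports/2024/q1.pdf` with no leading slash. `matches_prefix` is a hypothetical helper.

```python
def matches_prefix(key: str, prefix: str) -> bool:
    """Return True if an object key falls under the configured prefix."""
    # Assumption: "/reports/2024/" is treated as the key prefix "reports/2024/",
    # and "/" (the default) is treated as the empty prefix, which matches everything.
    normalized = prefix.lstrip("/")
    return key.startswith(normalized)
```

Because the match is purely textual, a prefix without a trailing slash such as `/reports/2024` would also match a key like `reports/2024-drafts/a.pdf`.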

By File Type (in ZenSearch)

Configure in connector settings to include/exclude specific extensions.

Permissions

Minimum Required Permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
      ]
    }
  ]
}

For KMS-Encrypted Buckets

Add KMS permissions:

{
  "Effect": "Allow",
  "Action": [
    "kms:Decrypt"
  ],
  "Resource": "arn:aws:kms:region:account:key/key-id"
}
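
The KMS statement goes inside the `Statement` array of the existing policy, next to the S3 statement. A minimal sketch of that merge follows; `with_kms_decrypt` is a hypothetical helper, and the key ARN you pass in is your own.

```python
import copy


def with_kms_decrypt(policy: dict, key_arn: str) -> dict:
    """Return a copy of an IAM policy with a kms:Decrypt statement appended."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    updated["Statement"].append(
        {
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": key_arn,
        }
    )
    return updated
```

Scoping `Resource` to the specific key ARN, rather than `*`, keeps the policy consistent with least-privilege access.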

Best Practices

Security

  1. Use IAM roles instead of access keys when possible
  2. Apply least-privilege permissions
  3. Enable bucket versioning for recovery
  4. Use server-side encryption

Performance

  1. Use specific prefixes to limit scope
  2. Enable incremental sync for large buckets
  3. Organize files in logical folder structures
  4. Remove old/archived content from sync scope

Cost Optimization

  1. Sync only necessary content
  2. Use infrequent access storage for archives
  3. Monitor API request costs
  4. Set appropriate sync schedules
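
To reason about request costs, a rough count of S3 API calls per day is often enough. The sketch below assumes each sync lists the whole prefix (S3 list calls return at most 1,000 keys per request) and downloads only changed objects. Whether ZenSearch syncs exactly this way is an assumption, and `estimate_requests` is a hypothetical helper. Multiply the results by your provider's current request prices.

```python
import math


def estimate_requests(total_objects: int, changed_per_sync: int, syncs_per_day: int,
                      page_size: int = 1000) -> dict:
    """Estimate daily LIST and GET request counts for a scheduled sync."""
    # At least one LIST call is made even when the prefix is empty.
    list_calls_per_sync = max(1, math.ceil(total_objects / page_size))
    return {
        "list_requests_per_day": list_calls_per_sync * syncs_per_day,
        "get_requests_per_day": changed_per_sync * syncs_per_day,
    }
```

For example, narrowing the prefix so that `total_objects` drops from 250,000 to 25,000 cuts the LIST calls per sync from 250 to 25.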

Troubleshooting

Access Denied

  1. Verify IAM permissions include s3:GetObject and s3:ListBucket
  2. Check bucket policy allows access
  3. Verify region is correct
  4. Check for bucket encryption requiring KMS permissions

Bucket Not Found

  1. Verify bucket name is correct (case-sensitive)
  2. Check bucket exists in specified region
  3. Ensure no typos in configuration

Slow Sync

  1. Large buckets take longer for initial sync
  2. Consider using prefix filters
  3. Check network connectivity to AWS
  4. Review rate limiting settings

Missing Files

  1. Check prefix filter settings
  2. Verify file types are supported
  3. Ensure files aren't in Glacier storage class
  4. Check file permissions within bucket
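
The first three checks for missing files can be applied to a bucket listing in one pass. The sketch below takes entries shaped like S3 list results (`Key` and `StorageClass` fields) and reports why an object would be skipped. The function name, the reason strings, and the leading-slash handling of the prefix are assumptions for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional, Set

# Storage classes whose objects must be restored before they can be read.
ARCHIVED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def skip_reason(obj: dict, prefix: str, allowed_extensions: Set[str]) -> Optional[str]:
    """Return why an object would be skipped, or None if it should sync."""
    key = obj["Key"]
    if not key.startswith(prefix.lstrip("/")):
        return "outside prefix"
    if PurePosixPath(key).suffix.lower() not in allowed_extensions:
        return "unsupported file type"
    if obj.get("StorageClass", "STANDARD") in ARCHIVED_CLASSES:
        return "archived storage class"
    return None
```

Object-level permission problems (check 4) do not appear in a listing and only surface when the object is read.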

Example Configurations

Basic Setup

Name: Company Documents S3
Bucket: company-docs
Region: us-east-1
Access Key: AKIA...
Secret Key: ****
Prefix: /
Collection: Company Knowledge Base

Filtered by Department

Name: Engineering Docs
Bucket: company-docs
Region: us-east-1
Prefix: /engineering/
Collection: Engineering

MinIO Local Storage

Name: Local MinIO Storage
Bucket: documents
Region: us-east-1
Endpoint: http://minio.local:9000
Path Style: true
Access Key: minioadmin
Secret Key: ****
Collection: Local Documents