Amazon S3 Connector
Connect to Amazon S3 or any S3-compatible storage (MinIO, DigitalOcean Spaces, Backblaze B2) to index documents stored in your buckets.
Overview
The S3 connector allows you to:
- Index documents from S3 buckets
- Filter by prefix (folder path)
- Support S3-compatible storage providers
- Sync on schedule or via S3 event notifications
Prerequisites
Before connecting, ensure you have:
- An S3 bucket with documents
- AWS credentials with read access
- (Optional) S3 event notifications configured for real-time sync
Authentication Methods
IAM User Access Keys
Create an IAM user with S3 read permissions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```
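If you manage IAM from the command line, the following is a minimal sketch of creating such a user and issuing keys for ZenSearch; the user name, policy name, and policy file path are placeholders.

```bash
# Create a dedicated read-only user for the connector (name is a placeholder)
aws iam create-user --user-name zensearch-s3-reader

# Attach the policy above, saved locally as s3-read-policy.json
aws iam put-user-policy \
  --user-name zensearch-s3-reader \
  --policy-name zensearch-s3-read \
  --policy-document file://s3-read-policy.json

# Generate the Access Key ID / Secret Access Key pair to enter in ZenSearch
aws iam create-access-key --user-name zensearch-s3-reader
```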
IAM Role (Recommended for EC2/ECS)
If running ZenSearch on AWS, use IAM roles:
- Create an IAM role with S3 read permissions
- Attach the role to your EC2 instance or ECS task
- ZenSearch will use the instance credentials automatically (you can verify this with the commands below)
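As a quick sanity check, you can confirm from the instance or task itself that the role is attached and can reach the bucket; the bucket name below is an example.

```bash
# Shows which IAM role the instance/task credentials resolve to
aws sts get-caller-identity

# Confirms that role can list the target bucket
aws s3 ls s3://my-documents-bucket --region us-east-1
```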
Configuration
Required Settings
| Setting | Description | Example |
|---|---|---|
| Bucket Name | S3 bucket to index | my-documents-bucket |
| Region | AWS region | us-east-1 |
| Access Key ID | IAM user access key | AKIAIOSFODNN7EXAMPLE |
| Secret Access Key | IAM user secret key | wJalrXUtnFEMI/K7MDENG/... |
Optional Settings
| Setting | Description | Default |
|---|---|---|
| Prefix | Key prefix limiting which objects are synced | / (root) |
| Endpoint | Custom endpoint URL for S3-compatible storage | AWS default |
| Path Style | Use path-style URLs instead of virtual-hosted-style | false |
S3-Compatible Storage
For MinIO, DigitalOcean Spaces, or other S3-compatible services:
| Provider | Endpoint Example |
|---|---|
| MinIO | http://minio.local:9000 |
| DigitalOcean | https://nyc3.digitaloceanspaces.com |
| Backblaze B2 | https://s3.us-west-000.backblazeb2.com |
Enable Path Style for MinIO and any other provider that does not support virtual-hosted-style URLs.
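To check an S3-compatible endpoint before configuring the connector, you can point the AWS CLI at it. This sketch assumes your CLI profile already holds the provider's access key and uses the MinIO example values above.

```bash
# Force path-style addressing for the default CLI profile
aws configure set default.s3.addressing_style path

# List the bucket through the custom endpoint (MinIO example values)
aws s3 ls s3://documents --endpoint-url http://minio.local:9000
```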
Setup Steps
1. Navigate to Data Sources
Go to Knowledge → Add Data Source → Amazon S3
2. Enter Credentials
```
Bucket Name: my-documents-bucket
Region: us-east-1
Access Key ID: AKIAIOSFODNN7EXAMPLE
Secret Access Key: ••••••••••••••••
```
3. Configure Prefix (Optional)
To sync only specific folders:
```
Prefix: /documents/public/
```
This syncs only files under s3://my-documents-bucket/documents/public/.
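To preview which objects a prefix would pick up, you can list the same path with the AWS CLI (bucket and prefix from the example above):

```bash
# List everything the connector would see under the prefix
aws s3 ls s3://my-documents-bucket/documents/public/ --recursive
```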
4. Select Collection
Choose an existing collection or create a new one.
5. Test Connection
Click Test Connection to verify the following (equivalent CLI checks are shown below):
- Credentials are valid
- Bucket exists and is accessible
- Prefix path exists (if specified)
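If the test fails, the same checks can be reproduced from any machine configured with the connector's credentials (example values from step 2):

```bash
# Credentials and bucket access: exits with an error if either is wrong
aws s3api head-bucket --bucket my-documents-bucket --region us-east-1

# Prefix: returns at least one key if the path contains objects
aws s3api list-objects-v2 \
  --bucket my-documents-bucket \
  --prefix documents/public/ \
  --max-keys 1
```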
6. Create Connector
Click Create to save and start the initial sync.
Supported File Types
The S3 connector processes:
| Type | Extensions |
|---|---|
| Documents | .pdf, .docx, .doc, .txt, .rtf |
| Spreadsheets | .xlsx, .xls, .csv |
| Presentations | .pptx, .ppt |
| Markdown | .md, .markdown |
| Code | .py, .js, .ts, .go, .java, etc. |
| Images | .png, .jpg, .jpeg (with OCR) |
Webhook Setup (Real-time Sync)
For real-time updates, configure S3 event notifications:
1. Create SNS Topic or SQS Queue
Set up a destination for S3 events.
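For the SNS route, a minimal sketch looks like the following; the topic name, account ID, and bucket name are placeholders, and the topic policy must allow S3 to publish before step 2 will succeed.

```bash
# Create the topic that will receive S3 events (name is a placeholder)
aws sns create-topic --name s3-events

# Allow the bucket to publish to the topic
aws sns set-topic-attributes \
  --topic-arn arn:aws:sns:us-east-1:123456789:s3-events \
  --attribute-name Policy \
  --attribute-value '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "s3.amazonaws.com"},
      "Action": "SNS:Publish",
      "Resource": "arn:aws:sns:us-east-1:123456789:s3-events",
      "Condition": {"ArnLike": {"aws:SourceArn": "arn:aws:s3:::my-bucket"}}
    }]
  }'
```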
2. Configure Bucket Notifications
In AWS Console or via CLI:
```bash
aws s3api put-bucket-notification-configuration \
  --bucket my-bucket \
  --notification-configuration '{
    "TopicConfigurations": [{
      "TopicArn": "arn:aws:sns:us-east-1:123456789:s3-events",
      "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
    }]
  }'
```
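You can confirm the configuration was applied with:

```bash
aws s3api get-bucket-notification-configuration --bucket my-bucket
```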
3. Configure ZenSearch Webhook
Contact support to configure the webhook endpoint.
Filtering Content
By Prefix
```
Prefix: /reports/2024/
Syncs: s3://bucket/reports/2024/q1.pdf, s3://bucket/reports/2024/q2.pdf
Skips: s3://bucket/reports/2023/q4.pdf
```
By File Type (in ZenSearch)
Configure in connector settings to include/exclude specific extensions.
Permissions
Minimum Required Permissions
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
      ]
    }
  ]
}
```
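To check that a given principal actually satisfies this policy, IAM's policy simulator can dry-run the two required actions; the user ARN and bucket name below are placeholders.

```bash
# Dry-run the required actions against the connector's IAM principal
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789:user/zensearch-s3-reader \
  --action-names s3:GetObject s3:ListBucket \
  --resource-arns "arn:aws:s3:::bucket-name" "arn:aws:s3:::bucket-name/*"
```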
For KMS-Encrypted Buckets
Add KMS permissions:
```json
{
  "Effect": "Allow",
  "Action": [
    "kms:Decrypt"
  ],
  "Resource": "arn:aws:kms:region:account:key/key-id"
}
```
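To find out whether the bucket uses SSE-KMS, and which key ARN to reference in the statement above, you can query the bucket's default encryption settings:

```bash
# Shows the default encryption algorithm and, for SSE-KMS, the key ARN
aws s3api get-bucket-encryption --bucket bucket-name
```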
Best Practices
Security
- Use IAM roles instead of access keys when possible
- Apply least-privilege permissions
- Enable bucket versioning for recovery
- Use server-side encryption
Performance
- Use specific prefixes to limit scope
- Enable incremental sync for large buckets
- Organize files in logical folder structures
- Remove old/archived content from sync scope
Cost Optimization
- Sync only necessary content
- Use infrequent access storage for archives
- Monitor API request costs
- Set appropriate sync schedules
Troubleshooting
Access Denied
- Verify IAM permissions include s3:GetObject and s3:ListBucket (see the check below)
- Check that the bucket policy allows access
- Verify region is correct
- Check for bucket encryption requiring KMS permissions
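A quick way to narrow down the cause is to run the same read with the connector's credentials; the bucket name and object key below are placeholders.

```bash
# Confirm which identity the configured keys resolve to
aws sts get-caller-identity

# Fetch one known object; the error message usually distinguishes a
# missing s3:GetObject permission from a KMS decrypt problem
aws s3api get-object --bucket bucket-name --key path/to/file.pdf /tmp/test.pdf
```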
Bucket Not Found
- Verify the bucket name is spelled correctly (S3 bucket names are always lowercase)
- Check bucket exists in specified region
- Ensure no typos in configuration
Slow Sync
- Large buckets take longer for initial sync
- Consider using prefix filters
- Check network connectivity to AWS
- Review rate limiting settings
Missing Files
- Check prefix filter settings
- Verify file types are supported
- Ensure files aren't in a Glacier storage class (see the query below)
- Check file permissions within bucket
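To spot objects that were skipped because of their storage class, you can list everything under the prefix that is not in STANDARD storage; the bucket name and prefix below are placeholders.

```bash
# List objects under the prefix that are not in the STANDARD storage class
aws s3api list-objects-v2 \
  --bucket bucket-name \
  --prefix reports/2024/ \
  --query "Contents[?StorageClass!='STANDARD'].[Key,StorageClass]" \
  --output text
```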
Example Configurations
Basic Setup
```
Name: Company Documents S3
Bucket: company-docs
Region: us-east-1
Access Key: AKIA...
Secret Key: ****
Prefix: /
Collection: Company Knowledge Base
```
Filtered by Department
```
Name: Engineering Docs
Bucket: company-docs
Region: us-east-1
Prefix: /engineering/
Collection: Engineering
```
MinIO Local Storage
```
Name: Local MinIO Storage
Bucket: documents
Region: us-east-1
Endpoint: http://minio.local:9000
Path Style: true
Access Key: minioadmin
Secret Key: ****
Collection: Local Documents
```