GitHub Connector
Connect to GitHub to index code and documentation files from repositories in your organization or personal account.
Overview
The GitHub connector allows you to:
- Index repository file contents (code, Markdown, README, configuration, etc.)
- Pick a single branch per repository to index
- Filter by file extension and exclude paths
- Use GitHub Enterprise via a custom API base URL
Prerequisites
- GitHub account or organization membership
- Repository access (read permissions)
- Personal Access Token or OAuth app authorization
Authentication
OAuth 2.0 (Recommended)
- Click Connect with GitHub
- Authorize ZenSearch to access your repositories
- Select organization access if needed
Personal Access Token (PAT)
- Go to GitHub → Settings → Developer settings → Personal access tokens
- Generate a new token with repo scope (or public_repo for public only)
- Copy the token and enter it in ZenSearch
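Before entering a token in ZenSearch, it can help to confirm that it carries the expected scope. The sketch below is not part of ZenSearch; the helper names `build_user_request` and `has_repo_access` are hypothetical. It assumes a classic PAT or OAuth token, for which GitHub reports the granted scopes in the `X-OAuth-Scopes` response header (fine-grained tokens do not report scopes this way).

```python
from typing import Dict, Tuple


def build_user_request(
    token: str, api_base: str = "https://api.github.com"
) -> Tuple[str, Dict[str, str]]:
    """Return the URL and headers for GET /user, a cheap authenticated call."""
    url = api_base.rstrip("/") + "/user"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    return url, headers


def has_repo_access(scopes_header: str, public_only: bool = False) -> bool:
    """Check a comma-separated X-OAuth-Scopes value for repo or public_repo."""
    scopes = {s.strip() for s in scopes_header.split(",") if s.strip()}
    if "repo" in scopes:
        return True
    # public_repo is only sufficient when every target repository is public.
    return public_only and "public_repo" in scopes
```

Send the request with any HTTP client and pass the response's `X-OAuth-Scopes` value to `has_repo_access`. For GitHub Enterprise, `api_base` would be the custom API base URL configured for the connector.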
OAuth2 Access Token
If using OAuth2 app authorization, you can provide an access_token directly. This is the token obtained from the GitHub OAuth2 flow.
OAuth2 Refresh Token
For long-lived access, provide a refresh_token alongside the access token. ZenSearch will automatically refresh the access token when it expires.
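How ZenSearch performs the refresh internally is not described here. As a rough illustration of a typical refresh flow, the sketch below assumes the tokens come from a GitHub app that issues expiring user tokens and that the client knows the access token's expiry time. The function names and the 60-second safety margin are assumptions for illustration only.

```python
import time
from typing import Dict, Optional

# GitHub's token endpoint on github.com; an Enterprise host would differ.
TOKEN_URL = "https://github.com/login/oauth/access_token"


def needs_refresh(
    expires_at: float, now: Optional[float] = None, skew: float = 60.0
) -> bool:
    """True when the token is expired or within `skew` seconds of expiry."""
    current = time.time() if now is None else now
    return current >= expires_at - skew


def build_refresh_form(
    client_id: str, client_secret: str, refresh_token: str
) -> Dict[str, str]:
    """Form fields to POST to TOKEN_URL for a refresh_token grant."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    }
```

A successful response from the token endpoint carries a new access token and, typically, a new refresh token that replaces the old one.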
Configuration
| Setting | Description |
|---|---|
| Organization/User | GitHub org or username |
| Repositories | Specific repos or all |
| Branch | Branch to index (default: main) |
| Path Filter | Limit to specific paths |
| include_hidden | Include hidden files starting with . |
| file_extensions | Filter by file extensions (e.g., [".md", ".txt", ".go"]) |
| exclude_paths | Exclude specific paths (e.g., ["vendor/", "node_modules/"]) |
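To make the interaction between include_hidden, file_extensions, and exclude_paths concrete, here is a minimal sketch of one plausible filtering order. It is not the connector's actual implementation. It assumes that exclude_paths entries are matched as prefixes of the repository-relative path, and that a path counts as hidden when any segment starts with a dot. The function name `should_index` is hypothetical.

```python
import posixpath
from typing import Sequence


def should_index(
    path: str,
    file_extensions: Sequence[str] = (),
    exclude_paths: Sequence[str] = (),
    include_hidden: bool = False,
) -> bool:
    """Decide whether a repository-relative file path passes the filters."""
    parts = path.strip("/").split("/")
    rel = "/".join(parts)

    # Assumption: a hidden directory hides everything beneath it.
    if not include_hidden and any(p.startswith(".") for p in parts):
        return False

    # Assumption: exclude entries are prefixes such as "vendor/".
    if any(rel.startswith(prefix.lstrip("/")) for prefix in exclude_paths):
        return False

    # An empty extension list means "no extension filter".
    if file_extensions:
        return posixpath.splitext(rel)[1] in file_extensions
    return True
```

With file_extensions set to [".md", ".txt", ".go"] and exclude_paths set to ["vendor/", "node_modules/"], this sketch keeps docs/intro.md and drops both vendor/lib/a.go and src/app.py.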
Setup Steps
- Add Connector: Knowledge → Add Data Source → GitHub
- Authenticate: OAuth or enter PAT
- Select Repositories: Choose repos to index
- Configure Filters: Set branch and path filters
- Test & Create: Verify connection and save
Supported Content
| Content Type | Indexed |
|---|---|
| Code files | Yes |
| Markdown docs | Yes |
| README files | Yes |
| Issues | Not supported |
| Pull requests | Not supported |
| Wiki pages | Not supported |
The connector walks the repository tree at the selected branch and indexes file blobs. Issues, pull requests, discussions, releases, and wiki pages are not currently collected.
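The tree walk described above can be approximated with GitHub's Git Trees API, which returns a flat list of entries when called with `recursive=1`. This is a sketch of that idea rather than the connector's code. The helper names are hypothetical, and branch names containing special characters may need URL encoding that is not handled here.

```python
from typing import Any, Dict, List, Tuple


def tree_url(api_base: str, owner: str, repo: str, branch: str) -> str:
    """URL for a recursive tree listing at the given branch."""
    base = api_base.rstrip("/")
    return f"{base}/repos/{owner}/{repo}/git/trees/{branch}?recursive=1"


def blobs_from_tree(response: Dict[str, Any]) -> Tuple[List[str], bool]:
    """Extract file paths from a tree response and report truncation.

    Entries of type "blob" are files; "tree" entries are directories.
    """
    paths = [
        entry["path"]
        for entry in response.get("tree", [])
        if entry.get("type") == "blob"
    ]
    return paths, bool(response.get("truncated", False))
```

When the second return value is true, GitHub has cut the listing short and some files are missing from it, so the listing should not be treated as complete.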
Path Filtering
Include specific paths:
/docs/*
/src/**/*.md
/README.md
Exclude paths:
/node_modules/*
/dist/*
/.git/*
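The exact glob semantics ZenSearch applies to these patterns are not specified here. The sketch below shows one common interpretation, stated as an assumption: `*` matches within a single path segment, `**/` matches zero or more directories, and a path is selected when it matches at least one include pattern and no exclude pattern. The names `glob_to_regex` and `is_selected` are hypothetical.

```python
import re
from typing import Sequence


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple glob into an anchored regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def is_selected(path: str, include: Sequence[str], exclude: Sequence[str]) -> bool:
    """Apply include patterns first, then let exclude patterns veto."""
    included = (
        any(glob_to_regex(p).match(path) for p in include) if include else True
    )
    excluded = any(glob_to_regex(p).match(path) for p in exclude)
    return included and not excluded
```

Under this interpretation, /docs/* selects /docs/intro.md but not /docs/api/ref.md, while /src/**/*.md selects Markdown files at any depth under /src.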
Sync
The connector polls GitHub on a schedule and re-walks the configured branch for each connected repository.
For near-real-time updates, configure a GitHub repository webhook pointing at https://your-zensearch-host/webhooks/connectors/github with HMAC signature validation. Push events on the indexed branch trigger an incremental reindex of the connector.
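For reference, this is roughly what HMAC validation of a GitHub webhook delivery looks like on the receiving side. GitHub signs the raw request body with the webhook secret and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch is illustrative only and says nothing about how ZenSearch implements its endpoint; the function names are hypothetical.

```python
import hashlib
import hmac
import json


def verify_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Compare the X-Hub-Signature-256 header against our own HMAC-SHA256."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest avoids leaking timing information about the match.
    return hmac.compare_digest(
        expected.encode("utf-8"), (signature_header or "").encode("utf-8")
    )


def is_push_to_branch(body: bytes, branch: str) -> bool:
    """True when a push payload targets the given branch."""
    payload = json.loads(body)
    return payload.get("ref") == f"refs/heads/{branch}"
```

A receiver would verify the signature against the raw bytes first, and only then parse the payload and check that the push targets the indexed branch.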
Best Practices
- Filter out node_modules, vendor, and build directories
- Focus on documentation and source code
- Pick a stable branch for indexing (e.g., main or release)
Troubleshooting
Access denied: Verify PAT has repo scope or OAuth is authorized
Missing repos: Check organization membership and repo visibility
Truncated tree: For very large repos, GitHub may return a truncated tree response, which leaves some files out of the listing. Split the repo across multiple connectors with narrower path filters.