GitHub Connector

Connect to GitHub to index code and documentation files from repositories in your organization or personal account.

Overview

The GitHub connector allows you to:

  • Index repository file contents (code, Markdown, README, configuration, etc.)
  • Pick a single branch per repository to index
  • Filter by file extension and exclude paths
  • Use GitHub Enterprise via a custom API base URL

Prerequisites

  • GitHub account or organization membership
  • Repository access (read permissions)
  • Personal Access Token or OAuth app authorization

Authentication

OAuth

  1. Click Connect with GitHub
  2. Authorize ZenSearch to access your repositories
  3. Select organization access if needed

Personal Access Token (PAT)

  1. Go to GitHub → Settings → Developer settings → Personal access tokens
  2. Generate a new token with repo scope (or public_repo for public only)
  3. Copy the token and enter it in ZenSearch
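
Before entering a token, it can help to confirm that it carries the expected scope. For classic personal access tokens, GitHub reports the granted scopes in the X-OAuth-Scopes response header of any authenticated API call. The sketch below only parses that header value; the function name and the private_repos flag are illustrative and not part of ZenSearch, and fine-grained tokens use a different permission model that this check does not cover.

```python
def has_required_scope(scopes_header: str, private_repos: bool = True) -> bool:
    """Check an X-OAuth-Scopes header value for the scope this connector needs.

    `repo` covers both private and public repositories; `public_repo`
    is only enough when every indexed repository is public.
    """
    granted = {s.strip() for s in scopes_header.split(",") if s.strip()}
    if "repo" in granted:
        return True
    return (not private_repos) and "public_repo" in granted


if __name__ == "__main__":
    # Header value as it might appear on a response from https://api.github.com/user
    print(has_required_scope("repo, read:org"))
    print(has_required_scope("public_repo", private_repos=False))
```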

OAuth2 Access Token

If you authorize through an OAuth2 app, you can provide the access_token obtained from the GitHub OAuth2 flow directly.

OAuth2 Refresh Token

For long-lived access, provide a refresh_token alongside the access token. ZenSearch will automatically refresh the access token when it expires.

Configuration

Setting | Description
Organization/User | GitHub org or username
Repositories | Specific repos or all
Branch | Branch to index (default: main)
Path Filter | Limit to specific paths
include_hidden | Include hidden files starting with .
file_extensions | Filter by file extensions (e.g., [".md", ".txt", ".go"])
exclude_paths | Exclude specific paths (e.g., ["vendor/", "node_modules/"])
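
To make the interaction of include_hidden, file_extensions, and exclude_paths concrete, here is a minimal sketch of how such filters could be applied to a repository-relative file path. It is not the connector's actual implementation. It assumes that exclude_paths entries are prefixes relative to the repository root, that a file counts as hidden when any path segment starts with a dot, and that an empty file_extensions list means "no restriction". The function name should_index is hypothetical.

```python
from typing import Sequence


def should_index(
    path: str,
    file_extensions: Sequence[str] = (),
    exclude_paths: Sequence[str] = (),
    include_hidden: bool = False,
) -> bool:
    """Decide whether a repository-relative path passes the filters (assumed semantics)."""
    # Assumption: hidden means any segment (directory or file) starting with "."
    if not include_hidden and any(part.startswith(".") for part in path.split("/")):
        return False
    # Assumption: exclude_paths are prefixes relative to the repository root
    if any(path.startswith(prefix) for prefix in exclude_paths):
        return False
    # Assumption: an empty extension list places no restriction on file type
    if file_extensions and not path.endswith(tuple(file_extensions)):
        return False
    return True


if __name__ == "__main__":
    filters = {"file_extensions": [".md", ".go"], "exclude_paths": ["vendor/"]}
    for candidate in ["docs/guide.md", "vendor/lib/a.go", ".github/ci.yml"]:
        print(candidate, should_index(candidate, **filters))
```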

Setup Steps

  1. Add Connector: Knowledge → Add Data Source → GitHub
  2. Authenticate: OAuth or enter PAT
  3. Select Repositories: Choose repos to index
  4. Configure Filters: Set branch and path filters
  5. Test & Create: Verify connection and save

Supported Content

Content Type | Indexed
Code files | Yes
Markdown docs | Yes
README files | Yes
Issues | Not supported
Pull requests | Not supported
Wiki pages | Not supported

The connector walks the repository tree at the selected branch and indexes file blobs. Issues, pull requests, discussions, releases, and wiki pages are not currently collected.

Path Filtering

Include specific paths:

/docs/*
/src/**/*.md
/README.md

Exclude paths:

/node_modules/*
/dist/*
/.git/*

Sync

The connector polls GitHub on a schedule and re-walks the configured branch for each connected repository.

For near-real-time updates, configure a GitHub repository webhook pointing at https://your-zensearch-host/webhooks/connectors/github with HMAC signature validation. Push events on the indexed branch trigger an incremental reindex of the connector.
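
When a webhook secret is configured, GitHub signs each delivery by computing an HMAC-SHA256 of the raw request body and sending it in the X-Hub-Signature-256 header as sha256=<hex digest>. The following sketch shows what validating that header involves on the receiving side. It illustrates the general scheme rather than ZenSearch's own code, and the function name is hypothetical.

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Validate the value of an X-Hub-Signature-256 header against the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct
    return hmac.compare_digest(expected.encode(), signature_header.encode())


if __name__ == "__main__":
    secret = b"example-webhook-secret"  # placeholder value for illustration
    body = b'{"ref": "refs/heads/main"}'
    header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    print(verify_github_signature(secret, body, header))
    print(verify_github_signature(secret, body + b" ", header))
```

The signature must be computed over the exact bytes received, before any JSON parsing or re-serialization, since even whitespace changes alter the digest.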

Best Practices

  1. Filter out node_modules, vendor, and build directories
  2. Focus on documentation and source code
  3. Pick a stable branch for indexing (e.g., main / release)

Troubleshooting

Access denied: Verify that the PAT has the repo scope or that the OAuth app is authorized

Missing repos: Check organization membership and repo visibility

Truncated tree: For very large repos, GitHub may return a truncated tree response. Split the repo across multiple connectors with narrower path filters.
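
To check in advance whether a repository is affected, you can inspect the response of GitHub's Git Trees API (GET /repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=1), which sets "truncated": true when the recursive listing exceeds GitHub's limits. The sketch below works on an already parsed JSON response; whether the connector uses this exact endpoint is an assumption, and blob_paths is a hypothetical helper.

```python
def blob_paths(tree_response: dict) -> list[str]:
    """Return file paths from a parsed Git Trees API response.

    Raises RuntimeError if GitHub marked the listing as truncated,
    because the returned entries would then be incomplete.
    """
    if tree_response.get("truncated"):
        raise RuntimeError("tree listing truncated; narrow the path filters")
    return [
        entry["path"]
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob"  # skip directories ("tree") and submodules
    ]


if __name__ == "__main__":
    sample = {
        "truncated": False,
        "tree": [
            {"path": "docs", "type": "tree"},
            {"path": "docs/intro.md", "type": "blob"},
            {"path": "README.md", "type": "blob"},
        ],
    }
    print(blob_paths(sample))
```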