Ingest Gmail, Google Drive, and Google Calendar data into CortexDB as a unified connector.
Google Workspace Connector
The Google Workspace connector is a compound connector that ingests data from three Google services into CortexDB:
- Gmail -- Email threads as message episodes
- Google Drive -- Documents, spreadsheets, and files as document episodes
- Google Calendar -- Meetings and events as meeting episodes
Each sub-connector can be independently enabled or disabled. A single GoogleWorkspaceConnector class manages all three.
Setup
1. Create a Google Cloud Service Account
- Go to Google Cloud Console > IAM & Admin > Service Accounts
- Click Create Service Account
- Name it (e.g., cortexdb-workspace-connector)
- Click Create and Continue
- Click Done (no roles needed at project level)
- Click into the service account, go to Keys > Add Key > Create new key > JSON
- Save the JSON key file securely
2. Enable Domain-Wide Delegation
- In the service account details, click Show advanced settings
- Copy the Client ID (numeric)
- Go to Google Workspace Admin > Security > API Controls > Domain-wide Delegation
- Click Add new
- Paste the Client ID
- Add the following OAuth scopes:
https://www.googleapis.com/auth/gmail.readonly
https://www.googleapis.com/auth/drive.readonly
https://www.googleapis.com/auth/calendar.readonly
- Click Authorize
3. Enable Google APIs
In the Google Cloud Console, enable the following APIs for your project:
- Gmail API
- Google Drive API
- Google Calendar API
4. Configure the Connector
# Required
CORTEX_GOOGLE_SERVICE_ACCOUNT_KEY=/path/to/service-account.json
CORTEX_GOOGLE_DELEGATED_USER=user@example.com
# Sub-connector toggles (all enabled by default)
CORTEX_GOOGLE_GMAIL_ENABLED=true
CORTEX_GOOGLE_DRIVE_ENABLED=true
CORTEX_GOOGLE_CALENDAR_ENABLED=true
# CortexDB target
CORTEX_GOOGLE_TENANT_ID=my-app
CORTEX_GOOGLE_NAMESPACE=google_workspace
5. Start the Connector
# As part of CortexDB
docker run -d \
  -v /path/to/service-account.json:/secrets/gcp.json \
  -e CORTEX_GOOGLE_SERVICE_ACCOUNT_KEY=/secrets/gcp.json \
  -e CORTEX_GOOGLE_DELEGATED_USER=user@example.com \
  -e CORTEX_GOOGLE_TENANT_ID=my-app \
  cortexdb/cortexdb:latest \
  --enable-connector google_workspace
# As a standalone process
cortexdb-connector google_workspace \
  --service-account-key /path/to/service-account.json \
  --delegated-user user@example.com \
  --tenant-id my-app
What Gets Ingested
Gmail
| Gmail Event | Episode Type | Content |
|---|---|---|
| Email message | message | Subject line and body snippet |
| Email thread | message | Grouped by Gmail thread ID |
Google Drive
| Drive Event | Episode Type | Content |
|---|---|---|
| Google Doc | document | Document name and type |
| Google Sheet | document | Spreadsheet name and type |
| Google Slides | document | Presentation name and type |
| Uploaded file | document | File name, type, and size |
| Shared Drive file | document | File with shared drive metadata |
Google Calendar
| Calendar Event | Episode Type | Content |
|---|---|---|
| Meeting | meeting | Title, description, attendees |
| All-day event | meeting | Title and date |
| Recurring event | meeting | Grouped by recurring event ID |
| Video call | meeting | Title with video call metadata |
Episode Metadata
Gmail episode:
{
  "type": "message",
  "content": "Subject: Q2 Planning Update\n\nHere are the latest numbers from the planning session...",
  "source": "google_workspace",
  "author": "[email protected]",
  "timestamp": "2026-03-15T09:00:00Z",
  "metadata": {
    "sub_source": "gmail",
    "subject": "Q2 Planning Update",
    "to": "[email protected]",
    "cc": "[email protected]",
    "labels": ["INBOX", "IMPORTANT"],
    "size_estimate": 4096
  }
}
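As a rough illustration of how an episode like this could be derived, the sketch below maps a Gmail API message resource onto the episode shape. The input fields (`payload.headers`, `snippet`, `labelIds`, `sizeEstimate`, `internalDate`) are those of the Gmail API `users.messages` resource; the function name `gmail_message_to_episode` and the exact mapping are assumptions for illustration, not the connector's actual code.

```python
from datetime import datetime, timezone
from email.utils import parseaddr


def gmail_message_to_episode(msg: dict) -> dict:
    """Hypothetical mapping from a Gmail API message resource to an episode."""
    # Gmail returns headers as a list of {"name": ..., "value": ...} pairs.
    headers = {
        h["name"].lower(): h["value"]
        for h in msg.get("payload", {}).get("headers", [])
    }
    subject = headers.get("subject", "")
    # internalDate is a string holding milliseconds since the Unix epoch.
    ts = datetime.fromtimestamp(int(msg["internalDate"]) / 1000, tz=timezone.utc)
    return {
        "type": "message",
        "content": f"Subject: {subject}\n\n{msg.get('snippet', '')}",
        "source": "google_workspace",
        # "From" is often "Name <address>"; keep only the address part.
        "author": parseaddr(headers.get("from", ""))[1],
        "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "metadata": {
            "sub_source": "gmail",
            "subject": subject,
            "to": headers.get("to", ""),
            "cc": headers.get("cc", ""),
            "labels": msg.get("labelIds", []),
            "size_estimate": msg.get("sizeEstimate", 0),
        },
    }
```

Missing headers fall back to empty strings here; whether the connector omits such keys instead is not stated in this document.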
Drive episode:
{
  "type": "document",
  "content": "Google Doc: Architecture Decision Record - Auth Service",
  "source": "google_workspace",
  "author": "[email protected]",
  "timestamp": "2026-03-14T16:20:00Z",
  "metadata": {
    "sub_source": "drive",
    "file_id": "1abc123...",
    "file_name": "Architecture Decision Record - Auth Service",
    "mime_type": "application/vnd.google-apps.document",
    "doc_type": "Google Doc",
    "shared": true,
    "web_view_link": "https://docs.google.com/document/d/1abc123/edit"
  }
}
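The `doc_type` label and the content string appear to follow from the file's MIME type. A minimal sketch of that mapping, assuming Drive API v3 file fields (`id`, `name`, `mimeType`, `modifiedTime`, `owners`, `shared`, `webViewLink`), is shown below. The helper name `drive_file_to_episode` and the "Uploaded file" fallback label are hypothetical.

```python
# MIME types Google Drive assigns to files created in its native editors.
DOC_TYPES = {
    "application/vnd.google-apps.document": "Google Doc",
    "application/vnd.google-apps.spreadsheet": "Google Sheet",
    "application/vnd.google-apps.presentation": "Google Slides",
}


def drive_file_to_episode(f: dict) -> dict:
    """Hypothetical mapping from a Drive API v3 file resource to an episode."""
    # Anything that is not a native editor file is treated as an upload (assumption).
    doc_type = DOC_TYPES.get(f["mimeType"], "Uploaded file")
    # Files that live in shared drives have no "owners" field.
    owners = f.get("owners", [])
    return {
        "type": "document",
        "content": f"{doc_type}: {f['name']}",
        "source": "google_workspace",
        "author": owners[0]["emailAddress"] if owners else "",
        "timestamp": f["modifiedTime"],
        "metadata": {
            "sub_source": "drive",
            "file_id": f["id"],
            "file_name": f["name"],
            "mime_type": f["mimeType"],
            "doc_type": doc_type,
            "shared": f.get("shared", False),
            "web_view_link": f.get("webViewLink", ""),
        },
    }
```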
Calendar episode:
{
  "type": "meeting",
  "content": "Weekly Engineering Standup\nDiscuss blockers and progress\nLocation: Conference Room A",
  "source": "google_workspace",
  "author": "[email protected]",
  "timestamp": "2026-03-15T10:00:00Z",
  "metadata": {
    "sub_source": "calendar",
    "event_id": "evt_abc123",
    "calendar_id": "primary",
    "start": "2026-03-15T10:00:00-07:00",
    "end": "2026-03-15T10:30:00-07:00",
    "status": "confirmed",
    "location": "Conference Room A",
    "attendee_count": 8,
    "has_video_call": true,
    "organizer_email": "[email protected]"
  }
}
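To show how the metadata fields might be filled, here is a sketch that builds the same shape from a Calendar API event resource. The input fields (`summary`, `description`, `location`, `start`, `end`, `attendees`, `organizer`, `hangoutLink`, `conferenceData`) belong to the Calendar API; the function name, the use of the start time as `timestamp`, and the video-call check are assumptions rather than documented connector behavior.

```python
def calendar_event_to_episode(event: dict, calendar_id: str = "primary") -> dict:
    """Hypothetical mapping from a Calendar API event resource to an episode."""
    start, end = event.get("start", {}), event.get("end", {})
    # Timed events carry "dateTime"; all-day events carry only "date".
    start_value = start.get("dateTime") or start.get("date", "")
    end_value = end.get("dateTime") or end.get("date", "")

    # Content is title, then description and location when present.
    parts = [event.get("summary", "")]
    if event.get("description"):
        parts.append(event["description"])
    if event.get("location"):
        parts.append(f"Location: {event['location']}")

    organizer = event.get("organizer", {}).get("email", "")
    return {
        "type": "meeting",
        "content": "\n".join(parts),
        "source": "google_workspace",
        "author": organizer,
        "timestamp": start_value,  # assumption: episode time is the event start
        "metadata": {
            "sub_source": "calendar",
            "event_id": event["id"],
            "calendar_id": calendar_id,
            "start": start_value,
            "end": end_value,
            "status": event.get("status", ""),
            "location": event.get("location", ""),
            "attendee_count": len(event.get("attendees", [])),
            # Either field signals that a conference is attached to the event.
            "has_video_call": bool(
                event.get("hangoutLink") or event.get("conferenceData")
            ),
            "organizer_email": organizer,
        },
    }
```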
Configuration
| Variable | Default | Description |
|---|---|---|
| CORTEX_GOOGLE_SERVICE_ACCOUNT_KEY | Required | Path to JSON key file, base64-encoded key, or raw JSON string |
| CORTEX_GOOGLE_DELEGATED_USER | Required | Workspace user email to impersonate |
| CORTEX_GOOGLE_GMAIL_ENABLED | true | Enable Gmail ingestion |
| CORTEX_GOOGLE_DRIVE_ENABLED | true | Enable Drive ingestion |
| CORTEX_GOOGLE_CALENDAR_ENABLED | true | Enable Calendar ingestion |
| CORTEX_GOOGLE_LABEL_FILTER | INBOX | Comma-separated Gmail labels to include |
| CORTEX_GOOGLE_SHARED_DRIVES | (none) | Comma-separated shared Drive IDs |
| CORTEX_GOOGLE_CALENDAR_IDS | primary | Comma-separated calendar IDs |
| CORTEX_GOOGLE_TENANT_ID | Required | CortexDB target tenant |
| CORTEX_GOOGLE_NAMESPACE | google_workspace | CortexDB target namespace |
| CORTEX_GOOGLE_BACKFILL_DAYS | 30 | Days of history to backfill on first run |
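The table above can be read as a small schema: three required values plus defaults for everything else. The sketch below resolves that schema from environment variables. The variable names and defaults come from the table; the function `load_config`, the returned dictionary keys, and the rule that only the string `true` (case-insensitive) enables a toggle are assumptions.

```python
import os

PREFIX = "CORTEX_GOOGLE_"
REQUIRED = ("SERVICE_ACCOUNT_KEY", "DELEGATED_USER", "TENANT_ID")


def _csv(value: str) -> list:
    """Split a comma-separated setting, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_config(env=None) -> dict:
    """Hypothetical loader applying the defaults listed in the table."""
    env = os.environ if env is None else env
    missing = [PREFIX + name for name in REQUIRED if not env.get(PREFIX + name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))

    def get(name: str, default: str) -> str:
        return env.get(PREFIX + name, default)

    def flag(name: str) -> bool:
        # Assumption: toggles are enabled unless set to something other than "true".
        return get(name, "true").strip().lower() == "true"

    return {
        "service_account_key": env[PREFIX + "SERVICE_ACCOUNT_KEY"],
        "delegated_user": env[PREFIX + "DELEGATED_USER"],
        "tenant_id": env[PREFIX + "TENANT_ID"],
        "gmail_enabled": flag("GMAIL_ENABLED"),
        "drive_enabled": flag("DRIVE_ENABLED"),
        "calendar_enabled": flag("CALENDAR_ENABLED"),
        "label_filter": _csv(get("LABEL_FILTER", "INBOX")),
        "shared_drives": _csv(get("SHARED_DRIVES", "")),
        "calendar_ids": _csv(get("CALENDAR_IDS", "primary")),
        "namespace": get("NAMESPACE", "google_workspace"),
        "backfill_days": int(get("BACKFILL_DAYS", "30")),
    }
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in isolation.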
Backfill
On first run, the connector backfills history for every enabled sub-connector, going back CORTEX_GOOGLE_BACKFILL_DAYS days (30 by default). Gmail uses date-based query filters, Drive filters on modifiedTime, and Calendar uses timeMin.
# Backfill 90 days across all services
cortexdb-connector google_workspace --backfill-days 90
# Backfill only Gmail and Calendar (disable Drive)
cortexdb-connector google_workspace \
  --backfill-days 90 \
  --drive-enabled false
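The three per-service filters can all be derived from one cutoff timestamp. The sketch below shows one way to compute them, assuming the Gmail `after:` search operator (given epoch seconds to avoid time zone ambiguity), the Drive `files.list` query on `modifiedTime`, and the Calendar `events.list` `timeMin` parameter. The function name and the returned keys are hypothetical, and the connector may build its queries differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backfill_filters(backfill_days: int, now: Optional[datetime] = None) -> dict:
    """Hypothetical helper: turn a backfill window into per-service filters."""
    now = now or datetime.now(timezone.utc)
    since = now - timedelta(days=backfill_days)
    rfc3339 = since.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        # Gmail search query: messages received after the cutoff (epoch seconds).
        "gmail_q": f"after:{int(since.timestamp())}",
        # Drive files.list "q" parameter: files modified after the cutoff.
        "drive_q": f"modifiedTime > '{rfc3339}'",
        # Calendar events.list "timeMin": lower bound as an RFC 3339 timestamp.
        "calendar_time_min": rfc3339,
    }
```

Using a single cutoff keeps the three services aligned on the same window, which matters when episodes from different sub-sources are later correlated by time.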
Service Account Key Formats
The connector accepts the service account key in three formats:
- File path: /path/to/service-account.json
- Base64-encoded: Encode the JSON file with base64 service-account.json and pass the string
- Raw JSON string: Pass the JSON content directly (useful for environment variables in CI/CD)
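Telling the three formats apart requires some detection logic. The sketch below shows one plausible order: raw JSON first, then base64, then file path. The function name `load_service_account_key` and the detection order are assumptions; this document does not say how the connector itself decides.

```python
import base64
import json
from pathlib import Path


def load_service_account_key(value: str) -> dict:
    """Hypothetical resolver for the three accepted key formats."""
    value = value.strip()

    # 1. Raw JSON string: a service account key is a JSON object.
    if value.startswith("{"):
        return json.loads(value)

    # 2. Base64-encoded JSON. Whitespace is removed first because some
    #    base64 tools wrap their output across several lines.
    try:
        compact = "".join(value.split())
        return json.loads(base64.b64decode(compact, validate=True))
    except ValueError:
        # Not valid base64, or the decoded bytes are not JSON: try a path.
        pass

    # 3. File path: read and parse the key file.
    return json.loads(Path(value).read_text(encoding="utf-8"))
```

Base64 is tried before the file path so that a long encoded string is never handed to the filesystem as a file name.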