Ingest Gmail, Google Drive, and Google Calendar data into CortexDB as a unified connector.

Google Workspace Connector

The Google Workspace connector is a compound connector that ingests data from three Google services into CortexDB:

  • Gmail -- Email threads as message episodes
  • Google Drive -- Documents, spreadsheets, and files as document episodes
  • Google Calendar -- Meetings and events as meeting episodes

Each sub-connector can be independently enabled or disabled. A single GoogleWorkspaceConnector class manages all three.

Setup

1. Create a Google Cloud Service Account

  1. Go to Google Cloud Console > IAM & Admin > Service Accounts
  2. Click Create Service Account
  3. Name it (e.g., cortexdb-workspace-connector)
  4. Click Create and Continue
  5. Click Done (no roles needed at project level)
  6. Click into the service account, go to Keys > Add Key > Create new key > JSON
  7. Save the JSON key file securely

2. Enable Domain-Wide Delegation

  1. In the service account details, click Show advanced settings
  2. Copy the Client ID (numeric)
  3. Go to Google Workspace Admin > Security > API Controls > Domain-wide Delegation
  4. Click Add new
  5. Paste the Client ID
  6. Add the following OAuth scopes:
    • https://www.googleapis.com/auth/gmail.readonly
    • https://www.googleapis.com/auth/drive.readonly
    • https://www.googleapis.com/auth/calendar.readonly
  7. Click Authorize

3. Enable Google APIs

In the Google Cloud Console, enable the following APIs for your project:

  • Gmail API
  • Google Drive API
  • Google Calendar API
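
Once the service account, delegation, and APIs are in place, a quick check outside the connector can confirm that impersonation works. The following is a minimal sketch, assuming the google-auth and google-api-python-client packages are installed. The helper name check_delegation is hypothetical and not part of CortexDB.

```python
# Scopes authorized in the Domain-wide Delegation step above.
SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/calendar.readonly",
]


def check_delegation(key_path: str, delegated_user: str) -> str:
    """Return the mailbox address readable when impersonating `delegated_user`.

    Hypothetical helper for verifying the setup; not part of the connector.
    """
    # Imported lazily so this module loads even without the Google client libraries.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        key_path,
        scopes=SCOPES,
        subject=delegated_user,  # the Workspace user to impersonate
    )
    gmail = build("gmail", "v1", credentials=creds)
    # users.getProfile only succeeds if delegation and the Gmail API are both set up.
    profile = gmail.users().getProfile(userId="me").execute()
    return profile["emailAddress"]
```

If delegation or a scope is missing, the call typically fails with an authorization error rather than returning a profile.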

4. Configure the Connector

# Required
CORTEX_GOOGLE_SERVICE_ACCOUNT_KEY=/path/to/service-account.json
CORTEX_GOOGLE_DELEGATED_USER=[email protected]

# Sub-connector toggles (all enabled by default)
CORTEX_GOOGLE_GMAIL_ENABLED=true
CORTEX_GOOGLE_DRIVE_ENABLED=true
CORTEX_GOOGLE_CALENDAR_ENABLED=true

# CortexDB target
CORTEX_GOOGLE_TENANT_ID=my-app
CORTEX_GOOGLE_NAMESPACE=google_workspace

5. Start the Connector

# As part of CortexDB
docker run -d \
  -v /path/to/service-account.json:/secrets/gcp.json \
  -e CORTEX_GOOGLE_SERVICE_ACCOUNT_KEY=/secrets/gcp.json \
  -e CORTEX_GOOGLE_DELEGATED_USER=[email protected] \
  -e CORTEX_GOOGLE_TENANT_ID=my-app \
  cortexdb/cortexdb:latest \
  --enable-connector google_workspace

# As a standalone process
cortexdb-connector google_workspace \
  --service-account-key /path/to/service-account.json \
  --delegated-user [email protected] \
  --tenant-id my-app

What Gets Ingested

Gmail

| Gmail Event | Episode Type | Content |
|---|---|---|
| Email message | message | Subject line and body snippet |
| Email thread | message | Grouped by Gmail thread ID |

Google Drive

| Drive Event | Episode Type | Content |
|---|---|---|
| Google Doc | document | Document name and type |
| Google Sheet | document | Spreadsheet name and type |
| Google Slides | document | Presentation name and type |
| Uploaded file | document | File name, type, and size |
| Shared Drive file | document | File with shared drive metadata |

Google Calendar

| Calendar Event | Episode Type | Content |
|---|---|---|
| Meeting | meeting | Title, description, attendees |
| All-day event | meeting | Title and date |
| Recurring event | meeting | Grouped by recurring event ID |
| Video call | meeting | Title with video call metadata |

Episode Metadata

Gmail episode:

{
  "type": "message",
  "content": "Subject: Q2 Planning Update\n\nHere are the latest numbers from the planning session...",
  "source": "google_workspace",
  "author": "[email protected]",
  "timestamp": "2026-03-15T09:00:00Z",
  "metadata": {
    "sub_source": "gmail",
    "subject": "Q2 Planning Update",
    "to": "[email protected]",
    "cc": "[email protected]",
    "labels": ["INBOX", "IMPORTANT"],
    "size_estimate": 4096
  }
}

Drive episode:

{
  "type": "document",
  "content": "Google Doc: Architecture Decision Record - Auth Service",
  "source": "google_workspace",
  "author": "[email protected]",
  "timestamp": "2026-03-14T16:20:00Z",
  "metadata": {
    "sub_source": "drive",
    "file_id": "1abc123...",
    "file_name": "Architecture Decision Record - Auth Service",
    "mime_type": "application/vnd.google-apps.document",
    "doc_type": "Google Doc",
    "shared": true,
    "web_view_link": "https://docs.google.com/document/d/1abc123/edit"
  }
}

Calendar episode:

{
  "type": "meeting",
  "content": "Weekly Engineering Standup\nDiscuss blockers and progress\nLocation: Conference Room A",
  "source": "google_workspace",
  "author": "[email protected]",
  "timestamp": "2026-03-15T10:00:00Z",
  "metadata": {
    "sub_source": "calendar",
    "event_id": "evt_abc123",
    "calendar_id": "primary",
    "start": "2026-03-15T10:00:00-07:00",
    "end": "2026-03-15T10:30:00-07:00",
    "status": "confirmed",
    "location": "Conference Room A",
    "attendee_count": 8,
    "has_video_call": true,
    "organizer_email": "[email protected]"
  }
}
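
A consumer of these episodes may want to sanity-check them before use. The sketch below relies only on the fields shown in the samples above and on the sub-connector to episode-type mapping from the introduction. The function name episode_problems is hypothetical, and the set of required keys is an assumption inferred from the three samples.

```python
# Episode type expected for each sub-connector, per the tables above.
EXPECTED_TYPE = {"gmail": "message", "drive": "document", "calendar": "meeting"}

# Top-level keys present in all three sample episodes (assumed to be required).
REQUIRED_KEYS = {"type", "content", "source", "author", "timestamp", "metadata"}


def episode_problems(episode: dict) -> list:
    """Return a list of human-readable problems; an empty list means it looks consistent."""
    problems = []
    missing = REQUIRED_KEYS - episode.keys()
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))

    sub_source = episode.get("metadata", {}).get("sub_source")
    episode_type = episode.get("type")
    if sub_source not in EXPECTED_TYPE:
        problems.append(f"unknown sub_source: {sub_source!r}")
    elif episode_type != EXPECTED_TYPE[sub_source]:
        problems.append(
            f"type {episode_type!r} does not match sub_source {sub_source!r}"
        )
    return problems
```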

Configuration

| Variable | Default | Description |
|---|---|---|
| CORTEX_GOOGLE_SERVICE_ACCOUNT_KEY | Required | Path to JSON key file, base64-encoded key, or raw JSON string (see Service Account Key Formats) |
| CORTEX_GOOGLE_DELEGATED_USER | Required | Workspace user email to impersonate |
| CORTEX_GOOGLE_GMAIL_ENABLED | true | Enable Gmail ingestion |
| CORTEX_GOOGLE_DRIVE_ENABLED | true | Enable Drive ingestion |
| CORTEX_GOOGLE_CALENDAR_ENABLED | true | Enable Calendar ingestion |
| CORTEX_GOOGLE_LABEL_FILTER | INBOX | Comma-separated Gmail labels to filter |
| CORTEX_GOOGLE_SHARED_DRIVES | (none) | Comma-separated shared Drive IDs |
| CORTEX_GOOGLE_CALENDAR_IDS | primary | Comma-separated calendar IDs |
| CORTEX_GOOGLE_TENANT_ID | Required | CortexDB target tenant |
| CORTEX_GOOGLE_NAMESPACE | google_workspace | CortexDB target namespace |
| CORTEX_GOOGLE_BACKFILL_DAYS | 30 | Days of history to backfill on first run |
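
To make the defaults and value formats in this table concrete, here is a rough sketch of how a wrapper script might read them. It is not the connector's actual loader. WorkspaceConfig and load_config are hypothetical names, and the accepted boolean spellings are an assumption.

```python
import os
from dataclasses import dataclass

PREFIX = "CORTEX_GOOGLE_"
REQUIRED = ("SERVICE_ACCOUNT_KEY", "DELEGATED_USER", "TENANT_ID")


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as true; everything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def _as_list(value: str) -> list:
    # Split a comma-separated value, dropping blanks and surrounding spaces.
    return [item.strip() for item in value.split(",") if item.strip()]


@dataclass
class WorkspaceConfig:
    service_account_key: str
    delegated_user: str
    tenant_id: str
    gmail_enabled: bool
    drive_enabled: bool
    calendar_enabled: bool
    label_filter: list
    shared_drives: list
    calendar_ids: list
    namespace: str
    backfill_days: int


def load_config(env=None) -> WorkspaceConfig:
    """Build a config from environment variables, applying the table's defaults."""
    env = os.environ if env is None else env
    missing = [PREFIX + name for name in REQUIRED if not env.get(PREFIX + name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))

    def get(name: str, default: str) -> str:
        return env.get(PREFIX + name, default)

    return WorkspaceConfig(
        service_account_key=env[PREFIX + "SERVICE_ACCOUNT_KEY"],
        delegated_user=env[PREFIX + "DELEGATED_USER"],
        tenant_id=env[PREFIX + "TENANT_ID"],
        gmail_enabled=_as_bool(get("GMAIL_ENABLED", "true")),
        drive_enabled=_as_bool(get("DRIVE_ENABLED", "true")),
        calendar_enabled=_as_bool(get("CALENDAR_ENABLED", "true")),
        label_filter=_as_list(get("LABEL_FILTER", "INBOX")),
        shared_drives=_as_list(get("SHARED_DRIVES", "")),
        calendar_ids=_as_list(get("CALENDAR_IDS", "primary")),
        namespace=get("NAMESPACE", "google_workspace"),
        backfill_days=int(get("BACKFILL_DAYS", "30")),
    )
```

Passing a plain dict instead of the process environment keeps the sketch easy to test.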

Backfill

On first run, the connector backfills history for every enabled sub-connector, going back CORTEX_GOOGLE_BACKFILL_DAYS days (30 by default). Each service applies the cutoff through its own filter: Gmail uses a date-based search query, Drive filters on modifiedTime, and Calendar sets timeMin.

# Backfill 90 days across all services
cortexdb-connector google_workspace --backfill-days 90

# Backfill only Gmail and Calendar (disable Drive)
cortexdb-connector google_workspace \
  --backfill-days 90 \
  --drive-enabled false
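
The per-service filters mentioned above could be derived from a single cutoff roughly as follows. This is an illustrative sketch, not the connector's code. The function name backfill_filters and the dictionary keys are hypothetical, and the exact filter strings the connector sends are an assumption based on the public Gmail search, Drive query, and Calendar timeMin formats.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backfill_filters(backfill_days: int, now: Optional[datetime] = None) -> dict:
    """Compute one cutoff and express it in each Google API's filter syntax."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=backfill_days)).astimezone(timezone.utc)
    rfc3339 = cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        # Gmail search operator, passed as the `q` parameter of messages.list.
        "gmail_q": "after:" + cutoff.strftime("%Y/%m/%d"),
        # Drive search clause, passed as the `q` parameter of files.list.
        "drive_q": f"modifiedTime > '{rfc3339}'",
        # Calendar events.list expects an RFC 3339 timestamp for timeMin.
        "calendar_time_min": rfc3339,
    }


if __name__ == "__main__":
    print(backfill_filters(90))
```

Gmail interprets a date in after: at day granularity, so the Gmail window may start slightly earlier or later than the exact cutoff used for Drive and Calendar.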

Service Account Key Formats

The connector accepts the service account key in three formats:

  1. File path: /path/to/service-account.json
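  2. Base64-encoded: Encode the JSON file (for example, by running base64 service-account.json) and pass the resulting string. GNU base64 wraps its output at 76 characters by default; base64 -w 0 produces a single line.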
  3. Raw JSON string: Pass the JSON content directly (useful for environment variables in CI/CD)
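
One way to tell the three formats apart is sketched below. The detection order (raw JSON, then file path, then base64) and the function name load_service_account_key are assumptions for illustration; the connector's own detection logic is not documented here.

```python
import base64
import binascii
import json
import os


def load_service_account_key(value: str) -> dict:
    """Parse a key given as raw JSON, a file path, or a base64-encoded string."""
    value = value.strip()

    # 1. Raw JSON: a JSON object always starts with an opening brace.
    if value.startswith("{"):
        return json.loads(value)

    # 2. File path: read and parse the file if it exists.
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as fh:
            return json.load(fh)

    # 3. Base64: drop any line wrapping, then decode strictly.
    compact = "".join(value.split())
    try:
        decoded = base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("key is not a readable file, raw JSON, or base64") from exc
    return json.loads(decoded)
```

Checking for raw JSON first avoids touching the filesystem when the key comes straight from a CI/CD secret.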