Automatically ingest Discord messages, threads, and forum posts into CortexDB as episodes.

Discord Connector

The Discord connector syncs messages, thread conversations, and forum posts from your Discord servers into CortexDB. Channel and thread messages are stored as message episodes; forum posts become document episodes. Every episode carries author, channel, guild, and reaction metadata.

Setup

1. Create a Discord Bot

  1. Go to discord.com/developers/applications
  2. Click New Application and name it (e.g. CortexDB Sync)
  3. Go to the Bot section and, if the application does not have a bot user yet, click Add Bot
  4. Under Privileged Gateway Intents, enable:
    • Message Content Intent -- required to read message text
  5. Copy the Bot Token
  6. Go to OAuth2 > URL Generator:
    • Scopes: bot
    • Bot Permissions: Read Messages/View Channels, Read Message History
  7. Use the generated URL to invite the bot to your server(s)

2. Configure the Connector

# Required
CORTEX_DISCORD_BOT_TOKEN=your_bot_token_here

# Optional: scope to specific guilds and channels
CORTEX_DISCORD_GUILD_IDS=guild-id-1,guild-id-2
CORTEX_DISCORD_CHANNEL_IDS=channel-id-1,channel-id-2

# CortexDB target
CORTEX_DISCORD_TENANT_ID=my-app
CORTEX_DISCORD_NAMESPACE=discord

3. Start the Connector

# As part of CortexDB
docker run -d \
  -e CORTEX_DISCORD_BOT_TOKEN=... \
  -e CORTEX_DISCORD_GUILD_IDS="guild-id-1" \
  -e CORTEX_DISCORD_TENANT_ID=my-app \
  cortexdb/cortexdb:latest \
  --enable-connector discord

# As a standalone process
cortexdb-connector discord \
  --bot-token ... \
  --guild-ids "guild-id-1" \
  --tenant-id my-app

What Gets Ingested

| Discord Event | Episode Type | Content |
|---|---|---|
| Channel message | message | Message text + attachment/embed descriptions |
| Thread message | message | Message text (with thread metadata) |
| Forum post | document | Thread starter message + forum post title |
| Bot message | Skipped | Bot messages are excluded by default |
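
The mapping in the table can be read as a small decision function. The following is a minimal sketch of that logic, not the connector's actual code: the `DiscordEvent` class, its field names, and `episode_type` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscordEvent:
    """Hypothetical, simplified view of an incoming Discord message."""
    text: str
    author_is_bot: bool = False
    in_thread: bool = False
    is_forum_starter: bool = False  # first message of a forum post


def episode_type(event: DiscordEvent, include_bots: bool = False) -> Optional[str]:
    """Return the episode type from the table, or None when the event is skipped."""
    if event.author_is_bot and not include_bots:
        return None  # bot messages are excluded by default
    if event.is_forum_starter:
        return "document"  # forum post: starter message + post title
    return "message"  # channel messages and thread messages
```

Under these assumptions, setting `include_bots=True` corresponds to enabling CORTEX_DISCORD_INCLUDE_BOTS, so bot messages fall through to the regular rules.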

Episode Metadata

Each ingested message is stored as an episode with the following fields:

{
  "type": "message",
  "content": "Has anyone looked into the new rate-limit changes for v10?",
  "source": "discord",
  "author": "alice#1234",
  "timestamp": "2026-03-15T16:30:00Z",
  "metadata": {
    "message_id": "1234567890",
    "channel_id": "9876543210",
    "channel_name": "api-discussion",
    "guild_id": "1111111111",
    "guild_name": "Engineering",
    "attachment_count": 0,
    "embed_count": 0,
    "reactions": [
      { "emoji": "thumbsup", "count": 3 }
    ],
    "mention_count": 1,
    "pinned": false
  }
}
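
As a rough illustration of how a consumer might read these fields, the sketch below parses an abbreviated copy of the example episode with the Python standard library. The helper names `total_reactions` and `episode_time` are hypothetical and not part of any CortexDB client.

```python
import json
from datetime import datetime

# Abbreviated copy of the example episode shown above.
raw = """
{
  "type": "message",
  "source": "discord",
  "author": "alice#1234",
  "timestamp": "2026-03-15T16:30:00Z",
  "metadata": {
    "channel_name": "api-discussion",
    "reactions": [{ "emoji": "thumbsup", "count": 3 }],
    "pinned": false
  }
}
"""


def total_reactions(episode: dict) -> int:
    """Sum the counts of all reactions on the episode (0 if there are none)."""
    return sum(r["count"] for r in episode["metadata"].get("reactions", []))


def episode_time(episode: dict) -> datetime:
    """Parse the ISO 8601 timestamp into a timezone-aware datetime."""
    # Replace the trailing "Z" so this also works before Python 3.11.
    return datetime.fromisoformat(episode["timestamp"].replace("Z", "+00:00"))


episode = json.loads(raw)
print(episode["metadata"]["channel_name"], total_reactions(episode), episode_time(episode))
```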

Configuration

| Variable | Default | Description |
|---|---|---|
| CORTEX_DISCORD_BOT_TOKEN | Required | Discord bot token |
| CORTEX_DISCORD_GUILD_IDS | None | Comma-separated guild (server) IDs |
| CORTEX_DISCORD_CHANNEL_IDS | All text channels | Comma-separated channel IDs |
| CORTEX_DISCORD_TENANT_ID | Required | Target tenant |
| CORTEX_DISCORD_NAMESPACE | discord | Target namespace |
| CORTEX_DISCORD_BACKFILL_DAYS | 30 | Days of history to backfill on first run |
| CORTEX_DISCORD_INCLUDE_BOTS | false | Include messages from bot accounts |
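
To make the defaults concrete, here is a minimal sketch of how these variables could be read and validated. It is an illustration under stated assumptions, not the connector's own loader: `DiscordConnectorConfig` and `load_config` are hypothetical names, and the set of strings accepted as "true" for CORTEX_DISCORD_INCLUDE_BOTS is a guess.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


def _split_ids(value: Optional[str]) -> List[str]:
    """Split a comma-separated ID list, ignoring blanks and surrounding spaces."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


@dataclass
class DiscordConnectorConfig:
    bot_token: str
    tenant_id: str
    guild_ids: List[str]
    channel_ids: List[str]  # empty list stands for "all text channels"
    namespace: str = "discord"
    backfill_days: int = 30
    include_bots: bool = False


def load_config(env: Mapping[str, str] = os.environ) -> DiscordConnectorConfig:
    """Build a config from environment variables, applying the documented defaults."""
    required = ("CORTEX_DISCORD_BOT_TOKEN", "CORTEX_DISCORD_TENANT_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    # Assumption: these spellings count as true; the real parser may differ.
    include_bots = env.get("CORTEX_DISCORD_INCLUDE_BOTS", "false").strip().lower() in ("1", "true", "yes")
    return DiscordConnectorConfig(
        bot_token=env["CORTEX_DISCORD_BOT_TOKEN"],
        tenant_id=env["CORTEX_DISCORD_TENANT_ID"],
        guild_ids=_split_ids(env.get("CORTEX_DISCORD_GUILD_IDS")),
        channel_ids=_split_ids(env.get("CORTEX_DISCORD_CHANNEL_IDS")),
        namespace=env.get("CORTEX_DISCORD_NAMESPACE", "discord"),
        backfill_days=int(env.get("CORTEX_DISCORD_BACKFILL_DAYS", "30")),
        include_bots=include_bots,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without touching the real environment.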

Backfill

On first run, the connector backfills the configured number of days of history. It converts the cutoff timestamp to a Discord snowflake ID for efficient pagination, so only messages within the backfill window are fetched. Subsequent runs process only new messages since the last sync.

# Backfill 90 days of history
cortexdb-connector discord --backfill-days 90

The connector uses Discord's before parameter with message-ID pagination. Each message's snowflake ID encodes its timestamp, enabling precise cutoff without scanning the entire channel history. Forum threads (both active and archived) are discovered and their messages ingested with thread metadata.
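
The timestamp-to-snowflake conversion described above can be sketched as follows. Discord snowflakes store milliseconds since the Discord epoch (2015-01-01T00:00:00Z) in the bits above the lowest 22, so a cutoff time maps to the smallest ID a message created at that instant could have. The function names here are hypothetical, and the stop-at-cutoff loop is an assumed reading of the pagination strategy rather than the connector's actual implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

# Discord's snowflake epoch: the first millisecond of 2015 (UTC).
DISCORD_EPOCH = datetime(2015, 1, 1, tzinfo=timezone.utc)


def snowflake_from_datetime(dt: datetime) -> int:
    """Smallest snowflake ID for a timezone-aware datetime."""
    ms = (dt - DISCORD_EPOCH) // timedelta(milliseconds=1)
    return ms << 22  # the low 22 bits (worker, process, increment) stay zero


def datetime_from_snowflake(snowflake: int) -> datetime:
    """Recover the creation time encoded in a snowflake ID."""
    return DISCORD_EPOCH + timedelta(milliseconds=snowflake >> 22)


def backfill_cutoff(days: int, now: Optional[datetime] = None) -> int:
    """Snowflake ID marking the start of the backfill window."""
    now = now or datetime.now(timezone.utc)
    return snowflake_from_datetime(now - timedelta(days=days))


def within_window(message_ids: Iterable[int], cutoff: int) -> List[int]:
    """Keep IDs from a newest-first page until one falls before the cutoff."""
    kept = []
    for message_id in message_ids:
        if message_id < cutoff:
            break  # everything after this point is older; stop paginating
        kept.append(message_id)
    return kept
```

Because IDs increase with creation time, comparing a message ID against the cutoff is enough to decide when to stop; no per-message timestamp lookup is needed.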