Blog/Run CortexDB on Your Own Hardware
AnnouncementMay 18, 20269 min read

Run CortexDB on Your Own Hardware in Under Five Minutes

v0.5.1 ships CortexDB as a Docker image and stripped Linux binaries — the same engine that powers CortexDB Cloud, now runnable inside your own VPC, your homelab, or your laptop.

TL;DR. As of v0.5.1, CortexDB ships as a Docker image and stripped Linux binaries, both free for any use — same engine that powers CortexDB Cloud, now runnable inside your own VPC, your homelab, or your laptop. One command, no signup, no telemetry. Grab it at cortexdb.ai/download.

Why on-prem?

Most teams we talk to have one of three reasons to self-host their AI memory layer:

  1. Data residency / regulated industry. Customer data can't leave your network, full stop. Memory of customer conversations doesn't get an exception.
  2. Latency. Agent memory is in the hot path — every tool call, every recall, every belief update. Round-tripping to a managed service ~50ms each way is fine for a chatbot, painful for a coding agent that does 200 recalls per session.
  3. Cost predictability. Once your agents are running 24/7, the per-token cost of a managed memory layer becomes the kind of bill that gets put on a Finance ticket.

Cloud is still the right answer for prototyping and for teams who don't want to operate infrastructure. But “must use cloud” was never a forcing function we wanted to keep. So as of v0.5.1, you don't have to.

What you're getting

Two distribution paths, same binary underneath:

  • Docker image. Multi-arch (linux/amd64, linux/arm64) on Docker Hub: cortexdb/cortexdb:latest. Pinned tags for production (cortexdb/cortexdb:v0.5.1).
  • Linux tarballs. Stripped binary (~80 MB) + bundled assets, docs, and license. x86_64 and aarch64 builds. SHA-256 checksums on every GitHub Release.

Both packages include:

  • The full CortexDB engine (event sourcing, recall pipeline, all five memory layers)
  • Built-in dashboard at localhost:3141
  • WordNet synonym map for BM25 query expansion (preloaded)
  • docs/GETTING_STARTED.md — five-minute quickstart bundled in
  • Full v1 API reference + production deployment guide

macOS and Windows server builds aren't shipped yet — Linux is the production target. Reach out via contact if you have a specific need.

The fastest path: Docker

If you have Docker installed, this is the whole installation:

docker run -d \
  --name cortexdb \
  -p 3141:3141 \
  -v cortexdb-data:/data \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  cortexdb/cortexdb:latest

Two notes on the env vars: CortexDB calls OpenAI (gpt-4o-mini) for entity extraction and fact enrichment, and Anthropic (Claude) for the /v1/answer endpoint. You can swap these for any OpenAI-compatible endpoint by setting CORTEX_LLM_URL. If you only want to use the lower-level /v1/experience and /v1/recall endpoints, the answer key is optional.

Verify it's up:

curl -s http://localhost:3141/v1/admin/health
# {"status":"ok","version":"0.5.1",...}

Then open http://localhost:3141 in a browser for the dashboard.

The tarball path

For environments without Docker (airgapped, bare metal, banks that have opinions about container runtimes):

# x86_64
curl -LO https://cortexdb.ai/download/cortexdb-0.5.1-linux-amd64.tar.gz
tar -xzf cortexdb-0.5.1-linux-amd64.tar.gz
cd cortexdb-0.5.1-linux-amd64
./cortexdb

For systemd, an example unit ships in docs/PRODUCTION_DEPLOYMENT.md. Drop it under /etc/systemd/system/cortexdb.service, set the env file path, and systemctl enable --now cortexdb.

Your first memory

CortexDB's write surface is one endpoint: /v1/experience. You hand it a scope (a namespace), some content, an observed-at timestamp, and an idempotency key. The server takes care of chunking, embedding, fact extraction, entity resolution, and layer materialization.

curl -X POST http://localhost:3141/v1/experience \
  -H 'Content-Type: application/json' \
  -d '{
    "scope": "ws:demo",
    "modality": "observation",
    "content": { "kind": "text", "text": "Priya at Acme signed for 200 seats." },
    "context": { "observed_at": "'`date -u +%Y-%m-%dT%H:%M:%SZ`'" },
    "idempotency_key": "'`uuidgen`'"
  }'

Then read it back:

curl -X POST http://localhost:3141/v1/recall \
  -H 'Content-Type: application/json' \
  -d '{
    "scope": "ws:demo",
    "query": "Who at Acme is the contact?",
    "view": "local"
  }'

That's the whole demo. Five layers (Events → Episodes → Facts → Beliefs → Understanding) materialize behind the scenes; you just ask questions and get answers with provenance.

SDKs

If you'd rather not curl:

pip install cortexdbai
# or
npm i cortexdbai

Both packages ship the same CortexDB client. Point at http://localhost:3141 and you're done.

What ships in the package

cortexdb              # stripped release binary (~80 MB)
assets/               # WordNet synonym map for BM25
docs/
  GETTING_STARTED.md
  API_DESIGN_V1.md
  API_REFERENCE_V1.md
  CORTEXDB_FOR_THE_ENTERPRISE.md
  PRODUCTION_DEPLOYMENT.md
  cortexdb_customer_deck.html
README.md
LICENSE.txt

Everything you need to take it from “downloaded” to “running in production behind a reverse proxy with TLS and a backup policy.” The production guide covers systemd hardening, secrets handling, snapshot strategy, and the operational metrics worth alerting on.

A note on the architecture

For folks curious about what's actually inside the binary:

  • RocksDB for the write-ahead log and content store. The WAL is the source of truth — every index is a materialized view.
  • Tantivy for BM25 full-text. WordNet expansion preloaded.
  • HNSW for vector search. We do the embeddings out-of-process (OpenAI by default, swap with any compatible endpoint).
  • Custom graph index for entity relationships and provenance walks.
  • In-process derivation pipeline materializes episodes, beliefs, and concepts asynchronously. The /v1/recall and /v1/answer paths read from materialized views, not raw events.

Everything is a single Rust binary. No external dependencies beyond the LLM endpoints you wire up.

What's not in v0.5.1

Honesty about gaps, in case you're evaluating:

  • No macOS or Windows server builds yet. Linux only. Add -server to that sentence to be clear: this is for the box that's running your agent backend, not your dev laptop. (Use Docker on macOS/Windows for now — same binary, runs fine.)
  • No clustering in this release. Single-node is solid; multi-node Raft + gossip is in the next minor (v0.6). For most teams under ~10M events, single-node is fine.
  • No GUI installer. Tarball + systemd or Docker + compose, that's it. We'll add a one-line install script next month.

What's next

Roadmap items most users have been asking about:

  1. Helm chart + Kubernetes operator (v0.6)
  2. Cluster mode GA (v0.6)
  3. macOS native binary (when asked enough — currently 3 customer requests)
  4. Encryption-at-rest with customer-managed KMS keys (v0.7)
  5. An OSS eval harness — cortexdb-eval — that runs against any AI memory layer, not just ours

Get started

If you're running CortexDB on-prem, we'd love to hear what you're building. Reply on @cortexdbai or drop a note in Discord — we read everything.