Six copy-paste configurations — Benchmark / Voice / Batch / Cost / Enterprise / Quickstart — for the most common CortexDB deployment shapes.
Profiles & Presets
Each profile below is a complete config you can copy into your environment. We mark every one as either:
- Benchmark-validated — we ran this exact configuration on a public benchmark and reproduced the published result.
- Principled — derived from the source code of the recall pipeline and our internal tuning, but not validated on a published benchmark. Likely good; not proven.
When in doubt, start with Self-host quickstart, then move toward whichever profile matches your workload as you understand it.
1. Benchmark-validated (LongMemEval-S, 93.8%)
Use this if you want to reproduce our published numbers or evaluate a head-to-head against another memory layer.
Status: Benchmark-validated. This is the exact configuration that scored 469/500 on LongMemEval-S as published in the benchmark paper.
<data_dir>/cortex.toml
[cluster]
node_id = 1
[storage]
[engine]
[network]
[llm]
[governance]
[scheduler]
enabled = false
Environment variables
export OPENAI_API_KEY=sk-... # for embeddings + extraction
export ANTHROPIC_API_KEY=sk-ant-... # for answer generation
export CORTEX_EMBEDDING_MODEL=text-embedding-3-small
export CORTEX_EMBEDDING_DIMS=1536
export CORTEX_ANSWER_PROVIDER=anthropic
export CORTEX_ANSWER_MODEL=claude-opus-4-6
What's intentional:
- scheduler.enabled = false: background compaction merges chunks and emits summary-style entries (prefixed C: and P:) that pollute the top-K over long runs. On the 150-question bench (~100 min) this collapsed single-session-assistant recall to 0/25.
- text-embedding-3-small (1536 d) over text-embedding-3-large (3072 d): the larger model gives ~+0.4pp on LongMemEval-S at ~3× the cost; we shipped the small model as the published config because the win didn't justify the cost ratio.
- claude-opus-4-6 over gpt-4o: Opus 4.6 was +6.2pp on multi-session reasoning in our internal A/B (the LongMemEval-S category that dominates total error).
Cost per 150-question run, defaults: roughly $1.20 in embeddings + $4.50 in Claude answer calls = ~$6.
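The per-run figure is plain arithmetic on the numbers quoted above. The sketch below works through the sum, the per-question cost, and a rough estimate of the same run on text-embedding-3-large. It assumes the ~3× ratio applies only to the $1.20 embedding line; that assumption and all names in the code are illustrative, not part of CortexDB.

```python
# Per-run cost arithmetic for the 150-question bench, using the figures quoted above.
EMBEDDING_COST_USD = 1.20   # text-embedding-3-small, per run
ANSWER_COST_USD = 4.50      # Claude answer calls, per run
QUESTIONS = 150
LARGE_MODEL_GAIN_PP = 0.4   # quoted gain of text-embedding-3-large on LongMemEval-S


def run_cost(embedding_usd: float, answer_usd: float) -> float:
    """Total API cost of one benchmark run in USD."""
    return embedding_usd + answer_usd


small_total = run_cost(EMBEDDING_COST_USD, ANSWER_COST_USD)
per_question = small_total / QUESTIONS

# Assumption: "~3x the cost" scales only the embedding line, not the answer calls.
large_total = run_cost(EMBEDDING_COST_USD * 3, ANSWER_COST_USD)
extra_per_pp = (large_total - small_total) / LARGE_MODEL_GAIN_PP

print(f"small: ${small_total:.2f}/run, ${per_question:.3f}/question")
print(f"large: ${large_total:.2f}/run, ${extra_per_pp:.2f} extra per percentage point")
```

Under that assumption the larger model adds about $2.40 per run for a 0.4pp gain, which is the cost ratio the published config declined to pay.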
2. Voice / Realtime (sub-100ms recall p50)
Use this if you're building a voice agent, a coding assistant doing many recalls per turn, or any agent where every millisecond shows up in user-perceived latency.
Status: Principled. We tuned this against the recall pipeline structure (every disabled stage removes a network round-trip or LLM call from the request path). Not benchmarked against a public latency dataset.
<data_dir>/cortex.toml
[cluster]
node_id = 1
[storage]
data_path = "/var/lib/cortexdb"
wal_sync = false # accept WAL durability gap for write latency
[engine]
hnsw_ef_search = 60 # default 100 — smaller = faster, ~0.5pp recall loss
block_cache_bytes = 17179869184 # 16 GB — keep more index pages hot
[network]
request_timeout_ms = 3000 # default 10000 — fail fast
[scheduler]
enabled = true
enrichment_drain_interval_secs = 5 # consume async results aggressively
Environment variables
# Same auth as default
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# Smaller, faster embedding model
export CORTEX_EMBEDDING_MODEL=text-embedding-3-small
export CORTEX_EMBEDDING_DIMS=1536
# Skip the reranker (~80-200ms saved per recall)
export CORTEX_RERANKER_PROVIDER= # empty = disable
# Skip HyDE multiquery on hot paths
export CORTEX_HYDE_MULTIQUERY_DISABLED_TYPES=single-session-user,single-session-assistant,multi-session,open-domain
# Skip multihop planning (saves 1-3 LLM round-trips)
export CORTEX_MULTIHOP_QUERY_PLANNER_TYPES= # empty = disable for all
# Tighter graph retrieval
export CORTEX_GRAPH_RETRIEVAL_TOP_K=20
# Skip synchronous fact extraction on write — keep all extraction async
export CORTEX_SYNC_FACT_EXTRACT_DISABLE=1
export CORTEX_SYNC_GRAPH_SEED_DISABLE=1
Tradeoffs:
- Saved: Reranker (~80–200 ms), HyDE multiquery (~150–400 ms LLM call), multihop planner (~200–600 ms × N queries), synchronous fact extraction (~100–300 ms on the write path).
- Lost: ~1–3 percentage points of recall accuracy. The graph and reranker exist for a reason: they catch the long tail of recall failures. Voice agents typically tolerate this because the user can re-ask.
- WAL durability: wal_sync = false means up to ~10 ms of writes can be lost on a hard crash. Acceptable for voice (the user just said it; if you lose it, they'll say it again). Not acceptable for financial/compliance workloads.
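To see how the per-stage savings add up, here is a minimal sketch that sums the ranges from the Saved bullet for a given number of multihop queries. It assumes the disabled stages would otherwise run one after another, so their latencies add; if any of them overlap in the real pipeline, the actual savings are smaller. The function and constant names are hypothetical.

```python
# Latency ranges (ms) quoted in the Tradeoffs list for each stage this profile disables.
RECALL_STAGES_MS = {
    "reranker": (80, 200),
    "hyde_multiquery": (150, 400),
}
MULTIHOP_PER_QUERY_MS = (200, 600)  # multiplied by the number of planned queries
SYNC_FACT_EXTRACT_MS = (100, 300)   # write path only, not part of recall


def recall_savings_ms(multihop_queries: int) -> tuple[int, int]:
    """Return (low, high) milliseconds removed from one recall request.

    Assumes the stages are sequential, so their latencies simply add.
    """
    low = sum(lo for lo, _ in RECALL_STAGES_MS.values())
    high = sum(hi for _, hi in RECALL_STAGES_MS.values())
    low += MULTIHOP_PER_QUERY_MS[0] * multihop_queries
    high += MULTIHOP_PER_QUERY_MS[1] * multihop_queries
    return low, high


def write_savings_ms() -> tuple[int, int]:
    """Return (low, high) milliseconds removed from one write request."""
    return SYNC_FACT_EXTRACT_MS


print(recall_savings_ms(multihop_queries=2))
print(write_savings_ms())
```

With two planned multihop queries, the recall path drops by somewhere between 630 ms and 1800 ms under these assumptions, which is why the planner is the first stage to disable on a latency-bound path.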
3. Batch / High Throughput
Use this if you're bulk-ingesting historical data, building a memory layer from a CRM dump, or running a nightly ETL of agent transcripts.
Status: Principled. Optimizes for writes-per-second over per-request latency.
<data_dir>/cortex.toml
[cluster]
node_id = 1
[storage]
data_path = "/var/lib/cortexdb"
wal_sync = true # keep durability — batch writes are usually one-shot
[engine]
block_cache_bytes = 34359738368 # 32 GB — large cache absorbs index churn
[scheduler]
enabled = true
# Stretch all intervals — compaction can wait until ingest pauses
compaction_interval_secs = 1800 # 30 min (default 5 min)
methylation_interval_secs = 3600 # 1 hour (default 10 min)
enrichment_drain_interval_secs = 60 # 1 min (default 30 s)
cognitive_persist_interval_secs = 600 # 10 min (default 1 min)
feedback_weight_interval_secs = 1800 # 30 min (default 2 min)
Environment variables
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# Large embedding batches — pack each OpenAI call
export CORTEX_EMBEDDING_MAX_BATCH_ITEMS=4096 # default 2048
export CORTEX_EMBEDDING_RETRY_ATTEMPTS=3 # be patient with 429s
export CORTEX_EMBEDDING_RETRY_BASE_DELAY_MS=1000
# Skip the synchronous write-path stages — let the async pipeline catch up
export CORTEX_SYNC_FACT_EXTRACT_DISABLE=1
export CORTEX_SYNC_GRAPH_SEED_DISABLE=1
# Use the cheap fact-extractor model (writes don't need Opus)
export CORTEX_LLM_MODEL=gpt-4o-mini
# Use the bulk endpoint
# POST /v1/experience/bulk with up to 100 events per request
Tradeoffs:
- Saved: ~80% wall-clock on a 1M-event ingest by batching embeddings and deferring sync extraction.
- Lost: Recall on the first 30 minutes of ingested data is degraded until the async pipeline catches up (you'll see chunks indexed but facts not yet extracted). For batch ingest into a system that won't be queried during ingest, this is free; for live ingest into a hot system, it matters.
- API rate limits: CORTEX_EMBEDDING_MAX_BATCH_ITEMS=4096 will push you against OpenAI's 1M-token-per-batch limit on text-embedding-3-small. The client splits automatically when that happens, but you'll see retries; tune down if your logs fill with 413s.
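If you pre-batch on your side instead of relying on client-side splitting, one approach is to cap each batch by both item count and token count. The sketch below is a generic splitter, not CortexDB's own implementation; `split_batches` and the caller-supplied `count_tokens` function are hypothetical names.

```python
from typing import Callable, Iterable, Iterator


def split_batches(
    texts: Iterable[str],
    max_items: int,
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> Iterator[list[str]]:
    """Yield batches that respect both an item cap and a token cap.

    A single text larger than max_tokens is still emitted, alone in its
    own batch, so the caller can decide how to handle it.
    """
    batch: list[str] = []
    batch_tokens = 0
    for text in texts:
        n = count_tokens(text)
        # Flush first if adding this text would break either limit.
        if batch and (len(batch) >= max_items or batch_tokens + n > max_tokens):
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += n
    if batch:
        yield batch


# Example with a crude whitespace token count (an assumption, not a real tokenizer).
docs = ["a b", "c d e", "f"]
batches = list(split_batches(docs, max_items=2, max_tokens=4,
                             count_tokens=lambda t: len(t.split())))
print(batches)
```

For a real ingest you would pass max_items=4096 and the token limit quoted above, with a proper tokenizer for `count_tokens`; a whitespace count undercounts tokens and is only good for illustration.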
4. Cost-Optimized (free-tier and prototypes)
Use this if you're a solo dev or doing exploration, and you want every API call to be the cheapest option that still works.
Status: Principled. We've used this internally for early prototypes; it's not benchmarked.
<data_dir>/cortex.toml
[cluster]
node_id = 1
[storage]
data_path = "./cortexdb_data"
[engine]
block_cache_bytes = 2147483648 # 2 GB — small enough to fit on a laptop
[scheduler]
enabled = true # keep on — keeps the index lean
Environment variables
export OPENAI_API_KEY=sk-... # cheapest provider for embeddings
# Cheapest embedding model
export CORTEX_EMBEDDING_MODEL=text-embedding-3-small # $0.02 per 1M tokens
export CORTEX_EMBEDDING_DIMS=1536
# Use the cheap GPT model for everything LLM-shaped
export CORTEX_LLM_MODEL=gpt-4o-mini # entity extraction
export CORTEX_ANSWER_PROVIDER=openai
export CORTEX_ANSWER_MODEL=gpt-4o-mini # answers ($0.15/1M in)
# Disable every optional pipeline stage
export CORTEX_LLM_DISABLE= # leave on — needed for KG
export CORTEX_ENRICHMENT_MODEL= # leave empty = disabled
export CORTEX_RERANKER_PROVIDER= # disable reranker (no cost saved unless cohere)
export CORTEX_HYDE_PASSAGES_MS=1 # minimum HyDE passages
export CORTEX_MULTIHOP_QUERY_COUNT=2 # fewer multihop queries (default 4)
export CORTEX_ANSWER_USE_VERIFIER= # disable answer verifier
export CORTEX_ANSWER_VERIFIER_TYPES= # empty = no verifier on any type
Estimated monthly cost for a hobby project (100 K stored memories, 1 K recalls/day):
- Embeddings: ~$0.50/month (one-time + occasional re-embed)
- Entity extraction (gpt-4o-mini): ~$2/month
- Answer (gpt-4o-mini): ~$3/month
- Total: ~$6/month. Pricier than running Ollama locally; cheaper than Mem0/Zep.
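As a sanity check on the embeddings line, here is a rough estimator using the $0.02 per 1M tokens price from the config comment. The average of 250 tokens per memory is our assumption for illustration; the estimate above does not state a figure, so substitute your own.

```python
EMBED_PRICE_PER_M_TOKENS = 0.02  # text-embedding-3-small, USD per 1M tokens


def embedding_cost_usd(memories: int, avg_tokens_per_memory: int) -> float:
    """One-time cost in USD to embed every stored memory."""
    total_tokens = memories * avg_tokens_per_memory
    return total_tokens / 1_000_000 * EMBED_PRICE_PER_M_TOKENS


# Assumption: ~250 tokens per memory (not stated in the estimate above).
embed = embedding_cost_usd(memories=100_000, avg_tokens_per_memory=250)

# Line items exactly as quoted in the estimate above.
line_items = {"embeddings": 0.50, "entity_extraction": 2.00, "answers": 3.00}
total = sum(line_items.values())

print(f"embeddings: ${embed:.2f}, total: ${total:.2f}/month")
```

The quoted line items sum to $5.50, which the estimate rounds up to ~$6.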
To go truly free (no API costs), swap the embedding service to a local Ollama:
export CORTEX_EMBEDDING_URL=http://localhost:11434
export CORTEX_EMBEDDING_MODEL=nomic-embed-text
export CORTEX_EMBEDDING_DIMS=768
# Set engine.vector_dimensions = 768 in cortex.toml to match
Expect ~5pp recall loss vs OpenAI embeddings — fine for prototypes.
5. Enterprise / Compliance
Use this if you're deploying into a regulated environment: HIPAA, SOC 2 Type II, GDPR, PCI, or any context where the auditor asks "where is the data, who can read it, and prove it."
Status: Principled — the security/compliance schema lights up the controls auditors look for. We have not formally certified against any specific regime.
<data_dir>/cortex.toml
[cluster]
node_id = 1
replication_factor = 3
[storage]
data_path = "/data/cortex"
wal_sync = true
[engine]
hnsw_quantization = "ScalarU8" # save memory; encrypt-at-rest handles the rest
[network]
api_port = 3141
gossip_port = 7000
grpc_port = 9042
[governance]
default_retention_ttl_secs = 2592000 # 30 days — auto-expire raw events
max_retention_secs = 220752000 # 7 years (default)
pii_detection = true
pii_handling = "Block" # reject events if PII scanner is down
audit_logging = true
[security]
[security.encryption]
enabled = true
key_file = "/etc/cortexdb/keys/master.key"
key_rotation_interval_secs = 7776000 # 90 days
blob_store_sse_kms = true
[security.tls]
api_tls_enabled = true
cert_path = "/etc/cortexdb/tls/cert.pem"
key_path = "/etc/cortexdb/tls/key.pem"
ca_cert_path = "/etc/cortexdb/tls/ca.pem"
mtls_enabled = true # require client certs on internal RPC
min_tls_version = "1.3"
[security.rbac]
enabled = true
default_role = "reader"
oidc_issuer = "https://login.acme.com/"
oidc_audience = "cortexdb"
require_mfa = true
[security.rate_limit]
enabled = true
default_rpm = 600 # 10 RPS per actor
default_rpd = 50000
burst = 20
[security.breach_detection]
enabled = true
max_failed_auth = 5
auth_window_secs = 300
lockout_duration_secs = 3600
[blob_store]
provider = "s3"
bucket = "acme-cortex-blobs"
region = "us-east-1"
s3_encryption_type = "aws:kms"
s3_kms_key_id = "arn:aws:kms:us-east-1:123:key/abc"
s3_bucket_key_enabled = true
[compliance]
[compliance.data_residency]
enabled = true
allowed_regions = ["us-east-1", "us-west-2"] # block writes to other regions
[compliance.consent]
require_consent = true
default_purposes = ["customer_support"]
[compliance.classification]
auto_classify = true
default_sensitivity = "internal"
auto_redact_above = "confidential"
[compliance.dsar]
enabled = true # /v1/erasures honored
[compliance.siem]
enabled = true
format = "cef"
webhook_urls = ["https://siem.acme.com/ingest"]
batch_size = 100
flush_interval_secs = 30
[deployment]
profile = "enterprise"
Environment variables
# Use customer-managed keys for everything that supports it
export CORTEX_EMBEDDING_API_KEY=... # separate from OPENAI_API_KEY
export CORTEX_ANSWER_API_KEY=...
export CORTEX_VERIFIER_API_KEY=...
# Audit-verify every answer (slower but defensible)
export CORTEX_ANSWER_USE_VERIFIER=1
export CORTEX_ANSWER_VERIFIER_TYPES=single-session-user,single-session-assistant,multi-session,open-domain
export CORTEX_VERIFIER_MODEL=gpt-4.1
# Log in JSON so your SIEM can parse without grok
export CORTEX_LOG_FORMAT=json
What you get: Encryption-at-rest with key rotation, TLS 1.3 + mTLS on RPC, OIDC with MFA, per-actor rate limits, audit log on every event, automatic PII detection (blocking on scanner failure), 30-day default retention with auto-expire, S3 with KMS encryption, data-residency enforcement, SIEM forwarding in CEF format, DSAR /v1/erasures endpoint enabled.
See Security & Compliance for a per-field walkthrough.
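Several fields in this profile are raw second counts, which are easy to mistype. The sketch below cross-checks the configured values against the durations named in their inline comments; the 7-year figure assumes 365-day years. The names in the code are illustrative.

```python
DAY = 24 * 60 * 60  # seconds in one day

# (field, configured value, value derived from the inline comment)
CHECKS = [
    ("default_retention_ttl_secs", 2_592_000, 30 * DAY),
    ("max_retention_secs", 220_752_000, 7 * 365 * DAY),  # 7 years, ignoring leap days
    ("key_rotation_interval_secs", 7_776_000, 90 * DAY),
]


def mismatches(checks):
    """Return the names of fields whose value disagrees with its comment."""
    return [name for name, configured, expected in checks if configured != expected]


# default_rpm = 600 requests per minute, expressed per second.
default_rps = 600 / 60

print(mismatches(CHECKS), default_rps)
```

An empty mismatch list means every value matches its stated duration, and 600 RPM works out to the 10 RPS per actor noted in the rate-limit comment.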
6. Self-host quickstart (5-min Docker)
Use this if you just want to try CortexDB and decide later whether to tune it.
Status: Validated — this is the configuration in the self-host blog post and what docker run cortexdb/cortexdb:latest gives you out of the box.
docker run -d \
--name cortexdb \
-p 3141:3141 \
-v cortexdb-data:/data \
-e OPENAI_API_KEY=$OPENAI_API_KEY \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
cortexdb/cortexdb:latest
That's the entire configuration. The container ships with zero cortex.toml overrides — every field is at its compiled default. You get:
- Single-node mode, v1 API on :3141, data persisted to the Docker volume
- OpenAI text-embedding-3-small (1536 d), gpt-4o-mini for entity extraction
- Anthropic claude-opus-4-6 for answer generation
- Background scheduler on (5-min compaction, 10-min methylation, 30-sec async drain)
- Encryption off, TLS off, RBAC off — this is for evaluation only
To deploy this for real, layer on at least:
- A real volume mount, not a Docker named volume (your data shouldn't disappear if you docker rm)
- TLS termination in front (caddy / nginx / cloudflare tunnel)
- A backup of /data/cortexdb_data somewhere
- The Enterprise profile config block if you have any compliance obligations
Picking between profiles
The decision tree:
Are you reproducing a published benchmark?
├─ Yes → Profile 1 (Benchmark-validated)
└─ No → Are you in a regulated industry?
├─ Yes → Profile 5 (Enterprise) — start here, layer on tuning later
└─ No → What dominates your workload?
├─ Latency per request → Profile 2 (Voice/Realtime)
├─ Bulk ingest throughput → Profile 3 (Batch)
├─ API cost → Profile 4 (Cost-Optimized)
└─ Just evaluating → Profile 6 (Quickstart)
Most production deployments end up as Profile 5 (Enterprise) + selective tuning from Profile 2 (Voice/Realtime) on the recall hot path. That's the configuration we recommend for production agents serving real users.
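The decision tree can also be written as a small function, which is handy if you want to encode the choice in a provisioning script. This is a direct transcription of the tree; the function name and the workload keywords are hypothetical.

```python
def pick_profile(reproducing_benchmark: bool, regulated: bool, dominant: str) -> str:
    """Walk the decision tree top to bottom.

    dominant is one of: "latency", "throughput", "cost", "evaluating".
    """
    if reproducing_benchmark:
        return "Profile 1 (Benchmark-validated)"
    if regulated:
        return "Profile 5 (Enterprise)"
    by_workload = {
        "latency": "Profile 2 (Voice/Realtime)",
        "throughput": "Profile 3 (Batch)",
        "cost": "Profile 4 (Cost-Optimized)",
        "evaluating": "Profile 6 (Quickstart)",
    }
    if dominant not in by_workload:
        raise ValueError(f"unknown workload: {dominant!r}")
    return by_workload[dominant]


print(pick_profile(reproducing_benchmark=False, regulated=False, dominant="latency"))
```

Note that the order of the checks matters: a regulated deployment lands on Profile 5 regardless of workload, and latency tuning is layered on afterwards.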
Next steps
- Configuration Foundations — the precedence rules and reload semantics
- Recall Tuning — the recall-side knobs in detail
- Benchmarking — full reproduction steps for the LongMemEval-S number