Every recall-strategy knob — graph retrieval, HyDE, multihop, salience, reranker, and the constants that aren't (yet) configurable.

Recall Tuning

CortexDB's recall pipeline runs six retrieval channels in parallel and fuses them with reciprocal rank fusion. About a dozen env vars expose knobs into that pipeline; another two dozen are compiled constants that we tuned against LongMemEval-S and LoCoMo and didn't expose.

This page covers what's tunable and when to touch it. Most deployments shouldn't tune any of these: the defaults are the configuration behind the published 93.8% LongMemEval-S result.

The pipeline at a glance

Query
  │
  ├─► Query routing  ─────────────────────────► question_type
  │     ('single-session-user', 'multi-session', ...)
  │
  ├─► (optional) HyDE multiquery expansion ──► N hypothesized passages
  │                                              embedded → query vectors
  │
  ├─► (optional) Multihop query planner ─────► M follow-up queries
  │                                              generated by LLM
  │
  ├──► Run K parallel retrieval channels:
  │      • Vector (HNSW)
  │      • Fulltext (BM25 + WordNet)
  │      • Entity-name (exact / fuzzy)
  │      • Synonym
  │      • Graph BFS (KG edges around seed entities)
  │      • Temporal (recency window + decay)
  │
  ├──► Reciprocal rank fusion (RRF, k=60)  ──► fused candidate list
  │
  ├──► Cross-encoder rerank (optional, ~25-40 candidates) ──► reranked top-K
  │
  └──► Build response pack (citations, beliefs, episodes)

Each stage has knobs. Skipping a stage saves latency at the cost of recall accuracy. The published benchmark numbers were produced with the defaults, which keep the core stages on; the reranker and entity-vector seeding stay off until you enable them (see their sections below).
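To make the fusion step concrete, here is a minimal sketch of standard reciprocal rank fusion with k=60. The function name and the sample memory IDs are illustrative assumptions, and the sketch leaves out any per-channel weighting (such as the graph weight), since this page does not say how CortexDB applies it.

```python
from collections import defaultdict

RRF_K = 60.0  # smoothing constant named in the pipeline diagram


def rrf_fuse(channels: dict[str, list[str]], k: float = RRF_K) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over channels of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in channels.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-channel rankings (best hit first).
fused = rrf_fuse({
    "vector": ["m1", "m2", "m3"],
    "fulltext": ["m2", "m4"],
    "graph": ["m2", "m1"],
})
print([doc_id for doc_id, _ in fused])  # ['m2', 'm1', 'm4', 'm3']
```

A memory that appears in several channels (m2 here) outranks one that tops a single channel, which is the property that makes RRF a reasonable way to combine channels with incomparable raw scores.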

Graph retrieval

The KG channel walks edges around entities mentioned in the query.

Env var	Default	What it controls
`CORTEX_GRAPH_RETRIEVAL_DISABLE`	(unset)	Set `=1` to skip the graph channel entirely. Costs roughly 3-5 pp on multi-session queries in our A/B.
`CORTEX_GRAPH_RETRIEVAL_TOP_K`	`40` (single-session) / `120` (multi-session)	Number of graph-derived candidates passed to fusion.

When to change:

Disable graph retrieval (CORTEX_GRAPH_RETRIEVAL_DISABLE=1) if you're on the voice/realtime hot path and willing to give up ~3 pp of accuracy to save ~150 ms.
Raise CORTEX_GRAPH_RETRIEVAL_TOP_K to 80 (single-session) / 240 (multi-session) for query types where you expect entity-rich answers (e.g. "what did X say about Y at Z time"). Returns diminish past these values.

Constants you cannot currently override (in cortex-coordinator/src/recall.rs):

GRAPH_RETRIEVAL_MAX_ENTITIES = 48 (cap on KG entities to walk from)
GRAPH_RETRIEVAL_MAX_EDGES = 512 (cap on edges to traverse per walk)
GRAPH_RETRIEVAL_MAX_EPISODES = 256 (cap on episodes pulled by walk)
GRAPH_WEIGHT = 0.20 (graph channel's contribution to fusion)
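The three caps above bound how much work a single walk can do. The following is a sketch of a capped breadth-first walk, not the code in recall.rs: the adjacency map, the entity-to-episode map, and the exact order in which caps are checked are all assumptions made for illustration.

```python
from collections import deque

MAX_ENTITIES = 48   # GRAPH_RETRIEVAL_MAX_ENTITIES
MAX_EDGES = 512     # GRAPH_RETRIEVAL_MAX_EDGES
MAX_EPISODES = 256  # GRAPH_RETRIEVAL_MAX_EPISODES


def bounded_walk(seeds, edges, episodes_of,
                 max_entities=MAX_ENTITIES, max_edges=MAX_EDGES,
                 max_episodes=MAX_EPISODES):
    """BFS from seed entities, stopping expansion once a cap is reached.

    edges: dict entity -> list of neighbor entities (hypothetical layout)
    episodes_of: dict entity -> list of episode IDs (hypothetical layout)
    """
    seeds = list(dict.fromkeys(seeds))[:max_entities]  # dedupe, keep order, cap
    visited = set(seeds)
    queue = deque(seeds)
    episodes, seen_episodes = [], set()
    edges_used = 0
    while queue:
        entity = queue.popleft()
        for ep in episodes_of.get(entity, []):
            if ep not in seen_episodes:
                if len(episodes) >= max_episodes:
                    return episodes  # episode budget spent
                seen_episodes.add(ep)
                episodes.append(ep)
        for neighbor in edges.get(entity, []):
            if edges_used >= max_edges:
                break  # edge budget spent: stop adding new entities
            edges_used += 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return episodes


# Tiny hypothetical graph.
edges = {"alice": ["acme", "bob"], "acme": ["boston"]}
episodes_of = {"alice": ["e1"], "acme": ["e2", "e1"], "bob": ["e3"], "boston": ["e4"]}
walked = bounded_walk(["alice"], edges, episodes_of)
print(walked)  # ['e1', 'e2', 'e3', 'e4']
```

In this sketch, once the edge budget is spent, entities already in the queue still contribute their episodes, but no further neighbors are discovered.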

HyDE multiquery expansion

HyDE (Hypothetical Document Embeddings) asks an LLM to write a hypothetical passage that would answer the query, then embeds that passage instead of (or in addition to) the literal query. This captures meaning even when the query and the stored memory use very different words.

Env var	Default	What it controls
`CORTEX_HYDE_PASSAGES_MS`	`1`	Number of hypothetical passages to generate for multi-session queries. Set 0 to disable HyDE for multi-session.
`CORTEX_HYDE_MULTIQUERY_DISABLED_TYPES`	`multi-session,open-domain`	Comma-separated question types where HyDE is off.

When to change:

Set CORTEX_HYDE_MULTIQUERY_DISABLED_TYPES=single-session-user,single-session-assistant,multi-session,open-domain to disable HyDE entirely. This saves one LLM round-trip (~150-400 ms) and costs ~1-2 pp on queries where stored phrasing doesn't match query phrasing.
Set CORTEX_HYDE_PASSAGES_MS=3 to generate three hypothetical passages, one at each temperature in [0.3, 0.6, 0.9], for wider semantic coverage. This triples the HyDE LLM cost.

Compiled constants:

HYDE_MS_TEMPERATURES = [0.3, 0.6, 0.9] — the temperature schedule for multi-passage HyDE
DEFAULT_HYDE_MULTIQUERY_DISABLED_TYPES = ["multi-session", "open-domain"] — overridable via the env var above
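The multi-passage flow can be sketched as follows. Everything except the temperature schedule is an assumption: the function name, the prompt wording, the two callables standing in for the LLM and the embedder, the choice to keep the literal query vector, and the choice to cycle through the schedule when fewer or more than three passages are requested.

```python
from typing import Callable, Sequence

HYDE_MS_TEMPERATURES = [0.3, 0.6, 0.9]  # schedule listed above


def hyde_query_vectors(
    query: str,
    n_passages: int,
    generate: Callable[[str, float], str],    # hypothetical LLM call: (prompt, temperature) -> passage
    embed: Callable[[str], Sequence[float]],  # hypothetical embedding call
    include_literal_query: bool = True,
) -> list:
    """Return query vectors: the literal query plus one per hypothetical passage."""
    vectors = [embed(query)] if include_literal_query else []
    for i in range(n_passages):
        # Assumption: passage i uses the i-th temperature, cycling if needed.
        temperature = HYDE_MS_TEMPERATURES[i % len(HYDE_MS_TEMPERATURES)]
        passage = generate(f"Write a short passage that answers: {query}", temperature)
        vectors.append(embed(passage))
    return vectors


# Stub callables so the sketch runs without a model.
calls = []

def fake_generate(prompt, temperature):
    calls.append(temperature)
    return f"{prompt} (t={temperature})"

def fake_embed(text):
    return [float(len(text))]

vectors = hyde_query_vectors("where does Dana work?", 3, fake_generate, fake_embed)
print(len(vectors), calls)  # 4 [0.3, 0.6, 0.9]
```

Each extra passage adds one generation call and one embedding call, which is where the cost multiplier for higher passage counts comes from.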

Multihop query planner

For complex queries, an LLM plans a set of follow-up queries (4 by default) that explore related angles, then runs all of them through retrieval.

Env var	Default	What it controls
`CORTEX_MULTIHOP_QUERY_PLANNER_DISABLE`	(unset)	Set `=1` to disable multihop entirely.
`CORTEX_MULTIHOP_QUERY_PLANNER_TYPES`	`multi-session,open-domain`	Comma-separated types where multihop runs.
`CORTEX_MULTIHOP_QUERY_COUNT`	`4`	Number of follow-up queries the planner generates.
`CORTEX_MULTIHOP_MAX_QUERY_FANOUT`	`5`	Cap on simultaneously executing planned queries.
`CORTEX_MULTIHOP_COVERAGE_ORDER_DISABLE`	(unset)	Set `=1` to use original LLM-emitted order instead of coverage-optimal reordering.

When to change:

Set CORTEX_MULTIHOP_QUERY_PLANNER_TYPES= (empty) for latency-sensitive deployments. This saves 1-3 LLM round-trips per recall and costs ~1-3 pp on complex multi-session queries.
Lower CORTEX_MULTIHOP_QUERY_COUNT to 2 for a middle ground: it keeps the planner but reduces its fanout.
Raise CORTEX_MULTIHOP_QUERY_COUNT to 6 and CORTEX_MULTIHOP_MAX_QUERY_FANOUT to 8 for offline question-answering where wall-clock time doesn't matter.
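The relationship between query count and fanout cap can be illustrated with a bounded-concurrency sketch. This is an assumption about how such a cap typically behaves, not CortexDB's scheduler; the function names and the stub retriever are hypothetical.

```python
import asyncio


async def run_planned_queries(queries, retrieve, max_fanout=5):
    """Run planned follow-up queries with at most max_fanout in flight."""
    gate = asyncio.Semaphore(max_fanout)

    async def guarded(q):
        async with gate:
            return await retrieve(q)

    # gather preserves input order, so results line up with queries.
    return await asyncio.gather(*(guarded(q) for q in queries))


# Stub retriever that records how many calls overlap.
state = {"in_flight": 0, "peak": 0}

async def fake_retrieve(q):
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # stand-in for retrieval latency
    state["in_flight"] -= 1
    return f"hits for {q}"

results = asyncio.run(
    run_planned_queries([f"q{i}" for i in range(6)], fake_retrieve, max_fanout=2)
)
print(state["peak"])  # 2
```

With a count above the cap, the extra queries wait for a slot, so raising the count without raising the cap adds wall-clock time rather than parallelism.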

Salience prior

Salience is a per-memory importance score, updated by access patterns over time. The recall ranker can incorporate it as a prior — recently-accessed memories get a small boost.

Env var	Default	What it controls
`CORTEX_SALIENCE_WEIGHT`	`0.10`	Weight of salience score in final ranking. Range `[0.0, 1.0]`.
`CORTEX_AUTO_ROUTE`	(unset)	Set `=1` to let the router pick per-query-type salience weights automatically.

When to change:

For "what's recently relevant" agents (companion bots, daily assistants), raise CORTEX_SALIENCE_WEIGHT to 0.20 to lean harder on recency.
For historical-archive workloads (legal discovery, CRM search), set CORTEX_SALIENCE_WEIGHT=0.0 — the right answer might be from years ago, not last week.
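To see why the weight matters for these two workloads, consider the sketch below. It assumes a simple linear blend of relevance and salience; the ranker's actual formula is not documented on this page, so treat the function and the sample scores as purely illustrative.

```python
def apply_salience_prior(candidates, salience_weight=0.10):
    """Re-rank (id, relevance, salience) tuples by a weighted blend.

    ASSUMPTION: a linear blend; the real ranker may combine scores differently.
    """
    if not 0.0 <= salience_weight <= 1.0:
        raise ValueError("salience weight must be in [0.0, 1.0]")
    scored = [
        (mem_id, (1.0 - salience_weight) * relevance + salience_weight * salience)
        for mem_id, relevance, salience in candidates
    ]
    return sorted(scored, key=lambda item: -item[1])


# Hypothetical candidates: (id, relevance, salience).
candidates = [
    ("contract-2019", 0.80, 0.05),   # strong match, rarely accessed
    ("chat-last-week", 0.72, 0.95),  # weaker match, accessed often
]

archive_order = [m for m, _ in apply_salience_prior(candidates, 0.0)]
companion_order = [m for m, _ in apply_salience_prior(candidates, 0.20)]
print(archive_order[0], companion_order[0])  # contract-2019 chat-last-week
```

At weight 0.0 the older, better-matching memory wins; at 0.20 the frequently accessed one overtakes it.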

Entity-vector seeding

Entity-vector seeding is a hybrid signal: it takes the query's entity mentions, looks up their canonical vectors, and uses those as additional query vectors.

Env var	Default	What it controls
`CORTEX_ENTITY_VECTOR_SEED_ENABLE`	(unset)	Set `=1` to enable. Off by default.
`CORTEX_VECTOR_TENANT_OVERFETCH`	`1.25`	Tenant-aware overfetch multiplier for the vector channel — fetch 25% more candidates so post-filter on scope still hits the target K.

Compiled constants for entity-vector seeding:

ENTITY_VECTOR_SEED_TOP_K = 10 (vectors per entity)
ENTITY_VECTOR_SEED_MIN_SIMILARITY = 0.40 (cosine floor)
ENTITY_VECTOR_SPAN_LIMIT = 5 (entities per query)
ENTITY_VECTOR_PER_SPAN_TOP_K = 5 (vectors per entity span)

When to enable: queries with named entities that don't match stored phrasing exactly ("the customer in Boston" vs stored "Acme HQ in MA"). Adds one HNSW lookup per entity span.
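The overfetch multiplier in the table above is simple arithmetic, sketched here. The helper names and the two callables (a ranked search and a scope predicate) are hypothetical; only the 1.25 default comes from this page.

```python
import math


def overfetch_k(target_k: int, multiplier: float = 1.25) -> int:
    """Candidates to request so that scope filtering can still leave target_k."""
    return max(target_k, math.ceil(target_k * multiplier))


def vector_search_scoped(search, in_scope, target_k, multiplier=1.25):
    """search(k) -> ranked IDs (hypothetical); in_scope(id) -> bool (hypothetical)."""
    hits = search(overfetch_k(target_k, multiplier))
    return [h for h in hits if in_scope(h)][:target_k]


print(overfetch_k(40), overfetch_k(160), overfetch_k(10))  # 50 200 13

# Toy index where every fifth hit belongs to another scope.
scoped = vector_search_scoped(
    search=lambda k: list(range(k)),
    in_scope=lambda i: i % 5 != 0,
    target_k=40,
)
print(len(scoped))  # 40
```

If more than 20% of the overfetched hits fall outside the scope, a 1.25 multiplier will return fewer than the target K, which is the case where a larger value is worth testing.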

Reranker

The reranker is a cross-encoder model that takes the top ~25-40 fused candidates and re-scores each (query, candidate) pair as a unit, producing the final ranking.

Env var	Default	What it controls
`CORTEX_RERANKER_PROVIDER`	(empty = disabled)	`cohere` or `local`. Empty disables the reranker.
`CORTEX_RERANKER_MODEL`	`rerank-v3.5` (Cohere)	Model name.
`CORTEX_RERANKER_MODEL_PATH`	(none)	Path to a local ONNX model when `PROVIDER=local`.

When to change:

Enable CORTEX_RERANKER_PROVIDER=cohere + COHERE_API_KEY=... for ~+2 pp on noisy recall sets (mixed-corpus, conversational queries). Costs ~$0.001 per recall. Adds ~80-200 ms.
Use CORTEX_RERANKER_PROVIDER=local + CORTEX_RERANKER_MODEL_PATH=/models/bge-reranker.onnx for self-contained deployments. Slower (~200-500 ms on CPU) but free.
Leave disabled for voice / sub-100ms paths.
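The rerank step itself is small: cut the fused list to a pool, re-score each pair, and sort. The sketch below uses a word-overlap function as a stand-in for the cross-encoder, so the scoring function, the function names, and the sample passages are assumptions; only the pool size of 25 comes from this page.

```python
RERANK_POOL = 25  # single-session pool size listed among the compiled constants


def rerank(query, fused, score_pair, pool=RERANK_POOL, top_k=10):
    """Re-score the top `pool` fused candidates and return the best top_k.

    score_pair(query, candidate) stands in for a cross-encoder call.
    """
    head = fused[:pool]  # candidates past the pool are never scored
    rescored = sorted(head, key=lambda c: score_pair(query, c), reverse=True)
    return rescored[:top_k]


def word_overlap(query, candidate):
    """Toy scorer: number of query words that appear in the candidate."""
    return len(set(query.split()) & set(candidate.split()))


fused = ["the cat sat", "dogs bark loudly", "a cat and a dog"]
top = rerank("cat dog", fused, word_overlap, top_k=2)
print(top)  # ['a cat and a dog', 'the cat sat']
```

Because only the pool is scored, reranker cost scales with the pool size rather than with the number of fused candidates.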

Question-type executor switches

The /v1/answer endpoint routes queries through type-specific executors. These flags disable specific paths for A/B testing or to work around bugs.

Env var	Default	What it controls
`CORTEX_MS_EXECUTOR_DISABLE`	(unset)	Disable the multi-session-optimized executor. Falls back to single-session path.
`CORTEX_MS_EVIDENCE_PACK_DISABLE`	(unset)	Skip evidence-pack assembly for multi-session. Smaller answers, less context.
`CORTEX_MS_COUNT_RELEVANCE_ENABLE`	(unset)	Enable count-based relevance scoring for multi-session (experimental).
`CORTEX_MS_STAGE_C_USE_VERIFIER`	(unset)	Use the verifier model in stage-C answer formatting for multi-session.
`CORTEX_MS_RETRY_ON_ABSTAIN`	(unset)	Retry recall when multi-session executor abstains.
`CORTEX_DIRECT_LOOKUP_RETRY_ON_ABSTAIN`	(unset)	Retry direct-lookup queries on abstain.
`CORTEX_TEMPORAL_EXTRACT_DISABLE`	(unset)	Disable temporal-phrase extraction (relative dates, durations).
`CORTEX_COMPOSITIONAL_ENABLE`	(unset)	Enable typed-arithmetic compositional answers (experimental).
`CORTEX_ENUMERATE_COUNT_ENABLE`	(unset)	Enable count-enumeration answers ("how many X are there").
`CORTEX_SESSION_LEVEL_EXPANSION_ENABLE`	(unset)	Expand recall context at session level rather than per-message.
`CORTEX_ANSWER_SHAPE_EXECUTOR_USE_VERIFIER`	(unset)	Use verifier in shape-aware executor.
`CORTEX_FACT_EVENT_PROMOTION_ENABLE`	(unset)	Promote facts to event-level relevance.
`CORTEX_FACT_VALIDITY_FILTER`	(unset)	Filter recalled facts by validity windows (bi-temporal).

Default operator stance: don't touch any of these. They exist for our benchmark tooling to validate routing decisions. The compiled defaults are the configuration that produced 93.8% on LongMemEval-S.

If you're debugging a specific recall failure ("multi-session is hallucinating") and have an A/B harness, flipping CORTEX_MS_EVIDENCE_PACK_DISABLE=1 is a reasonable diagnostic to confirm the evidence pack is the culprit.

Synchronous-write kill switches

These don't affect recall directly — they affect the write path, which in turn affects when recall has the data available.

Env var	Default	What it controls
`CORTEX_SYNC_FACT_EXTRACT_DISABLE`	(unset)	Skip synchronous fact extraction on write. All extraction goes async. Faster writes; recall lags.
`CORTEX_SYNC_FACT_EXTRACT_MAX_SESSIONS`	(no cap)	Cap on sessions processed sync per batch.
`CORTEX_SYNC_GRAPH_SEED_DISABLE`	(unset)	Skip synchronous graph seeding. Same tradeoff.
`CORTEX_SYNC_GRAPH_SEED_MAX_BULK_MEMORIES`	(no cap)	Cap on memories processed in sync graph seeding.
`CORTEX_SYNC_GRAPH_SEED_MAX_ENTITIES`	(no cap)	Cap on entities processed in sync graph seeding.

When to disable: batch ingest (see the Batch profile) or any workload where writes outnumber reads and the recall layer can lag by 30-60 s without breaking the user experience.

Memory evolution (methylation + consolidation)

Background-scheduled jobs that prune low-utility memories and consolidate related ones.

Env var	Default	What it controls
`CORTEX_METHYLATION_INACTIVITY_HOURS`	`168` (7 days)	Memories unaccessed for this long are eligible for pruning.
`CORTEX_METHYLATION_MIN_ACCESS`	`10`	Min access count before a memory becomes pruning-eligible.
`CORTEX_METHYLATION_MIN_UTIL_RATIO`	`0.30`	Min utility-to-access ratio. Below this = pruning candidate.
`CORTEX_METHYLATION_MIN_ACCESSES_FOR_RATIO`	`5`	Min accesses required before the ratio is even evaluated.
`CORTEX_CONSOLIDATION_MIN_MEMORIES`	`2`	Min memories about the same entity to trigger consolidation.
`CORTEX_CONSOLIDATION_MAX_BATCH`	`10`	Max consolidations per scheduler tick.
`CORTEX_CONSOLIDATION_MIN_AGE_HOURS`	`24`	Memories younger than this aren't consolidated (let them stabilize first).
`CORTEX_CONSOLIDATION_MAX_SURPRISE`	`0.5`	Don't consolidate memories above this surprise score (they're outliers worth keeping atomic).

When to change:

For dense / chatty agents (lots of low-value chitchat events), tighten methylation: CORTEX_METHYLATION_INACTIVITY_HOURS=72 (3 days). Prunes more aggressively.
For archival workloads, loosen: CORTEX_METHYLATION_INACTIVITY_HOURS=720 (30 days) so quarterly-relevant memories don't get pruned during off-quarters.
For benchmark runs, set CORTEX_SCHEDULER_DISABLE=1 to disable the whole scheduler — methylation and consolidation will not run, ensuring stable recall over long evals.
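As an illustration of how the consolidation thresholds interact, the sketch below selects consolidation groups for one scheduler tick. The record shape, the grouping by a single entity field, and the order in which the filters apply are assumptions; only the four default values come from the table above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Memory:  # hypothetical record shape
    id: str
    entity: str
    age_hours: float
    surprise: float


def consolidation_batches(memories, min_memories=2, max_batch=10,
                          min_age_hours=24.0, max_surprise=0.5):
    """Return groups of memory IDs eligible for consolidation this tick."""
    by_entity = defaultdict(list)
    for m in memories:
        # Skip memories that are too young or too surprising to merge.
        if m.age_hours >= min_age_hours and m.surprise <= max_surprise:
            by_entity[m.entity].append(m.id)
    groups = [ids for ids in by_entity.values() if len(ids) >= min_memories]
    return groups[:max_batch]  # at most max_batch consolidations per tick


memories = [
    Memory("a1", "acme", 48, 0.1),
    Memory("a2", "acme", 30, 0.2),
    Memory("a3", "acme", 2, 0.1),   # too young
    Memory("b1", "bob", 100, 0.9),  # too surprising, kept atomic
    Memory("b2", "bob", 100, 0.1),  # alone after filtering, below the minimum
]
batches = consolidation_batches(memories)
print(batches)  # [['a1', 'a2']]
```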

Compiled constants you can't (yet) override

These live in cortex-coordinator/src/recall.rs as const values. They were tuned against LongMemEval-S and LoCoMo. If you need to override one for an unusual workload, file an issue; we may promote it to an env var.

Constant	Value	What it controls
`RETRIEVAL_TOP_K`	`40`	Candidates from each retrieval channel for single-session queries.
`RETRIEVAL_TOP_K_MS`	`160`	Same, for multi-session queries (wider pool).
`RERANK_POOL`	`25`	Top-N passed to the reranker (single-session).
`RERANK_POOL_MS`	`40`	Same, multi-session.
`RRF_K`	`60.0`	Smoothing constant in the reciprocal-rank-fusion formula.
`GRAPH_WEIGHT`	`0.20`	Graph channel's weight in the fused score.
`GRAPH_RETRIEVAL_MAX_ENTITIES`	`48`	Cap on KG entities to walk from per query.
`GRAPH_RETRIEVAL_MAX_EDGES`	`512`	Cap on edges per BFS walk.
`GRAPH_RETRIEVAL_MAX_EPISODES`	`256`	Cap on episodes pulled.
`ENTITY_VECTOR_SEED_TOP_K`	`10`	Vectors per entity for entity-vector seeding.
`ENTITY_VECTOR_SEED_MIN_SIMILARITY`	`0.40`	Cosine floor for entity-vector seed candidates.
`ENTITY_VECTOR_SPAN_LIMIT`	`5`	Max entities considered per query.
`ENTITY_VECTOR_PER_SPAN_TOP_K`	`5`	Vectors per entity span.
`HYDE_PASSAGES_MS_DEFAULT`	`1`	Default HyDE passages for multi-session (matches env default).
`SESSION_BALANCE_ENABLED_TYPES`	`["multi-session"]`	Question types where per-session candidate balancing is on.

Latency budget breakdown (default config)

Rough p50 budget for a single /v1/recall against a ~100K event scope:

Stage	p50 latency	Optional?
Query embedding	~50 ms	No
HyDE multiquery (if enabled)	~250 ms	Yes — disable per-type
Multihop planner (if enabled)	~400 ms	Yes — disable per-type
Vector + fulltext + KG retrieval (parallel)	~80 ms	No
RRF fusion	~2 ms	No
Cross-encoder rerank (if enabled)	~150 ms	Yes
Response pack assembly	~30 ms	No
Total (default config, multi-session query)	~900 ms
Total (voice profile, single-session query)	~180 ms

The HyDE and multihop costs dominate for any query type where they run. Disabling them is the single highest-leverage latency win for voice/realtime.
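For planning purposes, the per-stage figures can be added up for whichever optional stages you keep. The sketch below is a straight sum of the table rows, which assumes the stages run back to back; it will not reproduce the table's rounded totals exactly. The dictionary keys are illustrative labels, not configuration names.

```python
# p50 figures copied from the table above, in milliseconds.
STAGE_P50_MS = {
    "query_embedding": 50,
    "hyde": 250,
    "multihop": 400,
    "retrieval": 80,
    "rrf": 2,
    "rerank": 150,
    "response_pack": 30,
}
OPTIONAL = {"hyde", "multihop", "rerank"}


def budget_ms(enabled_optional=frozenset(OPTIONAL)):
    """Sum required stages plus whichever optional stages are enabled."""
    return sum(
        ms for stage, ms in STAGE_P50_MS.items()
        if stage not in OPTIONAL or stage in enabled_optional
    )


all_on = budget_ms()
required_only = budget_ms(set())
no_llm_expansion = budget_ms({"rerank"})
print(all_on, required_only, no_llm_expansion)  # 962 162 312
```

Dropping HyDE and multihop removes 650 ms of the 962 ms straight sum, which matches the observation that those two stages dominate wherever they run.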

Next steps

Profiles & Presets — see the Voice profile for a complete sub-100ms config
Embeddings — vector dim and model choice that feeds this pipeline
Benchmarking — how the LongMemEval-S number was produced with all of these defaults