Recall context and produce an LLM answer in a single round trip.

POST /v1/answer

Runs recall against the memory log, feeds the assembled context through a type-routed system prompt, and returns the LLM's answer along with cost and timing telemetry. This is the endpoint that powers question-answering on top of CortexDB.

For raw retrieval without an LLM step, use POST /v1/recall.

Request

POST /v1/answer
Content-Type: application/json
Authorization: Bearer <api-key>
{
  "query": "When did we migrate the payments service off PostgreSQL?",
  "question_type": "temporal-reasoning",
  "question_date": "2026-05-14",
  "max_recall_tokens": 8000,
  "max_output_tokens": 1500,
  "temperature": 0.0
}
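As a client-side illustration, the request above can be assembled with Python's standard library. The base URL and API key here are placeholders, not values from this document:

```python
import json
import urllib.request

# Placeholder deployment URL and key -- substitute your own values.
BASE_URL = "https://cortexdb.example.com"
API_KEY = "YOUR_API_KEY"

payload = {
    "query": "When did we migrate the payments service off PostgreSQL?",
    "question_type": "temporal-reasoning",
    "question_date": "2026-05-14",
    "max_recall_tokens": 8000,
    "max_output_tokens": 1500,
    "temperature": 0.0,
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/answer",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)

# To send: body = json.load(urllib.request.urlopen(req))
```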

| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The question to answer |
| tenant_id | string | No | Sub-tenant scope |
| namespace | string | No | Alias for tenant_id |
| question_type | string | No | One of single-session-user, single-session-assistant, single-session-preference, knowledge-update, temporal-reasoning, multi-session. Routes the answer prompt; unknown values fall through to the default prompt. |
| max_recall_tokens | integer | No | Recall budget before prompt assembly (default 8000) |
| max_output_tokens | integer | No | LLM response cap (default 1500) |
| temperature | float | No | Anchor temperature (default 0.0) |
| verify_temps | float[] | No | Extra temperatures for self-consistency voting. Default [0.4, 0.7] (3-way vote). Pass [] to disable verification. |
| verify_types | string[] | No | Restrict self-consistency voting to these question_type values. Pass ["*"] to always vote. |
| question_date | string | No | The date the client wants the model to treat as "today" (ISO-8601). Required for accurate temporal-reasoning answers. |
| include_context | boolean | No | Include the assembled recall context in the response (large; default false) |
| disable_ms_executor | boolean | No | Bypass the multi-session executor for this request |
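The verify_temps mechanism can be pictured as a majority vote: the server samples one answer at the anchor temperature plus one per verify temperature, then keeps the most common answer. A minimal sketch of that voting step follows; the normalization and tie-breaking shown are assumptions for illustration, not the server's exact logic:

```python
from collections import Counter

def majority_vote(samples: list[str]) -> str:
    """Pick the most common answer; ties fall back to the anchor (first) sample."""
    normalized = [s.strip().lower() for s in samples]
    counts = Counter(normalized)
    top, top_count = counts.most_common(1)[0]
    # Tie-break in favor of the anchor sample's answer.
    if counts[normalized[0]] == top_count:
        return samples[0]
    return next(s for s in samples if s.strip().lower() == top)

# Anchor at temperature 0.0 plus two verify samples at 0.4 and 0.7:
samples = ["2026-03-15", "2026-03-15", "March 2026"]
print(majority_vote(samples))  # → 2026-03-15
```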

Response

{
  "answer": "The payments service migrated to CockroachDB on 2026-03-15. The cutover took ~7 weeks from the initial decision (2026-01-28).",
  "question_type": "temporal-reasoning",
  "context_length": 8742,
  "recall_latency_ms": 2868,
  "generation_latency_ms": 2545,
  "model": "anthropic/claude-opus-4-6",
  "tokens_input": 9608,
  "tokens_output": 412,
  "cost_usd": 0.1234
}

| Field | Type | Description |
|---|---|---|
| answer | string | Final LLM answer (after self-consistency vote, if enabled) |
| question_type | string | Echoed back; useful when the server falls back to the default prompt |
| context_length | integer | Character length of the recall context fed to the model |
| recall_latency_ms | integer | Time spent on retrieval + scoring + reranking |
| generation_latency_ms | integer | Time spent in the LLM call (sum across anchor + verify samples) |
| model | string | Provider-prefixed model id, e.g. anthropic/claude-opus-4-6 |
| tokens_input / tokens_output | integer | Token usage as reported by the router |
| cost_usd | float | USD cost as reported by the router; 0.0 when the router doesn't track cost |
| context | string? | Only present when include_context: true |
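The telemetry fields compose into simple derived metrics. For example, end-to-end latency and blended cost per thousand tokens can be computed from the example response above (illustrative client-side arithmetic, not an endpoint feature):

```python
# Telemetry values from the example response above.
response = {
    "recall_latency_ms": 2868,
    "generation_latency_ms": 2545,
    "tokens_input": 9608,
    "tokens_output": 412,
    "cost_usd": 0.1234,
}

total_latency_ms = response["recall_latency_ms"] + response["generation_latency_ms"]
total_tokens = response["tokens_input"] + response["tokens_output"]
cost_per_1k_tokens = response["cost_usd"] / total_tokens * 1000

print(total_latency_ms)               # → 5413
print(round(cost_per_1k_tokens, 4))
```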

Defaults at the server

The default answer model in the canonical production configuration is Claude Opus 4.6 (per docs/REPRODUCE_LONGMEMEVAL_93_8.md). For cost-sensitive workloads, set CORTEX_ANSWER_MODEL=claude-sonnet-4-6 server-side — 92.8% accuracy on LongMemEval-S at roughly 4× lower cost.
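The cost trade-off is easy to estimate from the example telemetry. Assuming the example response's $0.1234 per query on Opus and the roughly 4× reduction quoted for Sonnet (both illustrative numbers from this page, not a pricing guarantee):

```python
opus_cost_per_query = 0.1234                      # from the example response
sonnet_cost_per_query = opus_cost_per_query / 4   # "roughly 4x lower cost"

queries_per_month = 100_000                       # hypothetical workload
monthly_savings = (opus_cost_per_query - sonnet_cost_per_query) * queries_per_month
print(round(monthly_savings, 2))                  # → 9255.0
```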