Recall context and produce an LLM answer in a single round trip.
POST /v1/answer
Runs recall against the memory log, feeds the assembled context through a type-routed system prompt, and returns the LLM's answer along with cost and timing telemetry. This is the endpoint that powers question-answering on top of CortexDB.
For raw retrieval without an LLM step, use POST /v1/recall.
## Request
```http
POST /v1/answer
Content-Type: application/json
Authorization: Bearer <api-key>

{
  "query": "When did we migrate the payments service off PostgreSQL?",
  "question_type": "temporal-reasoning",
  "question_date": "2026-05-14",
  "max_recall_tokens": 8000,
  "max_output_tokens": 1500,
  "temperature": 0.0
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The question to answer |
| tenant_id | string | No | Sub-tenant scope |
| namespace | string | No | Alias for tenant_id |
| question_type | string | No | One of single-session-user, single-session-assistant, single-session-preference, knowledge-update, temporal-reasoning, multi-session. Routes the answer prompt; unknown values fall through to the default prompt. |
| max_recall_tokens | integer | No | Recall budget before prompt assembly (default 8000) |
| max_output_tokens | integer | No | LLM response cap (default 1500) |
| temperature | float | No | Anchor temperature (default 0.0) |
| verify_temps | float[] | No | Extra temperatures for self-consistency voting. Default [0.4, 0.7] (3-way vote). Pass [] to disable verification. |
| verify_types | string[] | No | Restrict self-consistency voting to these question_type values. Pass ["*"] to always vote. |
| question_date | string | No | The date the client wants the model to treat as "today" (ISO-8601). Required for accurate temporal-reasoning answers. |
| include_context | boolean | No | Include the assembled recall context in the response (large; default false) |
| disable_ms_executor | boolean | No | Bypass the multi-session executor for this request |
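The `verify_temps` mechanic can be pictured as simple majority voting: the server samples one answer at the anchor temperature and one per verify temperature, then keeps the most common answer. A local sketch of that idea (the normalization and tie-breaking here are illustrative, not the server's exact logic):

```python
from collections import Counter

def self_consistency_vote(samples):
    """Pick the most common answer across anchor + verify samples.

    `samples` is one answer string per temperature ([0.0] + verify_temps,
    so three samples with the defaults). Ties break toward the anchor
    sample, which comes first in the list.
    """
    normalized = [s.strip().lower() for s in samples]
    counts = Counter(normalized)
    best = max(counts.values())
    # Return the first raw sample whose normalized form hit the max count,
    # so the anchor answer wins ties.
    for raw, norm in zip(samples, normalized):
        if counts[norm] == best:
            return raw

print(self_consistency_vote(["2026-03-15", "2026-03-15", "2026-01-28"]))
# → 2026-03-15
```

Passing `verify_temps: []` skips the extra samples entirely, trading the vote's robustness for one LLM call instead of three.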
## Response
```json
{
  "answer": "The payments service migrated to CockroachDB on 2026-03-15. The cutover took ~7 weeks from the initial decision (2026-01-28).",
  "question_type": "temporal-reasoning",
  "context_length": 8742,
  "recall_latency_ms": 2868,
  "generation_latency_ms": 2545,
  "model": "anthropic/claude-opus-4-6",
  "tokens_input": 9608,
  "tokens_output": 412,
  "cost_usd": 0.1234
}
```
| Field | Type | Description |
|---|---|---|
| answer | string | Final LLM answer (after self-consistency vote, if enabled) |
| question_type | string | Echoed back; useful when the server falls back to the default prompt |
| context_length | integer | Character length of the recall context fed to the model |
| recall_latency_ms | integer | Time spent on retrieval + scoring + reranking |
| generation_latency_ms | integer | Time spent in the LLM call (sum across anchor + verify samples) |
| model | string | Provider-prefixed model id, e.g. anthropic/claude-opus-4-6 |
| tokens_input / tokens_output | integer | Token usage as reported by the router |
| cost_usd | float | USD cost as reported by the router; 0.0 when the router doesn't track cost |
| context | string? | Only present when include_context: true |
## Defaults at the server
The default answer model in the canonical production configuration is Claude Opus 4.6 (per `docs/REPRODUCE_LONGMEMEVAL_93_8.md`). For cost-sensitive workloads, set `CORTEX_ANSWER_MODEL=claude-sonnet-4-6` server-side: Claude Sonnet 4.6 reaches 92.8% accuracy on LongMemEval-S at roughly 4× lower cost.