Recall + LLM in one call. Returns answer text, citations, full provenance, and timings.

POST /v1/answer

Recall + LLM composed into one round-trip. Returns the LLM answer plus the same pack_id, provenance, and citations you'd get from /v1/recall/v1/answer is not an opaque endpoint; it's literally recall ⊕ LLM call.

For raw retrieval without the LLM step, use POST /v1/recall.

Capability

  • scope.read.local (always)
  • Traversal caps per view
  • llm.invoke
  • diagnostics.read when diagnostics ≠ "none"

Request

{
  "scope": "org:acme/dept:eng/user:alice",
  "view": "holistic",
  "question": "Did Acme renew?",
  "answer_model": "claude-opus-4-6",
  "answer_max_tokens": 1500,
  "include": ["beliefs", "facts", "episodes"],
  "temporal": { "natural": "last 30 days" },
  "budgets": { "max_tokens": 4000, "per_layer_limits": { "facts": 20, "beliefs": 10 } },
  "citation_mode": "inline_with_markers",
  "diagnostics": "summary",
  "use_pack_id": null,
  "stream": false
}
FieldTypeNotes
questionstringThe user's question.
answer_modelstringLLM identifier (provider-routed). Server-side default: claude-opus-4-6.
answer_max_tokensintOutput cap. Default 1500.
use_pack_idstring | nullSkip recall and reuse a previously generated pack (60 s TTL). When set, recall fields are ignored.
(other fields)Same semantics as /v1/recall.

Response

{
  "pack_id": "pack_01HX...",
  "answer": "Acme upgraded to 200 seats on 2026-05-13. They did not renew their previous contract; they signed a new one. [1][2]",
  "citations": [
    { "marker": "[1]", "layer": "fact",  "id": "fact_01HX...", "support_strength": 0.91 },
    { "marker": "[2]", "layer": "event", "id": "evt_01HX...",  "support_strength": 0.87 }
  ],
  "provenance": { "trail": [...], "citations": {...} },
  "diagnostics": {
    "recall_ms": 412,
    "llm_ms":    814,
    "total_ms":  1240,
    "pack_used": { "facts": 5, "beliefs": 2, "episodes": 1 },
    "answer_model":  "claude-opus-4-6",
    "answer_tokens": { "prompt": 1284, "completion": 218 }
  },
  "as_of": "2026-05-15T10:42:09Z"
}

Streaming — ?stream=true

Token-by-token streaming. Event sequence:

event: token         data: { "text": "Acme " }
event: token         data: { "text": "upgraded " }
...
event: citations     data: [ { "marker": "[1]", ... } ]
event: provenance    data: { "trail": [...], "citations": {...} }
event: diagnostics   data: { "recall_ms": 412, "llm_ms": 814, "total_ms": 1240 }
event: done          data: { "pack_id": "pack_01HX...", "as_of": "..." }

Returns X-Cortex-Stability: experimental until the token-stream UX is pinned.