Memory at the speed of speech.
Conversational voice agents have a ~200ms latency budget before they sound unnatural. CortexDB delivers full long-term recall inside it.
Voice agents lose state between turns, forget the caller mid-conversation, and stall when fetching context from a vector store. Every extra hop is a perceived hesitation — and a customer who hangs up.
Capabilities that map directly to the pain.
Sub-second recall
Six-phase pipeline returns enriched, reranked context in well under a second: the window you have between ASR finalization and the first TTS audio frame.
Speaker-scoped namespaces
Each caller, agent, or session gets its own tenant. Recall is bounded to the right speaker without scanning a shared index.
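A minimal sketch of what speaker scoping buys you, using an in-memory stand-in for CortexDB. The `remember`/`recall` names mirror the integration snippet further down; the store itself is illustrative, not the SDK:

```python
from collections import defaultdict

class ScopedStore:
    """Toy model: one isolated namespace per tenant_id."""
    def __init__(self):
        self._tenants = defaultdict(list)  # tenant_id -> that speaker's memories

    def remember(self, content, tenant_id):
        self._tenants[tenant_id].append(content)

    def recall(self, query, tenant_id):
        # Only this caller's namespace is searched -- no shared-index scan.
        return [m for m in self._tenants[tenant_id] if query.lower() in m.lower()]

store = ScopedStore()
store.remember("Caller prefers email follow-ups", tenant_id="caller-42")
store.remember("Caller prefers SMS", tenant_id="caller-99")

print(store.recall("prefers", tenant_id="caller-42"))
# -> ['Caller prefers email follow-ups']  (caller-99's memory never surfaces)
```

The point of the toy: recall cost scales with one speaker's history, not the whole deployment's.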
Streaming-friendly writes
No LLM on the write path. Drop a transcript chunk into CortexDB the moment the speaker finishes a thought — enrichment happens asynchronously.
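A sketch of what "no LLM on the write path" means inside the turn loop: the hot path does an O(1) enqueue and returns, while a background worker persists the chunk. The queue and helper names are ours, not the SDK's; the worker's append stands in for the actual write call:

```python
import queue
import threading

write_queue: "queue.Queue[dict]" = queue.Queue()
stored = []  # stand-in for CortexDB; enrichment would happen after this point

def writer():
    while True:
        item = write_queue.get()
        stored.append(item)  # stand-in for client.remember(...)
        write_queue.task_done()

threading.Thread(target=writer, daemon=True).start()

def on_utterance_final(text, caller_id):
    # Hot path: no LLM, no synchronous network round trip.
    write_queue.put({"content": f"User said: {text}", "tenant_id": caller_id})

on_utterance_final("I moved to Denver last month", "caller-42")
write_queue.join()  # demo only -- the real turn loop never blocks here
print(stored[0]["content"])  # -> User said: I moved to Denver last month
```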
Irrelevance detection
Quad-signal gate returns empty when nothing matches, so your agent says "I don't know" instead of hallucinating the caller's last name.
What the integration looks like.
# Inside the ASR -> LLM -> TTS loop
context = client.recall(
    query=user_utterance,
    tenant_id=caller_id,  # speaker scope
    timeout_ms=180,       # stay under the speech budget
)

if not context.results:
    # Quad-signal gate: nothing relevant. Don't bluff.
    reply = "I don't have that detail yet. Mind walking me through it?"
else:
    reply = llm.respond(user_utterance, context=context.text)

client.remember(
    content=f"User said: {user_utterance}",
    tenant_id=caller_id,
)

The architectural decisions that matter here.
Embedded path option
Run CortexDB in-process for hot-path recall when the network is the budget.
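A toy sketch of the embedded idea: recall is a local function call with a hard budget, and the client degrades to empty rather than stalling the turn. The class and parameter names are illustrative, not the real embedded API:

```python
import time

class EmbeddedClient:
    """Stand-in for an in-process CortexDB: recall never leaves the process."""
    def __init__(self, memories):
        self._memories = memories

    def recall(self, query, timeout_ms=180):
        start = time.perf_counter()
        hits = [m for m in self._memories if query in m]
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > timeout_ms:
            return []  # degrade gracefully instead of blowing the speech budget
        return hits

client = EmbeddedClient(["order #8812 shipped Tuesday"])
print(client.recall("order"))  # -> ['order #8812 shipped Tuesday']
```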
Per-call audit trail
Every utterance is an immutable event. Replay a call exactly as the agent heard it.
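A sketch of replay over an append-only log; the `Utterance` shape and `replay` helper are illustrative, not the actual event schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen = immutable once written
class Utterance:
    seq: int
    speaker: str
    text: str

call_log = [
    Utterance(1, "caller", "My flight was cancelled"),
    Utterance(2, "agent", "I can rebook you. Which day works?"),
]

def replay(events):
    # Deterministic: the sequence number fixes the order the agent heard live.
    return [f"{e.seq} {e.speaker}: {e.text}" for e in sorted(events, key=lambda e: e.seq)]

for line in replay(call_log):
    print(line)
```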
Want to see this running on your data?
Determinism for the order book.
Tick data and order events need predictable storage latency — not a query planner that gets cute under load.
Decisioning before the bid window closes.
Real-time bidding gives you ~10ms after the auction call. Profile lookups can't be a round trip.
High-cardinality storage that doesn't fold under write pressure.
Metrics, traces, and logs at modern scale crush row stores. CortexDB is built for the append-heavy, time-bounded shape of telemetry.