Voice AI

Memory at the speed of speech.

Conversational voice agents have roughly 200 ms to respond before a pause starts to sound unnatural. CortexDB delivers full long-term recall inside that budget.

01 — Problem

Voice agents lose state between turns, forget the caller mid-conversation, and stall while fetching context from a vector store. Every extra hop is a perceived hesitation, and hesitation is what makes a customer hang up.

02 — What CortexDB does

Capabilities that map directly to the pain.

01

Sub-second recall

A six-phase pipeline returns enriched, reranked context in well under a second, with p95 recall under 200 ms. That fits in the gap between ASR finalization and the first TTS audio frame.
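
One way to hold any recall step to a speech-sized budget is to run it against a hard deadline and fall back to empty context when it overruns. The sketch below is a minimal illustration using only the Python standard library. `recall_with_deadline` and `fake_recall` are hypothetical names made up for this example and are not part of CortexDB's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# One shared worker pool; a real agent would size this for its call volume.
_pool = ThreadPoolExecutor(max_workers=4)

def recall_with_deadline(recall_fn, query, budget_ms, fallback=None):
    """Run recall_fn(query) but stop waiting once budget_ms has elapsed.

    Returns (result, elapsed_ms). On overrun the fallback is returned so the
    agent can answer without context instead of leaving dead air.
    """
    start = time.monotonic()
    future = _pool.submit(recall_fn, query)
    try:
        result = future.result(timeout=budget_ms / 1000)
    except FutureTimeout:
        future.cancel()  # best effort; an already-running call is not interrupted
        result = fallback
    elapsed_ms = (time.monotonic() - start) * 1000
    return result, elapsed_ms

def fake_recall(query, delay_s=0.0):
    # Hypothetical stand-in for a real recall call.
    time.sleep(delay_s)
    return [f"memory for: {query}"]
```

Measuring elapsed time with `time.monotonic()` keeps the figure immune to wall-clock adjustments, which matters when the whole budget is a couple of hundred milliseconds.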

02

Speaker-scoped namespaces

Each caller, agent, or session gets its own tenant. Recall is bounded to the right speaker without scanning a shared index.
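
To make the scoping idea concrete, here is a toy in-memory store that keeps one bucket per tenant, so a lookup only ever reads the caller's own bucket. It is a sketch under simplifying assumptions, not a description of how CortexDB stores data. `ScopedMemory` is a hypothetical class, and keyword overlap stands in for real ranking.

```python
from collections import defaultdict

class ScopedMemory:
    """Toy store: one bucket per tenant, so recall never touches other callers."""

    def __init__(self):
        self._by_tenant = defaultdict(list)

    def remember(self, tenant_id, content):
        self._by_tenant[tenant_id].append(content)

    def recall(self, tenant_id, query):
        # Only this tenant's bucket is scanned; naive keyword overlap stands in
        # for real ranking.
        words = set(query.lower().split())
        bucket = self._by_tenant.get(tenant_id, [])
        return [m for m in bucket if words & set(m.lower().split())]

store = ScopedMemory()
store.remember("caller-a", "prefers morning appointments")
store.remember("caller-b", "prefers evening appointments")
```

Calling `store.recall("caller-a", "appointments")` returns only caller A's memory, even though caller B has a matching entry.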

03

Streaming-friendly writes

No LLM on the write path. Drop a transcript chunk into CortexDB the moment the speaker finishes a thought — enrichment happens asynchronously.
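
The pattern described here, a cheap synchronous write followed by background enrichment, can be sketched with a queue and a worker thread. This is an illustration of the general technique only. `StreamingWriter` and its `enrich_fn` parameter are hypothetical, and the token-count enrichment is a placeholder for whatever processing a real system would run.

```python
import queue
import threading

class StreamingWriter:
    """Toy write path: append immediately, enrich later on a worker thread."""

    def __init__(self, enrich_fn):
        self.records = []            # raw chunks, readable right after write()
        self.enriched = {}           # record index -> enrichment output
        self._enrich_fn = enrich_fn
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, chunk):
        # Hot path: a list append and a queue put, no model call.
        self.records.append(chunk)
        index = len(self.records) - 1
        self._pending.put(index)
        return index

    def _drain(self):
        while True:
            index = self._pending.get()
            self.enriched[index] = self._enrich_fn(self.records[index])
            self._pending.task_done()

    def flush(self):
        # Block until queued enrichment has finished (useful in tests).
        self._pending.join()

writer = StreamingWriter(enrich_fn=lambda text: {"tokens": len(text.split())})
idx = writer.write("I moved to Denver last month")
writer.flush()
```

The caller gets its index back as soon as the chunk is appended. `flush()` exists only so a test or shutdown hook can wait for the background work to drain.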

04

Irrelevance detection

A quad-signal gate returns an empty result when nothing matches, so your agent says "I don't know" instead of hallucinating the caller's last name.
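
This page does not say which four signals the gate uses or how they are combined, so the following is only a sketch of the general idea: a candidate survives only if every signal clears its threshold, and an empty list means "nothing relevant". The signal names, thresholds, and the all-must-pass rule are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical signal names and thresholds, for illustration only.
THRESHOLDS = {
    "semantic": 0.60,
    "lexical": 0.20,
    "recency": 0.10,
    "entity": 0.50,
}

@dataclass
class Candidate:
    text: str
    signals: dict  # signal name -> score in [0, 1]

def gate(candidates, thresholds=THRESHOLDS):
    """Keep candidates whose every signal clears its threshold.

    A missing signal counts as 0.0, so the gate fails closed. An empty list
    means "nothing relevant", which the agent should treat as "I don't know".
    """
    return [
        c for c in candidates
        if all(c.signals.get(name, 0.0) >= floor for name, floor in thresholds.items())
    ]
```

Failing closed is the point of the design: a candidate that looks strong on one signal but weak on another is dropped rather than handed to the LLM as if it were fact.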

03 — In code

What the integration looks like.

voice_agent.py

```python
# Inside the ASR -> LLM -> TTS loop.
# `client`, `llm`, `user_utterance`, and `caller_id` are assumed to be set up earlier.
context = client.recall(
    query=user_utterance,
    tenant_id=caller_id,        # speaker scope
    timeout_ms=180,             # stay under the speech budget
)

if not context.results:
    # Quad-signal gate: nothing relevant. Don't bluff.
    reply = "I don't have that detail yet. Mind walking me through it?"
else:
    reply = llm.respond(user_utterance, context=context.text)

# Store the utterance so later turns can recall it.
client.remember(
    content=f"User said: {user_utterance}",
    tenant_id=caller_id,
)
```

04 — Why CortexDB

The architectural decisions that matter here.

Embedded path option

Run CortexDB in-process for hot-path recall when the network is the budget.

Per-call audit trail

Every utterance is an immutable event. Replay a call exactly as the agent heard it.
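
An append-only log of immutable events is straightforward to sketch in plain Python. The example below illustrates the concept rather than CortexDB's storage format. `UtteranceEvent` and `CallLog` are hypothetical names, and a real system would persist events rather than hold them in a list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UtteranceEvent:
    call_id: str
    seq: int        # position within the call
    speaker: str
    text: str

class CallLog:
    """Append-only log: events can be added and read, never edited."""

    def __init__(self):
        self._events = []

    def append(self, call_id, speaker, text):
        # Sequence numbers count up per call, starting at 0.
        seq = sum(1 for e in self._events if e.call_id == call_id)
        event = UtteranceEvent(call_id, seq, speaker, text)
        self._events.append(event)
        return event

    def replay(self, call_id):
        # Yield one call's events in the order they were recorded.
        for event in self._events:
            if event.call_id == call_id:
                yield event

log = CallLog()
log.append("call-1", "caller", "I need to change my flight")
log.append("call-2", "caller", "Hello?")
log.append("call-1", "agent", "Sure, which booking?")
```

Because the dataclass is frozen, assigning to a field of a recorded event raises an error, so a replay of `call-1` yields exactly what was written, in order.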

Numbers

Recall p95: < 200 ms
Hallucination rate: 0% on irrelevant queries

Next step

Want to see this running on your data?