Announcement · May 16, 2026 · 18 min read

CortexDB V1: The Experience Layer for AI Agents

V1 ships today. 93.8% on LongMemEval-S — past Mem0's published 93.4% — and 86.9% on LoCoMo, with a five-layer memory model, hierarchical scopes, PASETO authentication, and 53 integrations. Eight weeks from first commit to state-of-the-art.

93.8% — LongMemEval-S (469 / 500)
86.9% — LoCoMo, cats 1–4 (1,339 / 1,540)
53 — Integrations (37 frameworks + 16 connectors)
8 wks — First commit to V1 (498 commits)

Prashant Malik

Co-creator of Apache Cassandra, among the first 30 engineers at Facebook. Two decades of distributed-infrastructure work for the systems that power the internet. Building CortexDB now — the database for what an AI agent has lived through.

The missing third layer

Today's AI agents are amnesiacs. Open ChatGPT, tell it your name, log out, come back tomorrow — it has no idea who you are. For a chatbot that's annoying. For an agent — one that books travel, manages a customer pipeline, debugs production systems — it's broken.

People have tried to paper over this by stuffing more text into the prompt. But the prompt is a clipboard, not a memory. You can't enforce "delete this customer's data" on a clipboard. You can't audit who saw what. And you can't answer "why do you think the deal is at risk?" from a clipboard.

A real AI agent needs three layers, stacked together:

Layer 1 — Intelligence. The LLM: reasoning, language, planning. Built by OpenAI, Anthropic, Google, Meta. Solved — commoditizing fast.

Layer 2 — Knowledge. RAG over docs: what's true about the world. Pinecone, Weaviate, vector pipelines. Solved — becoming standard infra.

Layer 3 — Experience. What this agent has lived through. Events, episodes, facts, beliefs — specific to this user, this org. Missing. This is what we build.

Layers 1 and 2 are commodities. Layer 3 is the moat — and Layer 3 is what CortexDB builds.

Memory isn't storage. It's a cycle.

Real memory — the kind your own brain runs — isn't a single operation. It's a cycle of operations the brain performs continuously. Skip any one stage and you're back to a vector DB.

Capture  ──▶  Extract  ──▶  Reconcile  ──▶  Forget  ──▶  Consolidate
 (WAL)       (facts +       (bi-temporal     (lifecycle    ("sleep" —
              entities)      supersession)    methylation)   the brain)
  1. Capture. Ingest raw experience. Conversations, events, observations — preserved exactly as they happened in a tamper-proof WAL.
  2. Extract. Pull facts, entities, relationships out of raw events. Turn experience into structured knowledge.
  3. Reconcile. When memories contradict — Bob said yes Tuesday, no Friday — decide what's true now. Newest wins, with the older version preserved as true until Tuesday.
  4. Forget. Prune the irrelevant. Not all memories matter. The brain forgets on purpose.
  5. Consolidate. When the agent "sleeps," build a coherent worldview from raw events. Synthesis. Generalization. Procedure formation. This is where memory becomes intelligent.
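To make stage 3 concrete, here's a minimal Python sketch of bi-temporal supersession — the `Fact` shape and field names are illustrative, not CortexDB's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: str           # ISO date when it became true
    valid_to: Optional[str] = None  # None = still true

def reconcile(facts: list[Fact], new: Fact) -> list[Fact]:
    """Bi-temporal supersession: the newest fact wins, but the older
    version is preserved with its validity window closed rather than deleted."""
    for f in facts:
        if (f.subject, f.predicate) == (new.subject, new.predicate) and f.valid_to is None:
            f.valid_to = new.valid_from  # close the old version in place
    return facts + [new]

# Bob said yes Tuesday, no Friday:
history = [Fact("bob", "answer", "yes", "2026-05-12")]
history = reconcile(history, Fact("bob", "answer", "no", "2026-05-15"))
# Tuesday's "yes" survives, marked true only until Friday.
```

The key design choice: contradiction closes a validity window instead of overwriting a row, so "what was true on Wednesday?" stays answerable forever.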

V1 ships the first four stages as stable surface. Consolidation is in flight as a beta endpoint — POST /v1/understanding/synthesize.

The five layers

If the cycle is what memory does, the layers are what memory contains. Five layers, stacked from raw to refined, each addressable through its own API endpoint:

Events

"14:22 — message received from [email protected]"

Immutable, atomic captures. The WAL is the source of truth.

Episodes

"The Acme deal — May 2 to May 13"

Bounded spans of related events. Sealed once consolidated.

Facts

"(ent_acme, deal_stage, signed)" — valid 2026-05-13 → now

Bi-temporal triples. Supersedable. Older versions preserved.

Beliefs

"Acme is likely to renew · confidence 0.62"

Probabilistic claims with a supports[] graph. Walkable evidence trail.

Understanding

"Concept: Q3 renewal motion — version 7"

Synthesized concepts that span many beliefs and episodes.

A system with only Events is a log. A system with only Facts is a database. A system with all five layers, with the right relationships between them, is the experience layer that completes the agent.

Almost no AI memory product handles Beliefs properly. They have facts. They don't track confidence or evidence. They can't answer "why do you think that?" — and that question is one of the biggest reasons enterprise buyers don't trust agents yet. CortexDB can: GET /v1/beliefs/why?belief_id=… returns the full support graph plus a narrative rendering.
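A toy model of what answering "why do you think that?" takes — walk the evidence graph behind a belief. The dict shape here is illustrative, not the actual response of the `/v1/beliefs/why` endpoint:

```python
def why(belief_id: str, beliefs: dict) -> list[str]:
    """Depth-first walk of a supports[] graph, collecting the evidence
    trail behind a belief, indented by depth."""
    b = beliefs[belief_id]
    trail = [f"{b['claim']} (confidence {b['confidence']})"]
    for sup in b.get("supports", []):
        trail.extend("  " + line for line in why(sup, beliefs))
    return trail

beliefs = {
    "b1": {"claim": "Acme is likely to renew", "confidence": 0.62,
           "supports": ["b2", "b3"]},
    "b2": {"claim": "Acme signed the Q2 expansion", "confidence": 0.95,
           "supports": []},
    "b3": {"claim": "Champion replied positively on May 13", "confidence": 0.80,
           "supports": []},
}
for line in why("b1", beliefs):
    print(line)
```

Without stored `supports[]` edges, this walk is impossible — the system can rank results but never explain them.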

The benchmarks

Anyone can claim great AI memory. Two public, standardized benchmarks have become the research community's yardstick — LongMemEval-S and LoCoMo — and V1 posts state-of-the-art on the first and a top-tier score on the second.

LongMemEval-S

ICLR 2025 — 500 questions, six memory-skill categories.

93.8%

469 / 500 · server parity

single-session-assistant — 56 / 56 — 100.0%
knowledge-update — 76 / 78 — 97.4%
single-session-user — 67 / 70 — 95.7%
single-session-preference — 28 / 30 — 93.3%
temporal-reasoning — 122 / 133 — 91.7%
multi-session — 120 / 133 — 90.2%

Single run, no retry targeting, no gold-oracle leakage. Claude Opus 4.6 answerer · OpenAI text-embedding-3-small · Cohere rerank-v3.5 cross-encoder. Total wall clock 2h 2m, total cost $49.69. Reproducible from one command — see benchmarks/longmemeval/RESULTS.md in the cortex repo.

LoCoMo

1,540 question-answer pairs across long conversations.

86.9%

1,339 / 1,540 · cats 1–4

Cat 4 — Single-hop — 770 / 841 — 91.6%
Cat 2 — Temporal — 282 / 321 — 87.9%
Cat 1 — Multi-hop — 225 / 282 — 79.8%
Cat 3 — Open-domain — 62 / 96 — 64.6%

Six weeks ago we scored 76.4% on this benchmark; we're now at 86.9% — +10.5 points in six weeks. Mem0's published April 2026 LoCoMo score is 91.6%, so we trail them by ~5 points here while leading on LongMemEval-S. The next round of recall-side work is focused here.

The combined trajectory: +17.4 points on LongMemEval-S and +10.5 points on LoCoMo in six weeks of focused engineering. Velocity is the second-order signal of architecture quality — most teams in this space spent those six weeks integrating yet another reranker. We've been rebuilding the foundation.

Why nobody else is doing this

The market has been arguing about which existing piece is the answer to AI memory. None of them is — alone or together.

RAG is a library

Handles knowledge (Layer 2). Reference, not memory.

Tools are phone calls

Stateless. Transactional. Each call is a stranger.

Vector DBs are memory pools

Similarity, not meaning. No reconciliation. No forgetting.

Knowledge graphs are filing cabinets

Static. They won't update themselves. Need a librarian.

Stitching these together with prompts and orchestration code is what every team is doing today — and it's why no AI yet feels like it actually knows you. It's an attempt to build a brain by gluing a search engine to a Rolodex to a filing cabinet to a phone book.

The peer attempts make the same architectural mistake — they build memory as a feature on top of a vector store. Mem0 stores LLM-rewritten snippets and ranks them at recall time; they can't tell you why a result ranked where it did because the derivation isn't stored. Zep separates facts from episodes but folds Belief into Fact, losing the distinction between what I observed and what I concluded. Letta has tiered memory but no explicit Belief layer at all. We've broken those layers apart on purpose — because the trail back to evidence is the entire point.

What V1 ships

The full V1 surface is roughly 30 stable endpoints plus an SSE channel. The headline pieces:

POST /v1/experience

The single unified write. One envelope shape covers conversation turns, documents, tool results, observations, blobs, and direct triples. Idempotency keys, optional ?wait= modes, async by default.
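For a feel of what "one envelope shape" means, here's a hypothetical write body — the field names are illustrative, not CortexDB's documented schema:

```python
import json
import uuid

# A hypothetical /v1/experience envelope. Swapping "kind" (and the
# payload inside it) is how one shape would cover turns, documents,
# tool results, and triples alike.
envelope = {
    "kind": "conversation_turn",
    "scope": "org:acme/dept:sales/user:alice",
    "idempotency_key": str(uuid.uuid4()),  # same key => safe to retry
    "payload": {
        "role": "user",
        "text": "Bob confirmed the Acme deal is signed.",
        "occurred_at": "2026-05-13T14:22:00Z",
    },
}
body = json.dumps(envelope)
```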

POST /v1/recall

Returns a stratified pack across all five layers, with a synthesized context_block, full provenance, and per-layer citations. Hybrid retrieval: HNSW vector + Tantivy BM25 + RRF fusion + optional Cohere rerank-v3.5.
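The fusion step of hybrid retrieval is worth seeing in miniature. This is a textbook Reciprocal Rank Fusion sketch, not CortexDB's implementation — the document IDs are made up:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each retriever contributes 1/(k + rank)
    per document; summing merges the ranked lists without needing the
    retrievers' raw scores to be comparable."""
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["mem_7", "mem_2", "mem_9"]  # e.g. HNSW nearest neighbours
bm25_hits   = ["mem_2", "mem_4", "mem_7"]  # e.g. keyword matches
fused = rrf_fuse([vector_hits, bm25_hits])
# mem_2 and mem_7 appear in both lists, so they rise to the top
```

A reranker (like the optional cross-encoder pass) would then rescore just this fused shortlist.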

POST /v1/answer

Recall composed with an LLM call. Returns the answer plus the same pack_id, provenance, and citations — so the system is never asking you to trust the answer blind.

Hierarchical scopes

org:acme/dept:eng/user:alice — one namespace primitive replacing tenant_id + namespace + workspace. Recall walks up (holistic) or down (descend) the hierarchy on demand.
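Walking "up" a scope path is just enumerating its prefixes. A minimal sketch — the function name and return shape are my own, not the API's:

```python
def holistic_scopes(scope: str) -> list[str]:
    """Most-specific-first list of a scope and its ancestors, so a
    recall at user level can also consult dept- and org-level memory."""
    parts = scope.split("/")
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    return prefixes[::-1]

print(holistic_scopes("org:acme/dept:eng/user:alice"))
# ['org:acme/dept:eng/user:alice', 'org:acme/dept:eng', 'org:acme']
```

Descend is the mirror image: match every stored scope that has the query scope as a prefix.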

Bi-temporal everywhere

Every derived record carries valid_from / valid_to and recorded_from / recorded_to. Answer 'what did we know about Acme on April 15, viewed from May 1?' Single-axis systems can't.
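The two-axis query reads naturally once you see both filters side by side. A sketch with assumed field names (ISO-date strings compare correctly as strings):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FactVersion:
    value: str
    valid_from: str             # when it became true in the world
    valid_to: Optional[str]
    recorded_from: str          # when the system learned it
    recorded_to: Optional[str]

def as_of(versions: list[FactVersion], valid_at: str, recorded_at: str) -> Optional[str]:
    """'What did we know about X on valid_at, viewed from recorded_at?'
    Each axis is filtered independently — the core bi-temporal move."""
    for v in versions:
        in_valid = v.valid_from <= valid_at and (v.valid_to is None or valid_at < v.valid_to)
        in_recorded = v.recorded_from <= recorded_at and (v.recorded_to is None or recorded_at < v.recorded_to)
        if in_valid and in_recorded:
            return v.value
    return None

versions = [
    FactVersion("negotiating", "2026-04-01", "2026-05-13", "2026-04-02", None),
    FactVersion("signed",      "2026-05-13", None,         "2026-05-14", None),
]
# On April 15, viewed from May 1, the deal was still negotiating:
print(as_of(versions, valid_at="2026-04-15", recorded_at="2026-05-01"))
```

Note the asymmetry a single-axis system loses: on May 13 the deal was already signed in the world, but viewed from May 13 the system didn't know it yet.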

PASETO v4 authentication

Signed-token identity on every call. No plaintext API keys. Tokens carry capabilities; capabilities cascade through deployment → tenant → scope → actor; every denial cites the tier + capability.

POST /v1/erasures

Reference-counted GDPR erasure. Events with cross-scope references are redacted; events without references are deleted from the WAL. Preview → execute → status → cancel.
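The redact-or-delete decision is a reference count check. An illustrative sketch of that rule, not the shipped logic:

```python
def erase(event_id: str, refcounts: dict[str, int], wal: dict[str, str]) -> str:
    """Reference-counted erasure: an event still referenced from other
    scopes is redacted in place (structure preserved, payload gone);
    an unreferenced event is deleted from the WAL outright."""
    if refcounts.get(event_id, 0) > 0:
        wal[event_id] = "[REDACTED]"
        return "redacted"
    del wal[event_id]
    return "deleted"

wal = {"e1": "alice's PII", "e2": "shared meeting notes"}
refcounts = {"e2": 3}  # e2 is referenced from another scope

print(erase("e1", refcounts, wal))  # no references: removed entirely
print(erase("e2", refcounts, wal))  # referenced: payload redacted
```

The preview step would run this logic without mutating anything, so you see the redact/delete split before committing.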

Lifecycle stream

Server-sent events for every async stage: captured, extracted, indexed, consolidated, forgotten. Real-time observability for UIs that want to show 'I just learned that'.

Five importers + 16 connectors

Native import from Mem0, Zep, Letta, OpenAI memory export, and generic JSONL. Live connectors for Slack, GitHub, GitLab, Jira, Linear, Confluence, Notion, PagerDuty, Discord, Microsoft Teams, Google Workspace, Salesforce, HubSpot, Zendesk, Intercom, ServiceNow.

Eight weeks from first commit

First commit landed on March 16, 2026. V1 ships today, May 16. In eight weeks:

498 commits · 184 in the last 30 days · 62 in the last 14 days · +17.4 points on LongMemEval-S

Most startups in this space are pre-product or pre-benchmark. We're eight weeks in with a state-of-the-art LongMemEval-S score and a top-tier LoCoMo score already on the board. Architecture is the multiplier. When the foundation is right, adding a feature isn't a project — it's a commit.

What's next: V2 — the brain

V1 ships the four stages of the memory cycle that operate continuously: Capture, Extract, Reconcile, Forget. V2 is about the fifth — Consolidate. The "sleep" stage where the brain shows up. Where raw experience becomes understanding.

V2's targets: 96–97% on LongMemEval-S, 92–94% on LoCoMo, and p50 end-to-end answer latency under 4 seconds. The synthesizer is already in flight as a beta endpoint — call POST /v1/understanding/synthesize to trigger a consolidation pass over a scope today, and the concepts land in GET /v1/understanding.

Even the primitive version of the "sleep" stage is unique in the market. No competitor builds it. That's what makes CortexDB feel like a brain instead of a log — and the gap widens with every cycle.

Try CortexDB

One unauthenticated POST gives you a 30-day PASETO bearer token and a default scope — no email, no card.

curl -X POST https://api-v1.cortexdb.ai/v1/auth/signup -H 'Content-Type: application/json' -d '{}'

The bet

AI is having its internet moment. Reasoning got cheap. Knowledge retrieval is becoming infrastructure. What's left — and what every operator and every end-user actually wants — is an AI that knows them, knows their business, learns from every interaction, and gets sharper because of it.

That AI doesn't exist yet. For it to exist, it needs an experience layer underneath: durable, queryable, governed, explainable. The third layer that's been missing.

We're building it. The first version ships today. The next version ships in weeks. We'd love to have you along for the ride.