June 6, 2026 · 13 min

Agent memory 2.0: Titans, MemOS, and the cross-session continuity gap nobody has closed yet

Four weeks have passed since our Graphiti / Mem0 post, and the memory layer of the agentic stack has moved more than any other layer in that window. Titans, the Google neural-memory architecture, scaled past 7B parameters with results that quietly displaced long-context Transformers on the hardest long-horizon benchmarks. MemOS, the "operating system for LLM memory", published peer-reviewed benchmarks beating the strongest prior baseline by 60-160% on LongMemEval. Letta (formerly MemGPT) crossed 50,000 production deployments. And the architectural gap we flagged in the stack synthesis, cross-session memory continuity at the protocol level, is still open. This post is the update for operators currently on Graphiti, Mem0, Letta or a custom stack who are trying to decide whether to rotate.

Where we were a month ago — quick recap

The original memory post argued that two production-grade approaches dominated the open-source space in early 2026. The first is Graphiti, Zep's bi-temporal knowledge graph, which records each fact together with the time it became true and the time it was learned, and so supports both point-in-time queries ("what did the agent know on March 14") and consistency tracking. The second is Mem0, the simpler approach, which extracts memories from raw conversations into a vector store with optional relation edges and ranks them by recency and relevance at retrieval time.
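To make the bi-temporal idea concrete, here is a minimal sketch of a point-in-time query over facts that carry both timestamps. It is not Graphiti's API: the `Fact` class, the `known_as_of` helper, and the sample data are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    value: str
    valid_from: date                 # when the fact became true in the world
    learned_on: date                 # when the agent recorded it
    valid_to: Optional[date] = None  # None means "still true"


def known_as_of(facts: List[Fact], subject: str, as_of: date) -> Optional[str]:
    """Return what the agent believed about `subject` on `as_of`.

    A fact counts only if the agent had already learned it by `as_of`
    and it was valid on that date. Among the survivors, the most
    recently valid one wins.
    """
    candidates = [
        f for f in facts
        if f.subject == subject
        and f.learned_on <= as_of
        and f.valid_from <= as_of
        and (f.valid_to is None or as_of < f.valid_to)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda f: (f.valid_from, f.learned_on)).value


# Hypothetical data: the customer upgraded on March 1,
# but the agent only learned about it on March 20.
facts = [
    Fact("acme.plan", "starter", date(2026, 1, 10), date(2026, 1, 10)),
    Fact("acme.plan", "enterprise", date(2026, 3, 1), date(2026, 3, 20)),
]

on_march_14 = known_as_of(facts, "acme.plan", date(2026, 3, 14))
on_march_25 = known_as_of(facts, "acme.plan", date(2026, 3, 25))
```

On March 14 the agent still believed "starter", even though "enterprise" was already true in the world. A store with a single timestamp cannot answer that question, which is the case the second time axis exists for.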

Both approaches were solving the same surface problem (get useful prior context into the agent's working memory at the right moment), and both were doing it well enough that the gap from "no memory layer" to "Graphiti or Mem0" was the largest single quality jump an operator could make. The question we left open: what does the next architectural layer look like, and is it close enough to production for an operator to bet on?

Four weeks later the answer is more concrete than we expected.

Titans — neural memory that learns at test time

Titans was first published by Google Research in late 2024 and has been quietly scaling through 2025 and early 2026. The architecture solves a different problem than Graphiti or Mem0: instead of storing memories externally and retrieving them with vector search, Titans gives the model a neural memory module that learns to memorize during inference. The memory module is itself a small neural network whose parameters update at test time based on how surprising each new token is, and whose outputs are conditioned on a long-running memory state.
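A toy version of surprise-driven memorization can be sketched in a few lines. This is a deliberate simplification, not the Titans implementation: the memory here is a single linear map rather than a deep network, and the learning rate, momentum, and forgetting factor are fixed scalars we picked for illustration. The `LinearMemory` class and its parameter names are hypothetical.

```python
class LinearMemory:
    """Toy associative memory: one matrix W, trained at inference time."""

    def __init__(self, dim, lr=0.5, momentum=0.5, forget=0.01):
        self.dim = dim
        self.lr = lr              # step size applied to the surprise signal
        self.momentum = momentum  # how much past surprise carries forward
        self.forget = forget      # decay, so stale associations fade
        self.W = [[0.0] * dim for _ in range(dim)]
        self.S = [[0.0] * dim for _ in range(dim)]  # running surprise

    def read(self, key):
        """Recall the value currently associated with `key` (W times key)."""
        return [sum(w * k for w, k in zip(row, key)) for row in self.W]

    def write(self, key, value):
        """Nudge W toward mapping key -> value; return the surprise.

        Surprise is the squared recall error before the update. The
        update follows the gradient of 0.5 * ||W key - value||^2.
        """
        error = [p - v for p, v in zip(self.read(key), value)]
        surprise = sum(e * e for e in error)
        for i in range(self.dim):
            for j in range(self.dim):
                grad = error[i] * key[j]
                self.S[i][j] = self.momentum * self.S[i][j] - self.lr * grad
                self.W[i][j] = (1.0 - self.forget) * self.W[i][j] + self.S[i][j]
        return surprise


memory = LinearMemory(dim=2)
first = memory.write([1.0, 0.0], [0.0, 1.0])   # unseen association
second = memory.write([1.0, 0.0], [0.0, 1.0])  # same association again
```

The second write reports less surprise than the first because the memory has already partly absorbed the association. Tokens the memory can already predict change it little, while unexpected ones change it a lot.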

The architecture defines three integration variants:

Titans memory integration patterns:

  Memory as Context (MAC)
    The memory module produces a "memory hint" token sequence that
    is prepended to the attention context. Attention sees both the
    short-context window and the memory summary.

  Memory as Gate (MAG)
    The memory module's output gates the standard attention output —
    memory and attention contribute jointly via a learned gate.

  Memory as Layer (MAL)
    The memory module is a layer in the stack — sequential rather
    than parallel composition with attention.
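The difference between the three variants is mostly a question of wiring, which the following sketch tries to show with stand-in components. Nothing here is a real model: tokens are plain floats, `toy_attention` is a causal running mean, the memory callables are placeholders, and the gate is a fixed scalar where the architecture described above uses a learned one. All function names are hypothetical.

```python
from typing import Callable, List

Seq = List[float]


def toy_attention(seq: Seq) -> Seq:
    """Stand-in for attention: each position averages itself and its past."""
    out, total = [], 0.0
    for i, x in enumerate(seq, start=1):
        total += x
        out.append(total / i)
    return out


def memory_as_context(segment: Seq, hint: Seq,
                      attend: Callable[[Seq], Seq]) -> Seq:
    """MAC: prepend the memory hint, attend over both, keep segment outputs."""
    return attend(hint + segment)[len(hint):]


def memory_as_gate(segment: Seq, memory: Callable[[Seq], Seq],
                   attend: Callable[[Seq], Seq], gate: float) -> Seq:
    """MAG: run memory and attention in parallel, then blend with a gate."""
    attended, remembered = attend(segment), memory(segment)
    return [gate * a + (1.0 - gate) * m for a, m in zip(attended, remembered)]


def memory_as_layer(segment: Seq, memory: Callable[[Seq], Seq],
                    attend: Callable[[Seq], Seq]) -> Seq:
    """MAL: memory first, attention on top of its output (sequential)."""
    return attend(memory(segment))


segment = [2.0, 4.0]
mac_out = memory_as_context(segment, hint=[0.0], attend=toy_attention)
mag_out = memory_as_gate(segment, lambda s: [1.0] * len(s), toy_attention, 0.5)
mal_out = memory_as_layer(segment, lambda s: [2.0 * x for x in s], toy_attention)
```

In MAC the memory changes what attention can see, in MAG it changes how much attention's output counts, and in MAL it changes what attention receives as input.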

The benchmark numbers that matter for operators are on the long-horizon side. BABILong, the benchmark for needle-in-a-haystack-style reasoning across distances of millions of tokens, is where the Titans MAC variant clearly outperformed both long-context Transformers (Gemini 1.5 Flash-class) and the leading state-space models (Mamba-2 variants). The gap widens at the longest distances: past the 4M-token mark, Titans MAC maintains 80%+ accuracy where Transformer baselines drop to coin-flip levels.

What this means for operators in practice. Titans is not (yet) something you drop into your agent stack. It is a model-architecture-level change that has to ship inside a foundation model. The expectation is that the Gemini 3.5 family will be the first commercial model to ship Titans-style memory as a core capability, with an announcement likely in Q3-Q4 2026. Operators do not adopt Titans; they adopt the model that has Titans.

The implication: the long-context tax that has been the operational reality of agent memory for two years — the cost spike from contexts above 200K tokens, the degradation from DELEGATE-52 — has a real architectural answer arriving in the next two quarters. The interim path remains external memory (Graphiti, Mem0, MemOS) on top of standard frontier models.

MemOS — the memory operating system

If Titans is the foundation-model-level answer, MemOS is the application-level one. Published in April 2026 by a research consortium spanning several Chinese and US universities, MemOS reframes agent memory as an operating-system problem: three distinct memory types with different durability and access costs, scheduled and migrated across types by a memory manager that decides what to promote, demote, evict, or rematerialize.

MemOS three-tier memory architecture:

  Plaintext memory
    Natural-language facts, conversation history, structured docs.
    Cheap to store, expensive to use (must be retrieved + read
    into context for every relevant invocation).

  Activation memory (KV cache)
    Cached key-value states from prior model invocations.
    Medium cost to store, very cheap to use (no re-encoding).
    Lossy: the activation only represents what the model "thought"
    at write time.

  Parameter memory
    Knowledge encoded as model-parameter deltas (LoRA-style adapters).
    Expensive to write (requires fine-tune-shaped update), very
    cheap to use (no retrieval, no context bloat).
    Permanent in the sense that the model "knows" the fact.

The MemOS contribution is the scheduler. Operators do not pick a memory tier per fact; the scheduler does. A new fact arrives as plaintext, is promoted to activation memory if it is used frequently within a session, and is promoted to parameter memory if it is reinforced across many sessions or tagged as core knowledge by the operator. Evictions and demotions work in reverse. The result is a memory system where the most-used knowledge lives in the cheapest-to-access tier without per-fact operator intervention.
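The promotion and demotion rules can be sketched as a small state machine. This is our own illustration of the policy described above, not the MemOS reference implementation: the `TierScheduler` class, its thresholds, and the one-tier-per-idle-window demotion rule are all assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    PLAINTEXT = 0
    ACTIVATION = 1
    PARAMETER = 2


@dataclass
class MemoryItem:
    text: str
    tier: Tier = Tier.PLAINTEXT
    uses_this_session: int = 0
    sessions_used: int = 0
    idle_sessions: int = 0
    core: bool = False  # operator tag: always keep in parameter memory


class TierScheduler:
    def __init__(self, activation_uses=3, parameter_sessions=5,
                 demote_after_idle=2):
        self.activation_uses = activation_uses
        self.parameter_sessions = parameter_sessions
        self.demote_after_idle = demote_after_idle

    def record_use(self, item: MemoryItem) -> None:
        """Within a session: frequent use promotes plaintext to activation."""
        item.uses_this_session += 1
        if (item.uses_this_session >= self.activation_uses
                and item.tier < Tier.ACTIVATION):
            item.tier = Tier.ACTIVATION

    def end_session(self, item: MemoryItem) -> None:
        """Across sessions: reinforcement promotes, idleness demotes."""
        used = item.uses_this_session > 0
        if used:
            item.sessions_used += 1
            item.idle_sessions = 0
        else:
            item.idle_sessions += 1

        if item.core or (used and item.sessions_used >= self.parameter_sessions):
            item.tier = Tier.PARAMETER
        elif (item.idle_sessions >= self.demote_after_idle
              and item.tier > Tier.PLAINTEXT):
            item.tier = Tier(item.tier - 1)  # step down one tier
            item.idle_sessions = 0
        item.uses_this_session = 0


scheduler = TierScheduler(activation_uses=3, parameter_sessions=2)
item = MemoryItem("customer prefers invoices in EUR")

for _ in range(3):              # session 1: used three times
    scheduler.record_use(item)
tier_mid_session = item.tier
scheduler.end_session(item)

scheduler.record_use(item)      # session 2: used again
scheduler.end_session(item)
tier_after_reinforcement = item.tier

scheduler.end_session(item)     # sessions 3 and 4: never touched
scheduler.end_session(item)
tier_after_idle = item.tier
```

A real scheduler would also have to weigh the write cost of each promotion, since moving a fact into parameter memory is the expensive step. The sketch tracks only usage counts.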

The benchmark numbers are striking. On LongMemEval — the standard long-horizon memory benchmark — MemOS reported 60-160% improvements over the strongest prior baseline (which was OpenAI's built-in memory at the time of publication). The gain came almost entirely from the parameter-memory tier: once knowledge had been promoted into the model's parameter space, recall was effectively free and persisted across context windows that broke the plaintext-only baselines.

What this means for operators in practice. MemOS is closer to deployable than Titans. The reference implementation is open source, and several startups (the most active is a Letta-affiliated project called Persistent) are wrapping it into operator-friendly SDKs. The cost equation is non-trivial: parameter memory updates cost real money (you are running fine-tune-shaped updates per agent) but the per-query inference cost drops substantially because retrieval and context bloat both decrease. For an agent serving the same customer for many sessions, the parameter-memory amortization is favorable; for a high-churn one-shot agent, plaintext-only remains the right choice.
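The amortization argument reduces to a break-even calculation: the one-time write cost divided by the per-session saving. The sketch below works in integer cents to avoid floating-point rounding, and every number in it is a made-up placeholder, not a measured MemOS cost.

```python
from typing import Optional


def break_even_sessions(write_cost_cents: int,
                        queries_per_session: int,
                        plaintext_cents_per_query: int,
                        parameter_cents_per_query: int) -> Optional[int]:
    """Sessions needed before a parameter-memory write pays for itself.

    Returns None when parameter memory is not cheaper per query, in
    which case the write cost is never recovered.
    """
    saving_per_session = queries_per_session * (
        plaintext_cents_per_query - parameter_cents_per_query
    )
    if saving_per_session <= 0:
        return None
    # Ceiling division on integers: round up to a whole session.
    return -(-write_cost_cents // saving_per_session)


# Hypothetical figures: a $2.00 adapter update, 20 queries per session,
# 3 cents per query with plaintext retrieval versus 1 cent without it.
sessions = break_even_sessions(200, 20, 3, 1)
```

With these placeholder numbers the update pays for itself after five sessions. An agent that sees each customer once or twice never gets there, which is why plaintext-only remains the right choice for high-churn workloads.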

The cross-session continuity gap nobody has closed

We flagged this gap in the stack synthesis post and the field has not closed it in the month since. The shape of the problem: an agent that does great work for a counterparty in session N has no standardized, protocol-level way to bring that learning forward into session N+1. The agent's memory state is opaque to other agents, to other platforms, to the counterparty's own systems.

The operator-level workarounds are real but partial. Mem0 and Graphiti both ship export formats; an operator can dump a customer's memory state and re-import it elsewhere, but no other system actually consumes the format natively. MemOS's parameter-memory deltas are LoRA adapters, which are theoretically portable but practically tied to the specific base model they were trained on. Letta's structured memory is JSON-shaped and reads cleanly but does not interoperate with non-Letta agents.

What is missing at the protocol level: a portable memory primitive that an agent can hand off, that another agent (or another platform) can consume, that the counterparty can audit, and that the cryptographic discipline of the rest of the stack (signed Mandates, validation receipts, attested identity) can apply to. None of the five protocol layers covers this; the working groups for MCP, A2A and AP2 are not actively scoping it; the closest research direction is the W3C Verifiable Credentials community's exploration of "knowledge credentials," which is still pre-production.

Our prediction: the gap will be closed by an A2A v1.x or v2 extension defining a structured memory-handoff primitive, sometime in mid-to-late 2027. Until then, operators ship the workaround patterns and accept the lock-in cost of whichever memory system they pick.

The ERC-8004 binding pattern

While the full protocol-level solution waits, one practical pattern has emerged that ties agent memory to the on-chain identity layer. The pattern: every meaningful update to an agent's memory state generates a content hash; the hash is posted as part of the agent's ERC-8004 Validation Registry entry; the agent's reputation includes a "memory consistency score" derived from how often its memory hashes diverge from declared trajectories.

Memory state binding pattern:

  1. Agent processes a new fact / receives feedback / closes a task
  2. Memory layer computes Merkle-style hash over relevant updates
  3. Hash + minimal metadata posted to ERC-8004 Validation Registry
     under the agent's identity token
  4. Counterparty verifies the agent's claimed memory state matches
     the on-chain attested state
  5. Reputation score incorporates the consistency signal across
     many such attestations
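Steps 2 to 4 can be sketched with the standard library alone. The sketch computes a Merkle-style root over a session's memory updates and chains it to the previous session's root. The chaining and the payload field names are our assumptions. Posting the payload to the ERC-8004 Validation Registry is left out, since that call depends on the deployment.

```python
import hashlib
import json

GENESIS = "00" * 32  # hypothetical "previous root" for an agent's first session


def leaf_hash(update: dict) -> bytes:
    """Hash one memory update using a canonical JSON encoding."""
    canonical = json.dumps(update, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(b"\x00" + canonical.encode("utf-8")).digest()


def merkle_root(leaves: list) -> bytes:
    """Pairwise-hash leaves up to a single root; odd levels repeat the last node."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def build_attestation(agent_id: str, session: int, updates: list,
                      prev_root_hex: str = GENESIS) -> dict:
    """Build the minimal payload for one session (field names hypothetical)."""
    root = merkle_root([leaf_hash(u) for u in updates])
    chained = hashlib.sha256(bytes.fromhex(prev_root_hex) + root).hexdigest()
    return {
        "agent_id": agent_id,
        "session": session,
        "update_count": len(updates),
        "prev_root": prev_root_hex,
        "memory_root": chained,
    }


def matches(attestation: dict, claimed_updates: list) -> bool:
    """Check that claimed updates reproduce the attested root."""
    rebuilt = build_attestation(attestation["agent_id"], attestation["session"],
                                claimed_updates, attestation["prev_root"])
    return rebuilt["memory_root"] == attestation["memory_root"]


updates = [
    {"fact": "customer prefers invoices in EUR"},
    {"task": "T-17", "status": "closed"},
]
first = build_attestation("agent-1", 1, updates)
second = build_attestation("agent-1", 2, [{"fact": "renewal due in Q4"}],
                           prev_root_hex=first["memory_root"])
```

Because each root folds in the previous one, an agent that drops or rewrites earlier memory cannot reproduce the chain from that point on. Only the party holding the raw updates (the agent itself, or an auditor it grants access to) can run `matches`. Everyone else sees just the hashes.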

The binding does not transport the memory itself across systems — the content stays in the agent's private store. It transports the proof that the agent's behavior is consistent with the memory state it claims to have. A counterparty hiring the agent for the first time can read the on-chain consistency score without ever seeing the raw memory; an agent that quietly resets its memory between sessions has a hash trail that exposes the reset.

We have been running this pattern in Agent Builder since mid-May 2026 for every agent with ERC-8004 binding enabled. The cost per attestation is ~$0.10 on a Base-class L2, and attestations are posted once per session (not per fact), so the overhead is bounded. The benefit is that "this agent's memory has been consistent across 200 sessions" becomes a queryable signal that counterparties can use without trusting the operator or the platform.

What the operator should do today

Practical guidance for the four most common starting points.

If you are on Graphiti. Stay. Graphiti's bi-temporal model is still the strongest fit for workloads where "what did the agent know at time T" matters — legal, audit, compliance, research workflows. The MemOS work does not replace it; it complements it for the parameter-memory tier. Expect a Graphiti + MemOS integration to appear from Zep in Q3 2026.

If you are on Mem0. Stay through Q3 2026. The Mem0 approach (extraction + vector store + relations) is operationally the lightest and the team has been quietly adding parameter-memory hooks. The MemOS-style scheduler is harder to retrofit into Mem0's architecture than into Letta's, so we expect Mem0 to evolve more incrementally. The gap between Mem0 and MemOS will widen for memory-heavy workloads through year end; if you have ten or more sessions per customer, evaluate the migration.

If you are on Letta (formerly MemGPT). You are well-positioned. Letta's structured-memory architecture maps cleanly to MemOS's three tiers, and the Letta team has been the most public about adopting MemOS-style scheduling. The migration from Letta v0.x to Letta v1.0 (expected Q3 2026) is the moment to revisit; v1.0 is supposed to ship native MemOS support.

If you have a custom stack. The honest assessment: you are probably underinvested in this layer. Custom memory systems that worked in 2025 are starting to lag the open-source state of the art in a way that compounds week over week. The migration path: pick one of Graphiti, Mem0, or Letta based on the workload shape (audit / lightweight extraction / structured memory), migrate over a quarter, plan for a second migration to a MemOS-aware stack in 2027. Owning the memory layer end-to-end is a tax that almost no operator should pay anymore.

What to watch through Q4 2026

Three concrete things on the horizon that will affect how operators think about memory.

Gemini 3.5 with Titans-class memory. Expected announcement Q3-Q4 2026. If it ships as advertised, the cost of long-context workloads drops materially and operators currently using external memory primarily as a long-context workaround can simplify. Operators using external memory as structured-knowledge storage (the Graphiti / Mem0 use case) will not see meaningful change.

MemOS commercial wrappers. Persistent, the Letta-affiliated project, plus two stealth-mode startups, are racing to be the first production-grade MemOS-as-a-service. The first one that ships with credible operator tooling (eval suite integration, drift detection on parameter memory, audit trails) wins disproportionate market share. Watch for the announcement in Q3.

A2A v1.x or v2 memory-handoff primitive. The Linux Foundation working group running A2A has not committed to this scope yet but has acknowledged the gap. Our read: there is roughly a 50/50 chance of a draft proposal by end of 2026. If it lands, the memory layer becomes genuinely portable for the first time and the operator's lock-in cost drops to near zero.

The architectural shape the next eighteen months will produce

If we are right about Titans-class memory shipping in the foundation models, MemOS-class scheduling shipping in the operator-grade memory platforms, and A2A-class portability shipping at the protocol level, the agent memory layer in late 2027 looks materially different from today. Working memory lives in the model's neural memory module (free, fast, no retrieval). Episodic memory lives in MemOS-managed tiers (auto-scheduled across plaintext / activation / parameter). Cross-session continuity is a first-class protocol primitive (portable, verifiable, anchored). The operator's job at that point is configuration and supervision; the underlying architecture is somebody else's problem.

For the next eighteen months, the operator's job is harder. Pick the memory layer that fits your workload, accept the migration cost you will pay once the dust settles, and bind the memory state to ERC-8004 attestations now so that whatever happens at the protocol level, your agent's track record is durable. The investment in attestation infrastructure compounds even if the underlying memory system changes underneath it.

Closing

Memory was the slowest-moving layer of the stack at the start of 2025 and is the fastest-moving at the start of June 2026. The shape of the next eighteen months is legible enough to plan around: Titans-class memory in the models, MemOS-class scheduling in the tooling, portable memory handoff at the protocol level. The shape of the next four weeks is not — we expect at least one more architecture-changing paper to land before our next memory post, and the field at this velocity will likely surprise us.

For the operator deciding today, the practical answer is to stay on the production-grade stack you have (Graphiti, Mem0, Letta) and start binding the memory state to ERC-8004 attestations. The benefit of binding compounds regardless of which architecture wins; the cost of binding is trivially small. The bet against the field is to own a custom memory stack from scratch; the bet with the field is to consume the open-source state of the art and run the platform-level concerns (eval, observability, attestation) on top of it.

The next post in this series moves from memory to the broader forward view: the twelve-month forecast for the agentic stack — which protocols will ship v1.0, which startups will consolidate, which attack vectors will emerge, and which categories of operator will compound the fastest. See you there.