← Blog
June 26, 2026 · 10 min

FERNme: agent memory that models the user, not the transcript — and does it with zero LLM writes

Most agent memory systems pay a language model to read the whole transcript and re-extract what it already knew. FERNme refuses to. It writes memory with arithmetic on a graph, calls no LLM on the write path, and keeps the prompt-facing card near 25 tokens for life.

FERNme — the Fuzzy-Edged Recall Network — describes itself in one line: "agent personalization memory that models the user, not the transcript." It is a v0.3 research preview under Apache 2.0, Python 3.10+, backed by SQLite out of the box or PostgreSQL 16 in production, and it ships three surfaces an agent operator cares about: a FastAPI REST API, an MCP server, and a glass-box web UI where the user can see and edit every belief the system holds about them. The design is opinionated in a way worth studying, because it inverts the assumption baked into most of the memory stack we covered in our memory architectures update.

Memory that models the user, not the transcript

The dominant pattern for agent memory is extraction: after each interaction, you hand the conversation to an LLM and ask it to summarize, deduplicate, and store the salient facts. It works, but it has two structural costs. The summary grows with the history, so the context you inject keeps inflating; and every write burns model tokens, roughly two LLM calls per interaction in extraction-based systems.

FERNme's bet is that personalization does not need a narrative of what happened — it needs a model of who the user is. A user who books aisle seats, avoids spicy food, and shops for their kids does not need that re-derived from a transcript every session. It needs to be a small, stable, editable profile. So FERNme stores the profile directly and never stores the transcript at all.

Arithmetic, not extraction

The write path is the interesting part. When an event arrives — a click, a purchase, a stated preference — FERNme maps it through a controlled vocabulary of namespaced tags (pref:, topic:, goal:, context:) and strengthens the edges between the nodes that co-occurred. That is a Hebbian update: nodes that fire together wire together. Edges carry fuzzy weights from 0 to 9, the strengthening saturates so nothing runs away, and an ACT-R-style decay lets stale preferences fade.

Critically, all of that is integer arithmetic on a graph. As the project puts it: "memory updates are arithmetic on a graph — 0 LLM calls per interaction vs. ~2 for extraction-based memory." The default pure mode never touches a model on the hot path at all; two experimental modes, gated and offline, allow an optional LLM tagger for novel free-text, but always off the write path. The result is a memory layer whose write latency and cost are decoupled from model pricing entirely.

# Conceptual: an event becomes edge updates, not a stored sentence
event: user buys a child-sized rain jacket
 strengthen edge(user  "topic:kids")
 strengthen edge(user  "pref:waterproof")
# 0 LLM calls. Just saturating Hebbian writes with decay.

The 25-token card

What the agent actually receives at inference time is a "card." FERNme reports it stays at 24.9 ± 0.5 tokens whether it is the user's first visit or their fifth year — flat, by construction. It manages that with a population prior: the card stores only the user's deviations from what an average user would prefer, so anything obvious and shared reads through to the prior and never costs a token. The project measures the full-history baseline growing to 77× larger by 120 interactions; the FERNme card does not grow.

Retrieval uses spreading activation across the fuzzy graph: an incoming context activates nearby nodes, and the most strongly activated neighbors assemble into the token-minimal card. New users are not cold — the population prior gives a crowd-pattern head start that the project measures as +0.06 precision@5 on the first three turns, with k-anonymity and differential privacy so no individual leaks into the prior.

Supernode: a profile the user owns

The piece that makes this more than a cache is the supernode. When a user signs in across different sites, FERNme assembles their per-site memories into one profile keyed to an opaque person ID derived from a verified token. The framing is explicit: "sign in across sites, your memories assemble like Lego into one profile you control, default-deny, sensitive data walled off." Sites cannot read each other's data without consent, sensitive categories are excluded from cross-site sharing, and the user gets REST endpoints for the governance that implies — /edit, /export, /delete, plus a forget_everywhere that wipes the profile and unlearns it from the population prior. An append-only event log with a tamper-evident HMAC chain backs the ownership claim.

This is the inversion: instead of each site building a private, opaque profile of the user, the user owns one transparent profile and lends scoped views of it to the sites and agents they choose.

Why deterministic writes resist injection

There is a security dividend that falls out of the architecture for free. Because writes are arithmetic and not LLM extraction, malicious text on a page or in a user message cannot be "talked into" becoming a stored belief. The project states it plainly — "injected instructions never enter memory" — and treats event tags as untrusted, applying injection-pattern dropping and value caps in a safety layer. A memory that an attacker cannot write to with prose closes one of the nastier holes in the agent threat model: persistent memory poisoning, where one bad interaction corrupts every future session.

The numbers — and the caveats

FERNme publishes a benchmark suite, and the headline figures are striking. On drift detection — noticing that a user's preferences changed — it reports 0.72 recall versus 0.13 for a frequency counter. Context precision@3 lands at 0.62 versus 0.51 for the baseline. On the cost/quality Pareto per 1,000 interactions, pure mode reports 0.52 quality at $0.008, which it frames as 122× cheaper than Mem0; a gated mode reaches 0.66 quality at $0.023 (42× cheaper); an offline mode 0.73 at $0.104 (9× cheaper). A simulated storefront pilot reports a +16% relative conversion lift. The repository ships 119 passing tests and reproducible evaluation scripts.

Read these as claims, not verdicts. FERNme is explicit that this is a v0.3 research preview and that the benchmarks run on synthetic and LLM-authored data, with a real-human pilot still pending. The mechanism is validated; the lift is simulated. Treat the figures as a hypothesis worth testing on your own traffic, not as production-proven results.

What it means for LLM4Agents

A flat-token, LLM-free memory layer is directly aligned with how LLM4Agents bills. On a gateway where an agent pays per call in stablecoins over x402, two LLM extraction calls per interaction are not just latency — they are line items on every turn. A memory that writes for free and injects a 25-token card instead of a growing summary attacks the exact cost curve an operator watches, the one we sized in our fleet economics piece.

It also fits the wiring. FERNme already ships an MCP server, so an agent running against our gateway could mount it as a tool surface and read or write user memory through the same protocol it uses for everything else — the model we walked through in the MCP deep dive. And its determinism is a different answer to the same question our Graphiti and Mem0 comparison asked: how do you give an agent durable memory without the cost and the injection surface of LLM-mediated writes.

Staying on the frontier

Reading FERNme as a signal rather than a single project, three moves keep an agent-payments gateway ahead of where memory is going:

FERNme is early, and its boldest numbers are still synthetic. But the idea underneath it — that an agent should carry a small, owned, tamper-resistant model of the user instead of an ever-growing transcript — is the right shape for memory in a world where every write costs money and every input is a potential attack.

Give your agents memory without the per-write tax

Run any MCP-native memory layer behind one OpenAI-compatible endpoint, billed per call in stablecoins.

Register an agent