Memory is not context: how agents remember without lying

“Give the agent memory” sounds like one feature. It’s two, and treating them as one is how you get an agent that cheerfully tells a user something that used to be true.

The two things are context — what the model sees at inference time, assembled fresh every turn — and memory — a persisted, searchable record of things that happened in past conversations. They feel similar. They are governed by completely different rules, and the discipline is keeping them apart.

The line that matters

Here’s the distinction I enforce: anything that has a current source of truth is context, never memory.

The user’s subscription status, how many children are on their account, which devices are connected — all of that lives in a database and can change at any moment. The agent should look it up, fresh, every time it needs it. If you let any of that leak into long-term memory, you’ve created a landmine: the agent “remembers” you had three children, recalls it months later with total confidence, and is wrong because you added a fourth in the meantime.

Memory is for a narrower, more durable thing: facts the user stated that have no other home. “My oldest is on the autism spectrum and I prefer balanced framing.” “We’re trying to cut down screen time before bed.” Nobody else stores those. They don’t go stale in the way live state does. They’re exactly what you want the agent to recall, weeks later, when it’s relevant.

Get this line wrong and the most common failure shows up quickly: the agent asserts stale platform state as if it were a cherished memory. Get it right and memory only adds value, because it never competes with a live source of truth.
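One way to enforce that line in code is a gate in front of the memory store. This is a minimal sketch, not our actual implementation: the `Fact` type, the `LIVE_STATE_KINDS` set, and the function names are all hypothetical, and it assumes each candidate fact has already been labeled with a kind by whatever extracts it.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for facts that already have a live source of truth
# (the platform database). Anything listed here must never be persisted.
LIVE_STATE_KINDS = {"subscription_status", "child_count", "connected_devices"}


@dataclass(frozen=True)
class Fact:
    kind: str  # assumed to be assigned by the extraction step
    text: str


def route_fact(fact: Fact) -> str:
    """Return "context" for facts with a live source of truth, else "memory"."""
    return "context" if fact.kind in LIVE_STATE_KINDS else "memory"


def store_if_memory(fact: Fact, memory: List[Fact]) -> bool:
    """Append the fact to long-term memory only if it has no other home."""
    if route_fact(fact) == "context":
        return False  # look it up fresh at inference time instead
    memory.append(fact)
    return True


memory: List[Fact] = []
store_if_memory(Fact("child_count", "User has three children"), memory)
store_if_memory(Fact("stated_preference", "Prefers balanced framing"), memory)
# Only the stated preference is persisted; the child count stays a lookup.
```

The useful property is that the rejection happens at write time, so a stale count can never be recalled later because it was never stored.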

Short-term: sessions with boundaries

Before long-term memory, you need sessions, and the usual mistake here is plain bloat. The naive approach is one giant ever-growing session per user. It seems convenient until the conversation history blows past the context window and you’re paying to reprocess months of chatter on every turn.

The pattern that works is inactivity-based sessioning. A session is a continuous block of interaction; after a couple of hours of silence, the next message rolls a fresh session. This keeps each session small enough to load fully into context, meaningful as a unit, and — importantly — it gives you a clean boundary at which to extract long-term memories. When a session closes, you distill its durable facts into memory. The transcript itself doesn’t come back; only the distilled facts do.

That boundary is doing real work. Without it, “load the conversation” and “remember the important parts” have no natural seam between them.
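Inactivity-based sessioning can be sketched in a few lines. The two-hour gap, the `Session` type, and `route_message` are assumptions for illustration; the closed session that comes back is the thing you would hand to the extraction step.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Assumption: "a couple of hours" is taken as exactly two hours here.
SESSION_GAP = timedelta(hours=2)


@dataclass
class Session:
    started_at: datetime
    last_activity: datetime
    messages: List[str] = field(default_factory=list)


def route_message(
    current: Optional[Session],
    text: str,
    now: datetime,
    gap: timedelta = SESSION_GAP,
) -> Tuple[Session, Optional[Session]]:
    """Return (active_session, closed_session).

    closed_session is None unless the inactivity gap was exceeded, in which
    case it is the old session, ready to be distilled into long-term memory.
    """
    closed = None
    if current is not None and now - current.last_activity > gap:
        closed, current = current, None  # the boundary: the old session ends
    if current is None:
        current = Session(started_at=now, last_activity=now)
    current.messages.append(text)
    current.last_activity = now
    return current, closed
```

A caller would check the second return value on every turn and, when it is not `None`, queue that session for fact extraction.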

Long-term: own the pipeline

When it came to the long-term store, the build-vs-buy options sorted into a clear ranking, from least to most attractive for our case:

Fully managed memory services are the least plumbing, but they tie a core part of your product to one vendor’s runtime, with little control over schema, ranking, or export. Too much lock-in for something this central.
Commercial memory SaaS integrates fast but puts your most sensitive data behind someone else’s paywall with no self-hosting path.
General agent frameworks give you building blocks but not a backend — you still make every storage and retrieval decision, now with an extra framework in the mix.
Roll it yourself on boring infrastructure — Postgres with a vector extension for semantic search, a standard embedding model, and an LLM pass to extract facts from closed sessions.

We went with the last one, and I’d do it again for anything where memory is core rather than a nice-to-have. A vector column in a database you already run is transparent, queryable, easy to migrate, and free of lock-in beyond primitives you’re already committed to. The cost is that you own the pipeline — schema, embedding, retrieval, decay — but for a core capability, owning it is the point, not the price.
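To make the retrieval half of that pipeline concrete, here is an in-memory stand-in for what the database does. It is a sketch under assumptions: `MemoryRow`, `search_memories`, and the toy two-dimensional embeddings are hypothetical, and a real store would hold vectors produced by an embedding model. In Postgres with pgvector, the same ranking would typically be an `ORDER BY embedding <=> query LIMIT k` clause, since `<=>` is pgvector's cosine-distance operator.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MemoryRow:
    user_id: str
    fact: str
    embedding: Sequence[float]  # toy vectors here; real ones come from a model


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # treat a zero vector as unrelated
    return 1.0 - dot / norm


def search_memories(
    rows: List[MemoryRow],
    user_id: str,
    query_embedding: Sequence[float],
    k: int = 3,
) -> List[MemoryRow]:
    """Return the k memories for this user that are closest to the query."""
    mine = [r for r in rows if r.user_id == user_id]  # never cross users
    mine.sort(key=lambda r: cosine_distance(r.embedding, query_embedding))
    return mine[:k]
```

The per-user filter is part of the query, not an afterthought. Owning the pipeline means you can see and test that it is there.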

Retrieve on intent, not on reflex

Last rule: memory is pulled on demand, when the user is clearly recalling something (“what did I tell you about…”, “do you remember…”), not stuffed into every prompt. Pre-loading memory recreates the exact bloat-and-staleness problem you built sessions to avoid. The agent should reach for memory the way a person does — deliberately, when the conversation calls for it — and otherwise leave it on the shelf.
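A crude version of that gate looks like this. The regex patterns, `wants_recall`, and `build_prompt` are illustrative assumptions, and a production system might instead expose memory as a tool and let the model decide when to call it. The point of the sketch is only that the search runs when recall intent is detected and not otherwise.

```python
import re
from typing import Callable, List

# Hypothetical recall cues, based on the phrasings above. A real detector
# would need a broader set or a model-based check.
RECALL_PATTERNS = [
    re.compile(r"\bdo you remember\b", re.IGNORECASE),
    re.compile(r"\bwhat did i (tell|say)\b", re.IGNORECASE),
    re.compile(r"\bas i (mentioned|said)\b", re.IGNORECASE),
]


def wants_recall(message: str) -> bool:
    """True when the user is clearly asking the agent to recall something."""
    return any(p.search(message) for p in RECALL_PATTERNS)


def build_prompt(message: str, search: Callable[[str], List[str]]) -> str:
    """Assemble the turn's prompt, querying memory only on recall intent."""
    parts: List[str] = []
    if wants_recall(message):
        facts = search(message)  # the only place memory is touched
        if facts:
            parts.append("Relevant things the user told you earlier:")
            parts.extend(f"- {fact}" for fact in facts)
    parts.append(f"User: {message}")
    return "\n".join(parts)
```

Here `search` stands in for whatever retrieval function the memory store exposes. On an ordinary turn it is never called, so the prompt carries no stored facts at all.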

Two features, one hard line between them, and a bias toward looking things up over remembering them. That’s most of what separates an agent that feels reliable from one that’s confidently, fluently wrong.