A public map for agent knowledge systems

Your agent doesn't need memory. It needs a knowledge architecture.

The AI memory conversation is asking the wrong question. The issue is not whether agents can store more. It is whether they can produce, curate, store, assemble and govern knowledge well enough to use it safely.


oral memory → writing → libraries → science → search → agent knowledge architecture

Misframe: How do we make agents remember more?
↓
Better question: What knowledge infrastructure should agents live inside?

The diagnosis

Recall is not the same as knowing.

A system can store everything and still fail to know what is true, current, private, authoritative, useful, cheap enough to retrieve, or safe to assemble into context.

This site reviews tools by the knowledge job they actually perform, not by the word “memory” in a launch post.

The knowledge stack

Five jobs every agent knowledge system must do.

Not five separate taxonomies. One operating model: pick a layer, see what it is responsible for, what can go wrong, and what a review should test.

Use: Activation

Knowledge only matters when assembled into task context. The hard question is not whether something exists, but whether it should enter this prompt, tool call, job, or human message now.

Example question: Why didn't the agent notice my calendar changed before planning the day?

Activation routing

Select knowledge by task, thread, project, person, account and risk level.

Failure mode: The agent either misses context or leaks unrelated context into every task.

Retrieval policy

Rank candidates by relevance, authority, freshness, privacy and cost.

Failure mode: Top-k semantic search decides policy by accident.
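One way to make the retrieval policy explicit, rather than leaving it to top-k similarity, is to score each candidate on the listed dimensions. The sketch below is illustrative only: the `Candidate` fields, the weights and the per-token cost penalty are all assumptions, not values this site recommends.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical retrieval candidate; every field is an assumption."""
    text: str
    relevance: float   # 0..1, e.g. from semantic search
    authority: float   # 0..1, how canonical the source is
    freshness: float   # 0..1, 1.0 means just updated
    private: bool      # True if it must not enter this task by default
    cost_tokens: int   # prompt tokens needed to include it


# Illustrative weights only; a real policy would tune these per task.
WEIGHTS = {"relevance": 0.5, "authority": 0.3, "freshness": 0.2}


def policy_score(c: Candidate, cost_penalty_per_token: float = 0.001) -> float:
    """Weighted sum of quality signals minus a linear cost penalty."""
    quality = sum(w * getattr(c, name) for name, w in WEIGHTS.items())
    return quality - cost_penalty_per_token * c.cost_tokens


def rank(candidates: List[Candidate], allow_private: bool = False) -> List[Candidate]:
    """Apply privacy as a hard filter, then sort by policy score."""
    eligible = [c for c in candidates if allow_private or not c.private]
    return sorted(eligible, key=policy_score, reverse=True)
```

Privacy is a hard filter here rather than a weight, so a highly relevant private item cannot outrank its way into a task that has not been cleared for it.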

Context budgeting

Allocate scarce prompt space across instructions, facts, evidence and working memory.

Failure mode: Useful evidence is crowded out by stale summaries.
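A minimal sketch of context budgeting, assuming each section of the prompt gets its own token cap and that items arrive already sorted by priority. The section names, caps and token counts are hypothetical.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, int]  # (text, token_count); token counts are assumed precomputed


def pack_context(items: Dict[str, List[Item]], caps: Dict[str, int]) -> Dict[str, List[str]]:
    """Greedily fill each section up to its own cap.

    Because every section has a separate cap, a long summary can never
    consume space reserved for evidence or instructions.
    """
    packed: Dict[str, List[str]] = {}
    for section, cap in caps.items():
        used = 0
        chosen: List[str] = []
        for text, tokens in items.get(section, []):
            if used + tokens <= cap:  # skip anything that would overflow the cap
                chosen.append(text)
                used += tokens
        packed[section] = chosen
    return packed
```

Per-section caps are the simplest design that addresses the failure mode above; a fuller version might let unused space in one section flow to another.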

Attribution

Show which source, provider, job or artifact supplied each claim.

Failure mode: The agent sounds confident but cannot trace why a fact entered the run.

Every review also scores: Scope, Volatility, Authority, Lifecycle, Resource cost

Evidence-backed failures

The homepage examples should come from real pain.

Initial research across GitHub issues, HN, Stack Overflow, forums, vendor docs and papers points to these first clusters. The ranking is provisional and should keep updating.

Curation · Pain signal: very high

Stale or contradictory memory

Append-only memories preserve old truths beside new ones. The agent remembers the change, but not which fact superseded which.

The memory says I like coffee and that I stopped liking coffee. Which one wins?
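The coffee example can be sketched as a store that keeps history but records which fact superseded which. This is a minimal illustration, assuming writes arrive in chronological order and that facts about the same thing share a `subject` key; the `Fact` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Fact:
    """Hypothetical memory record with an explicit supersession link."""
    fact_id: str
    subject: str
    value: str
    superseded_by: Optional[str] = None  # id of the fact that replaced this one


def write_fact(store: List[Fact], new: Fact) -> None:
    """Append the new fact and mark any live fact on the same subject as superseded."""
    for old in store:
        if old.subject == new.subject and old.superseded_by is None:
            old.superseded_by = new.fact_id
    store.append(new)


def current_facts(store: List[Fact]) -> Dict[str, str]:
    """Return only facts that nothing has superseded."""
    return {f.subject: f.value for f in store if f.superseded_by is None}
```

The old fact is kept rather than deleted, so the system can still answer "what changed?" while `current_facts` answers "which one wins?".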

Activation · Pain signal: very high

Retrieval mismatch

The right knowledge exists, but recall returns zero results, wrong chunks, equal scores, or missing metadata.

846 stored messages, but conversation search returns nothing.

Curation · Pain signal: high

Junk memory accumulation

Automatic memory extraction can fill the system with duplicate, hallucinated, low-salience facts that crowd out useful context.

10,134 memories. 224 useful. The rest is a junk drawer with embeddings.

Governance · Pain signal: high severity

Scope, privacy and poisoning

Persistent memory widens the blast radius. A bad write, wrong tenant boundary, or injected instruction can influence future runs.

A bad write today becomes trusted context tomorrow.
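A rough sketch of how a store might narrow that blast radius: reads are scoped to a tenant, and writes from untrusted origins are quarantined instead of becoming active context. The `TRUSTED_ORIGINS` set, origin labels and class names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical allow-list; a real system would define its own trust model.
TRUSTED_ORIGINS = {"user", "verified_tool"}


@dataclass(frozen=True)
class MemoryWrite:
    tenant: str
    text: str
    origin: str  # where the content came from, e.g. "user" or "web_page"


class ScopedStore:
    """Keeps untrusted writes out of active memory and scopes reads by tenant."""

    def __init__(self) -> None:
        self._active: List[MemoryWrite] = []
        self._quarantine: List[MemoryWrite] = []

    def write(self, w: MemoryWrite) -> bool:
        """Return True if the write became active, False if it was quarantined."""
        if w.origin in TRUSTED_ORIGINS:
            self._active.append(w)
            return True
        self._quarantine.append(w)
        return False

    def read(self, tenant: str) -> List[str]:
        """Only active writes from the requesting tenant are visible."""
        return [w.text for w in self._active if w.tenant == tenant]

    def quarantined(self) -> List[MemoryWrite]:
        return list(self._quarantine)
```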

Curation · Pain signal: high

Promotion without evidence

A system can promote a chat note into durable knowledge before repeated use, verification, or user feedback proves it belongs there.

A one-off observation becomes a profile, policy, or project state card without a promotion ledger.
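A promotion ledger can be sketched as a gate that refuses to promote a note unless at least one kind of evidence is present, and records which evidence justified the promotion. The `Note` fields, the `min_uses` threshold and the reason strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    """Hypothetical chat note that is a candidate for durable knowledge."""
    text: str
    uses: int = 0                 # times the note was retrieved and used
    verified: bool = False        # checked against a source of record
    user_confirmed: bool = False  # explicit user feedback


@dataclass
class LedgerEntry:
    text: str
    reasons: List[str]


def try_promote(note: Note, ledger: List[LedgerEntry], min_uses: int = 3) -> bool:
    """Promote only when some evidence exists, and log the reasons."""
    reasons: List[str] = []
    if note.uses >= min_uses:
        reasons.append(f"used {note.uses} times")
    if note.verified:
        reasons.append("verified against source")
    if note.user_confirmed:
        reasons.append("confirmed by user")
    if not reasons:
        return False  # a one-off observation stays a note
    ledger.append(LedgerEntry(note.text, reasons))
    return True
```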

Activation · Pain signal: high

Invisible activation

The system may retrieve useful facts but fail to show why those facts entered the run, which source outranked the others, or whether a provider supplied only supporting evidence.

The right answer appears, but no one can audit whether it came from a canonical artifact, memory provider, or stale summary.
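One hedged way to make activation auditable is to emit a small record for every claim that enters a run. The record shape, source labels and output format below are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ActivationRecord:
    """Hypothetical audit record for one claim that entered the run."""
    claim: str
    source: str                      # e.g. "canonical_artifact", "memory_provider"
    role: str                        # "primary" or "supporting"
    outranked: Tuple[str, ...] = ()  # sources that were considered and lost


def explain(trace: List[ActivationRecord]) -> List[str]:
    """Render one human-readable line per activated claim."""
    lines = []
    for r in trace:
        line = f"{r.claim} <- {r.source} ({r.role})"
        if r.outranked:
            line += "; outranked: " + ", ".join(r.outranked)
        lines.append(line)
    return lines
```

With a trace like this, a reviewer can tell whether an answer came from a canonical artifact or from a stale summary that happened to rank well.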

Tool review queue

A provisional queue of memory tools and agent knowledge systems.

Layer coverage matrix

Rows are the knowledge-stack layers. Columns are selected tools. Cells are provisional hypotheses, not completed reviews, until each tool has a full evidence packet.

Layer      | Letta / MemGPT | Mem0 | Zep / Graphiti | Microsoft GraphRAG | LangGraph / LangMem | LlamaIndex
Production | 46             | 58   | 55             | 64                 | 58                  | 82
Curation   | 52             | 62   | 86             | 84                 | 54                  | 58
Storage    | 68             | 82   | 80             | 72                 | 60                  | 70
Activation | 86             | 70   | 68             | 62                 | 82                  | 62
Governance | 58             | 56   | 60             | 54                 | 50                  | 44

Tool: 76

Letta / MemGPT

Strong agent-state framing. Good fit when you need explicit agent memory blocks, persona/state management and inspectable long-running agents.