A public map for agent knowledge systems

Your agent doesn't need memory. It needs a knowledge architecture.

The AI memory conversation is asking the wrong question. The issue is not whether agents can store more. It is whether they can produce, curate, store, assemble and govern knowledge well enough to use it safely.

oral memory → writing → libraries → science → search → agent knowledge architecture
Misframe: How do we make agents remember more?
Better question: What knowledge infrastructure should agents live inside?

The diagnosis

Recall is not the same as knowing.

A system can store everything and still fail to know what is true, current, private, authoritative, useful, cheap enough to retrieve, or safe to assemble into context.

This site reviews tools by the knowledge job they actually perform, not by the word “memory” in a launch post.

The knowledge stack

Five jobs every agent knowledge system must do.

Not five separate taxonomies. One operating model: pick a layer, see what it is responsible for, what can go wrong, and what a review should test.

Use: Context Assembly

Knowledge only matters when assembled into task context. The hard question is not whether something exists, but whether it should enter this prompt, tool call, job, or human message now.

Example question: Why didn't the agent notice my calendar changed before planning the day?

Context routing

Select knowledge by task, thread, project, person, account and risk level.

Failure mode: The agent either misses context or leaks unrelated context into every task.
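
A minimal sketch of routing by scope and risk, under stated assumptions: the `Item` shape, the `kind:name` scope strings and the integer risk scale are all hypothetical, chosen only to make the two failure directions (missing context, leaking context) concrete.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    text: str
    scopes: frozenset   # hypothetical tags such as "project:alpha" or "person:dana"
    risk: int           # assumed scale: 0 = public, higher = more sensitive

def route(items, task_scopes: set, max_risk: int):
    """Keep items that share at least one scope with the task
    and do not exceed the task's risk ceiling."""
    return [i for i in items
            if i.scopes & task_scopes and i.risk <= max_risk]

items = [
    Item("Alpha launch date moved", frozenset({"project:alpha"}), 0),
    Item("Dana's salary", frozenset({"person:dana"}), 3),
    Item("Beta roadmap", frozenset({"project:beta"}), 0),
]
selected = route(items, {"project:alpha", "person:dana"}, max_risk=1)
```

Here the salary record matches the task's scopes but is held back by the risk ceiling, and the unrelated roadmap never enters the candidate set.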

Retrieval policy

Rank candidates by relevance, authority, freshness, privacy and cost.

Failure mode: Top-k semantic search decides policy by accident.
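
One way to make the policy explicit instead of accidental is sketched below. The `Candidate` fields, the weights and the 30-day freshness half-life are illustrative assumptions, not values taken from any reviewed tool.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    relevance: float   # similarity to the task, 0..1
    authority: float   # trust in the source, 0..1
    age_days: float    # time since the fact was last confirmed
    private: bool      # True if restricted to a narrower scope
    token_cost: int    # prompt tokens needed to include it

# Hypothetical weights; a real policy would likely tune these per task type.
WEIGHTS = {"relevance": 0.5, "authority": 0.3, "freshness": 0.2}
HALF_LIFE_DAYS = 30.0

def freshness(age_days: float) -> float:
    """Exponential decay: 1.0 when new, 0.5 after one half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def score(c: Candidate) -> float:
    return (WEIGHTS["relevance"] * c.relevance
            + WEIGHTS["authority"] * c.authority
            + WEIGHTS["freshness"] * freshness(c.age_days))

def rank(candidates, allow_private: bool, max_tokens: int):
    """Filter on privacy first, rank by score, then stop at the cost ceiling."""
    allowed = [c for c in candidates if allow_private or not c.private]
    picked, spent = [], 0
    for c in sorted(allowed, key=score, reverse=True):
        if spent + c.token_cost <= max_tokens:
            picked.append(c)
            spent += c.token_cost
    return picked
```

Privacy is treated as a hard filter rather than a weight, so a highly relevant private record cannot outscore its way into a context where it does not belong.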

Context budgeting

Allocate scarce prompt space across instructions, facts, evidence and working memory.

Failure mode: Useful evidence is crowded out by stale summaries.
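
A budget allocator could look roughly like this. The section names follow the list above; the percentage shares, and the rule that leftover tokens go to sections in listed order, are assumptions made for the sketch.

```python
def allocate_budget(total_tokens: int, shares_pct: dict, demand: dict) -> dict:
    """Grant each section min(its share, its demand), then hand unused tokens
    to sections that still want more, in the order listed in shares_pct."""
    grant = {}
    for name, pct in shares_pct.items():
        cap = total_tokens * pct // 100
        grant[name] = min(cap, demand.get(name, 0))
    leftover = total_tokens - sum(grant.values())
    for name in shares_pct:
        extra = min(demand.get(name, 0) - grant[name], leftover)
        grant[name] += extra
        leftover -= extra
    return grant

# Listing evidence early gives it first claim on any unused tokens.
shares_pct = {"instructions": 20, "evidence": 40, "facts": 20, "working_memory": 20}
demand = {"instructions": 100, "evidence": 700, "facts": 150, "working_memory": 400}
grant = allocate_budget(1000, shares_pct, demand)
```

In this run the tokens that instructions and facts did not need flow to evidence rather than to working memory, which is one way to keep summaries from crowding evidence out.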

Tool-mediated recall

Expose memory as inspectable tools rather than invisible model behavior.

Failure mode: The user cannot tell what knowledge was used or correct it.
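
As an illustration of what "inspectable" might mean in practice, the sketch below wraps recall in a tool that returns entry ids and sources and keeps an audit log. `MemoryTool` and its methods are hypothetical names, and the substring match only stands in for real retrieval.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryTool:
    entries: dict = field(default_factory=dict)    # id -> {"text", "source"}
    audit_log: list = field(default_factory=list)  # every recall, in order

    def recall(self, query: str) -> list:
        """Naive substring match, standing in for real retrieval."""
        hits = [{"id": i, **e} for i, e in self.entries.items()
                if query.lower() in e["text"].lower()]
        self.audit_log.append({"query": query,
                               "returned": [h["id"] for h in hits]})
        return hits

    def correct(self, entry_id: str, new_text: str) -> None:
        """Let the user fix an entry they saw in a recall result."""
        self.entries[entry_id]["text"] = new_text

tool = MemoryTool()
tool.entries["m1"] = {"text": "Standup is at 9:00", "source": "calendar"}
hits = tool.recall("standup")
tool.correct("m1", "Standup is at 10:00")
```

Because every result carries an id and a source, the user can see which entry shaped the answer and correct that entry directly.
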
Every review also scores: Scope, Volatility, Authority, Lifecycle, Resource cost

Evidence-backed failures

The homepage examples should come from real pain.

Initial research across GitHub issues, HN, Stack Overflow, forums, vendor docs and papers points to these first clusters. The ranking is provisional and should keep updating.

Curation · Pain signal: very high

Stale or contradictory memory

Append-only memories preserve old truths beside new ones. The agent remembers the change, but not which fact superseded which.

The memory says I like coffee and that I stopped liking coffee. Which one wins?
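
One possible answer to "which one wins" is to record supersession explicitly instead of appending blindly. The sketch below is an assumption-laden illustration: `Fact`, `FactLog` and the integer `observed_at` timestamp are hypothetical, and facts are keyed by a simple (subject, attribute) pair.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    observed_at: int                     # any increasing timestamp
    superseded_by: Optional[int] = None  # index of the fact that replaced it

class FactLog:
    def __init__(self):
        self.facts = []    # full history, never deleted
        self.current = {}  # (subject, attribute) -> index of the live fact

    def write(self, fact: Fact) -> None:
        key = (fact.subject, fact.attribute)
        self.facts.append(fact)
        new_idx = len(self.facts) - 1
        old_idx = self.current.get(key)
        if old_idx is not None and self.facts[old_idx].observed_at > fact.observed_at:
            fact.superseded_by = old_idx  # late-arriving older fact; keep current
            return
        if old_idx is not None:
            self.facts[old_idx].superseded_by = new_idx
        self.current[key] = new_idx

    def lookup(self, subject: str, attribute: str) -> Optional[Fact]:
        idx = self.current.get((subject, attribute))
        return None if idx is None else self.facts[idx]

log = FactLog()
log.write(Fact("user", "likes_coffee", "yes", observed_at=1))
log.write(Fact("user", "likes_coffee", "no", observed_at=2))
```

The history is still append-only, but a lookup now returns only the live fact, and each older fact points at the one that replaced it.
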
Context Assembly · Pain signal: very high

Retrieval mismatch

The right knowledge exists, but recall returns zero results, the wrong chunks, identical scores for every candidate, or results with missing metadata.

846 stored messages, but conversation search returns nothing.
Curation · Pain signal: high

Junk memory accumulation

Automatic memory extraction can fill the system with duplicate, hallucinated, low-salience facts that crowd out useful context.

10,134 memories. 224 useful. The rest is a junk drawer with embeddings.
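
A write gate is one way to keep the drawer from filling up. The sketch below assumes an upstream extractor supplies a salience score and a source for each candidate memory; the threshold, the `GatedStore` name and the text normalization are all illustrative choices.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace so near-identical text matches."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

class GatedStore:
    def __init__(self, min_salience: float = 0.5):
        self.min_salience = min_salience
        self.by_key = {}  # normalized text -> stored record

    def write(self, text: str, salience: float, source: str) -> str:
        if not source:
            return "rejected: no source"     # unsourced claims may be hallucinated
        if salience < self.min_salience:
            return "rejected: low salience"
        key = normalize(text)
        if key in self.by_key:
            self.by_key[key]["seen"] += 1    # count the repeat, store nothing new
            return "merged: duplicate"
        self.by_key[key] = {"text": text, "source": source, "seen": 1}
        return "stored"

store = GatedStore()
results = [
    store.write("User prefers dark mode.", 0.8, "chat:1"),
    store.write("user prefers DARK mode", 0.9, "chat:2"),
    store.write("User said hi", 0.1, "chat:3"),
    store.write("Unverified claim", 0.9, ""),
]
```

Exact-match deduplication after normalization is deliberately crude; a real system would probably need semantic matching, but the gate's position on the write path is the point of the sketch.
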
Governance · Pain signal: high severity

Scope, privacy and poisoning

Persistent memory widens the blast radius. A bad write, wrong tenant boundary, or injected instruction can influence future runs.

A bad write today becomes trusted context tomorrow.
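
A write guard is one place to narrow that blast radius. In the sketch below, `WriteRequest`, the origin labels and the rule that only direct user input is trusted are assumptions for illustration, not a description of any reviewed tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WriteRequest:
    session_tenant: str  # tenant the running agent is acting for
    target_tenant: str   # tenant whose memory would be changed
    origin: str          # hypothetical labels: "user", "tool_output", "web"
    text: str

TRUSTED_ORIGINS = {"user"}  # assumption: only direct user input is trusted

def guard_write(req: WriteRequest) -> str:
    if req.session_tenant != req.target_tenant:
        return "reject"      # never write across a tenant boundary
    if req.origin not in TRUSTED_ORIGINS:
        return "quarantine"  # keep out of future context until reviewed
    return "accept"

decision = guard_write(
    WriteRequest("acme", "acme", "web", "Ignore previous instructions."))
```

Quarantining rather than dropping keeps the evidence available for review while preventing an injected instruction from being assembled into a later run.
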

Tool review queue

A provisional queue of memory tools and agent knowledge systems.

Layer coverage matrix

Rows are the knowledge-stack layers. Columns are selected tools. Cells are provisional hypotheses, not completed reviews, until each tool has a full evidence packet.

Layer             Letta / MemGPT  Mem0  Zep / Graphiti  Microsoft GraphRAG  LangGraph / LangMem  LlamaIndex
Production        46              58    55              64                  58                   82
Curation          52              62    86              84                  54                   58
Context Assembly  86              70    68              62                  82                   62
Governance        58              56    60              54                  50                   44

Tool · 76

Letta / MemGPT

Strong agent-state framing. Good fit when you need explicit agent memory blocks, persona/state management and inspectable long-running agents.

Open provisional profile →
Context Assembly · agent

Tool · 70

Mem0

Practical memory layer with strong adoption signal. The core review question is whether it avoids junk accumulation, stale facts and weak overwrite semantics.

Open provisional profile →
Storage · personal/team

Data framework · 64

LlamaIndex

Strong ingestion and indexing primitives. It helps produce retrievable knowledge objects, but governance and context assembly remain system responsibilities.

Open provisional profile →
Production · corpus

Review target

Completed reviews should expose fit, not declare a universal winner.

Each tool page should bind claims to evidence, map layer coverage, score layer behavior where justified, show caveats, name failure modes, and keep benchmark numbers separate from architectural evidence. Papers and benchmarks support the review; they are not the first-class map item.

Production
Curation
Storage
Context Assembly
Governance
Benchmark note: reported recall numbers are not proof of governance, lifecycle, authority or domain-policy handling.
Missing the review workflow? See the review loop