Tool profile · provisional profile

Letta / MemGPT

Strong agent-state framing. Good fit when you need explicit agent memory blocks, persona/state management and inspectable long-running agents.

Provisional fit

76/100

Best for: Agent-state experiments where memory must be visible and editable rather than implicit in prompts.

Avoid if: you need a fully governed, citation-complete knowledge architecture and cannot add policy, evidence capture, and a review workflow around the tool.

Caution: Still needs external discipline for provenance, deletion, lifecycle and cross-user governance.

Model signature: Context Assembly primary · agent scope · Tool

Layer coverage

Where this tool fits.

This is not a completed review. It is a provisional profile based on public positioning and a mapping to known failure modes. Hands-on benchmarks, source snapshots, and citation-bound claims are still required before stronger conclusions can be drawn.

Production
Curation
Storage
Context Assembly
Governance

Review rule: a tool does not get credit for a layer unless it exposes inspectable behavior, not just a marketing claim.
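
The review rule can be written down as a small check. This is a minimal sketch, not part of any tool: the `LayerClaim` record and the `credited_layers` function are hypothetical names invented for illustration, and the only thing taken from this profile is the list of five layers and the rule that a marketing claim alone earns no credit.

```python
from dataclasses import dataclass, field
from typing import List

# The five layers listed in this profile.
LAYERS = ("Production", "Curation", "Storage", "Context Assembly", "Governance")


@dataclass
class LayerClaim:
    """Hypothetical record of what is known about one layer of a tool."""
    layer: str
    marketing_claim: str = ""
    # Assumption: each item describes behavior a reviewer could inspect,
    # such as an observed API response or a logged state change.
    inspectable_evidence: List[str] = field(default_factory=list)


def credited_layers(claims):
    """Return the layers that earn credit: those with inspectable evidence."""
    credited = []
    for claim in claims:
        if claim.layer not in LAYERS:
            raise ValueError(f"unknown layer: {claim.layer}")
        if claim.inspectable_evidence:
            credited.append(claim.layer)
    return credited
```

A claim with only `marketing_claim` filled in is silently left out of the result, which mirrors the rule: the absence of inspectable behavior is treated as no coverage rather than as partial coverage.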

Evidence notes

What the provisional profile has established so far.

The research pass mapped Letta primarily to context assembly: what gets assembled into the working context, when, and as what kind of state.

Failure modes to test: hidden stale state, unclear memory authority, and weak audit trail from source event to assembled context.
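
One way to make these three failure modes testable is to audit each entry that ends up in the assembled context. The sketch below is illustrative only: `MemoryEntry` and `audit_context` are hypothetical names and do not reflect Letta's API or data model. It assumes each entry can be exported with a timestamp, an authority label, and a link back to its source event.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MemoryEntry:
    """Hypothetical exported view of one item in the assembled context."""
    text: str
    updated_at: datetime
    authority: Optional[str]        # who may assert this, e.g. "user" or "agent"
    source_event_id: Optional[str]  # link back to the event that produced it


def audit_context(entries, now, max_age):
    """Return (index, failure_mode) pairs for entries that fail a check."""
    findings = []
    for i, entry in enumerate(entries):
        if now - entry.updated_at > max_age:
            findings.append((i, "stale state"))
        if entry.authority is None:
            findings.append((i, "unclear authority"))
        if entry.source_event_id is None:
            findings.append((i, "no audit trail"))
    return findings
```

If the tool under review cannot export the fields this audit needs, that is itself a finding: the failure mode cannot be ruled out from the outside.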

Profile depth is provisional: based on public positioning/docs and known memory-failure clusters, not a hands-on benchmark or citation-bound review yet.

Review packet

What a complete review must contain.

This page exposes the intended review structure. The current artifact is a profile, not a completed evidence-backed review.

Strengths

Agent-state experiments where memory must be visible and editable rather than implicit in prompts.

Limitations

Still needs external discipline for provenance, deletion, lifecycle and cross-user governance.

Dimension assessment

Scope, volatility, authority, lifecycle, resource economics, interoperability, and evidence quality must each get a rationale and citations before final scoring.
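
The gate before final scoring can be sketched as a completeness check. The dictionary shape and the function name `missing_for_final_score` are assumptions made for this example; only the seven dimension names come from this profile.

```python
# The seven dimensions named in this profile.
DIMENSIONS = (
    "scope",
    "volatility",
    "authority",
    "lifecycle",
    "resource economics",
    "interoperability",
    "evidence quality",
)


def missing_for_final_score(assessment):
    """Report what each dimension still lacks before scoring can be final.

    `assessment` is assumed to map a dimension name to a dict with a
    "rationale" string and a "citations" list.
    """
    gaps = {}
    for dim in DIMENSIONS:
        entry = assessment.get(dim, {})
        problems = []
        if not entry.get("rationale", "").strip():
            problems.append("rationale")
        if not entry.get("citations"):
            problems.append("citations")
        if problems:
            gaps[dim] = problems
    return gaps
```

An empty result means every dimension has both a rationale and at least one citation; any other result lists exactly what blocks the final score.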

Open questions

  • What can be verified from docs, code, issues, benchmarks, and changelogs?
  • Where does the tool fail under stale, contradictory, private, or high-cost knowledge?
  • Which claims are vendor claims versus independently observed behavior?

Benchmark critique

No benchmark number is accepted as architectural evidence unless it says which layer it tests and what it misses: lifecycle, scope boundaries, authority, context cost, and governance.
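
As a rough sketch of this acceptance rule, a benchmark record could be required to name the layer it tests and to state, for each of the five concerns, whether it covers or misses it. The record shape, the `"covered"` and `"missed"` labels, and the function name are hypothetical choices for this example.

```python
# The five concerns a benchmark must account for, per this profile.
CONCERNS = ("lifecycle", "scope boundaries", "authority", "context cost", "governance")


def accept_as_architectural_evidence(benchmark):
    """Accept a benchmark only if it declares its layer and its blind spots.

    `benchmark` is assumed to be a dict with a "layer_tested" string and a
    "coverage" dict mapping each concern to "covered" or "missed".
    """
    if not benchmark.get("layer_tested"):
        return False
    coverage = benchmark.get("coverage", {})
    # Silence about a concern is not acceptable; it must be stated either way.
    return all(coverage.get(concern) in ("covered", "missed") for concern in CONCERNS)
```

Note that a benchmark that misses all five concerns still passes this check as long as it says so. The rule is about declared limits, not about breadth.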

Related systems

Related tools should be connected by evidence-backed edges: competes with, integrates with, implements concept, evaluated by, or has governance gap.
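
A minimal sketch of such an edge, assuming a simple in-memory representation: the `EdgeType` and `Edge` names are invented for illustration, the five edge kinds are the ones listed above, and the tool names in the usage example are placeholders rather than real relationships.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class EdgeType(Enum):
    COMPETES_WITH = "competes with"
    INTEGRATES_WITH = "integrates with"
    IMPLEMENTS_CONCEPT = "implements concept"
    EVALUATED_BY = "evaluated by"
    HAS_GOVERNANCE_GAP = "has governance gap"


@dataclass(frozen=True)
class Edge:
    """Hypothetical typed relation between two systems."""
    source: str
    target: str
    kind: EdgeType
    evidence: Tuple[str, ...]  # references to snapshots, issues, or changelogs

    def __post_init__(self):
        # An edge without evidence is a claim, not a relation; reject it.
        if not self.evidence:
            raise ValueError("an edge needs at least one piece of evidence")


# Placeholder names only; no real relationship is asserted here.
example = Edge("ToolA", "ToolB", EdgeType.INTEGRATES_WITH, ("source snapshot 1",))
```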

Update history

Provisional profile created. Stale-review detection, source snapshots, and changelog watching are required before this becomes a durable review.
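
Stale-review detection could work roughly as follows, assuming the review stores a digest of the source material it was based on and the date it was written. The function names and the 90-day default are arbitrary choices for this sketch, not a stated policy.

```python
import hashlib
from datetime import date, timedelta


def snapshot_digest(text):
    """Return a SHA-256 hex digest of a source snapshot."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_review_stale(reviewed_on, stored_digest, current_source, today,
                    max_age=timedelta(days=90)):
    """A review is stale if its source changed or it is older than max_age."""
    source_changed = snapshot_digest(current_source) != stored_digest
    too_old = today - reviewed_on > max_age
    return source_changed or too_old
```

Changelog watching would feed `current_source`: whenever the watched docs or changelog text changes, the digest no longer matches and the review is flagged for another pass.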