Claude Agent SDK Memory Tool in 2026: Persistent Context That Survives /clear
How the Claude Agent SDK memory tool keeps context across sessions — file-based memory patterns, trade-offs vs prompt caching, and a working Python setup.
You’ve built a Claude-powered agent that works beautifully — until the user starts a new session and it forgets everything. Their preferences, the project context, the weird edge case you told it about yesterday: gone. You find yourself stuffing an ever-growing “context” string into every system prompt and watching your token bill climb.
The Claude Agent SDK memory tool solves exactly this problem. It gives the model a file-backed scratchpad it can read, write, and search across sessions, turning a stateless API into something that behaves like a long-running collaborator. This guide covers how it works in 2026, where it outperforms prompt-caching tricks, and a minimal working pattern.
TL;DR
- The memory tool in the Claude Agent SDK is a file-based persistence layer the model can call directly with commands like `view`, `create`, and `str_replace`.
- It’s not magic — you wire up the storage backend (local disk, S3, database). The SDK gives the model the tool interface; you decide what “a file” actually is.
- It complements, not replaces, prompt caching. Caching makes repeated context cheap; memory makes different context persist across sessions.
- Best fit: long-running coding agents, customer support bots with per-user history, research assistants, anything where “remember what we decided” is load-bearing.
Deep Dive: How the Memory Tool Actually Works
The Core Idea
The Claude Agent SDK exposes a memory tool via the standard tool-use protocol. From Claude’s perspective, memory is just another tool it can call — same mechanism as a web search or a shell command.
Under the hood, the SDK ships a default tool schema the model can invoke with operations like:
- `view` — read the contents of a memory file
- `create` — write a new memory file
- `str_replace` — edit part of a file in place
- `insert` — add content at a specific position
- `delete` — remove a file
- `rename` — move/rename a file
You register a backend that handles these operations. The backend can be anything — local filesystem, SQLite, Redis, Postgres, or an object store. The model neither knows nor cares.
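One way to structure that pluggability is a small handler interface. This is a sketch, not an SDK API; the class and method names here are illustrative:

```python
from typing import Protocol


class MemoryBackend(Protocol):
    """Minimal interface a memory backend could expose (illustrative names)."""

    def view(self, path: str) -> str: ...
    def create(self, path: str, file_text: str) -> None: ...
    def delete(self, path: str) -> None: ...


class InMemoryBackend:
    """Dict-backed backend, handy for tests; swap for disk/S3/DB in production."""

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    def view(self, path: str) -> str:
        # Mirror the "File not found" convention the model can act on
        return self._files.get(path, "File not found")

    def create(self, path: str, file_text: str) -> None:
        self._files[path] = file_text

    def delete(self, path: str) -> None:
        self._files.pop(path, None)
```

Your tool handler dispatches each command to whichever backend is configured; the model never sees the difference.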
Why This Is Different From “Just Putting It in the Prompt”
Stuffing past context into the system prompt has three known failure modes:
- Token cost grows linearly with history, even when most of it is irrelevant to the current turn
- Context windows fill up — the 1M-token window on Opus 4.7 sounds endless until you’re 40 sessions deep with verbose tool outputs
- The model can’t selectively forget — you can only trim on your side, which means guessing what it’ll need
The memory tool flips control: the model decides what to remember and when to look it up. It writes a note after resolving a tricky bug, then retrieves only the relevant file on the next session. You pay tokens for the lookup, not for carrying the entire history around.
Memory Tool vs Prompt Caching
This confuses a lot of developers, so let’s be blunt about it:
| Feature | Memory Tool | Prompt Caching |
|---|---|---|
| Purpose | Persistent state across sessions | Cheap repetition of the same context within a 5-min window |
| Lifetime | As long as your backend keeps the file | ~5 minutes TTL |
| Cost model | Tokens read/written on demand | ~90% discount on cached input tokens |
| Model control | Model decides what to save/read | You control what gets cached |
| Good for | User profiles, long-running agents, project memory | RAG context, system prompts, static docs |
They’re complementary. A well-designed agent uses caching for the expensive-to-send system prompt and memory for the dynamic, session-spanning user context. See our Claude API prompt caching guide for the caching side.
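A sketch of the combination: one request that marks the large, stable system prompt as cacheable while routing session-spanning state through the memory tool. The `cache_control` marker is the standard prompt-caching syntax; the placeholder system text is ours, and the model name is the one used elsewhere in this guide:

```python
# Build the request once: the big system prompt is marked cacheable,
# while session-spanning state lives behind the memory tool.
request = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "system": [
        {
            "type": "text",
            "text": "You are a coding agent. <several thousand tokens of policy>",
            "cache_control": {"type": "ephemeral"},  # prompt caching: cheap on repeat sends
        }
    ],
    "tools": [{"type": "memory_20250818", "name": "memory"}],  # cross-session persistence
    "messages": [{"role": "user", "content": "Pick up where we left off."}],
}
# response = client.messages.create(**request)
```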
A Minimal Python Setup
Here’s the skeleton of a local-filesystem memory backend. The SDK handles the tool wiring; you implement the storage.
```python
from anthropic import Anthropic
from pathlib import Path

MEMORY_ROOT = Path("./agent_memory")
MEMORY_ROOT.mkdir(exist_ok=True)


def handle_memory_tool(tool_input: dict) -> str:
    command = tool_input["command"]
    # Strip the leading slash so model paths like "/memories/x" stay under MEMORY_ROOT
    path = MEMORY_ROOT / tool_input["path"].lstrip("/")
    if command == "view":
        return path.read_text() if path.exists() else "File not found"
    if command == "create":
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(tool_input["file_text"])
        return f"Created {path.name}"
    if command == "str_replace":
        if not path.exists():
            return "File not found"
        text = path.read_text()
        new = text.replace(tool_input["old_str"], tool_input["new_str"], 1)
        path.write_text(new)
        return "Replaced"
    # ... handle insert, delete, rename
    return "Unknown command"


client = Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    tools=[{"type": "memory_20250818", "name": "memory"}],
    messages=[{"role": "user", "content": "Remember I prefer TypeScript over Python."}],
)
```
On subsequent turns, you feed the tool results back into the conversation, and Claude will proactively read memory when a user says something that might be context-dependent.
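That feedback loop can be sketched as a small dispatcher that executes memory tool calls and re-sends until the model stops requesting them. The function name is ours; `handler` is any callable with the shape of a memory-tool handler like the one above:

```python
def run_turn(client, messages: list, handler):
    """Send messages, execute any memory tool calls, loop until the model is done."""
    while True:
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=4096,
            tools=[{"type": "memory_20250818", "name": "memory"}],
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response  # plain answer; nothing left to execute

        # Append the assistant turn, then one tool_result per tool_use block
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": handler(block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
```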
What to Store (and What Not To)
The real engineering is in the taxonomy of what goes in memory. Dump everything and retrieval becomes noisy; store too little and you’re back to the same problem.
A pattern that works well in production:
- `user/profile.md` — stable preferences, role, tech stack
- `user/feedback.md` — corrections the user has made (“stop doing X”)
- `project/{id}/decisions.md` — key architectural choices
- `project/{id}/open_questions.md` — things left unresolved
Avoid storing debuggable/derivable state: current file contents, git log output, current branch. That belongs in tools the model calls fresh each time.
Pros & Cons
| Pros | Cons |
|---|---|
| Persistence across sessions without prompt bloat | You own the storage backend — no managed option out of the box |
| Model decides what’s worth remembering | Poor memory hygiene creates “noise memories” that degrade retrieval |
| Token-efficient for long-horizon agents | Extra round-trips for each memory read (latency) |
| Composes with prompt caching | Not free — reads and writes still cost tokens |
| Backend is pluggable (disk/S3/DB) | Multi-user systems need careful key scoping (user-id prefix, etc.) |
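The multi-user scoping caveat in the table is worth making concrete. A minimal sketch of per-user path scoping with a traversal guard; the helper name and directory layout are illustrative, not an SDK convention:

```python
from pathlib import Path

MEMORY_ROOT = Path("./agent_memory")


def scoped_path(user_id: str, model_path: str) -> Path:
    """Map a model-supplied path into this user's private subtree."""
    user_root = (MEMORY_ROOT / user_id).resolve()
    candidate = (user_root / model_path.lstrip("/")).resolve()
    # Verify the resolved path stays under the user's root,
    # so "../other_user/profile.md" can't escape the sandbox.
    if user_root not in candidate.parents and candidate != user_root:
        raise ValueError(f"Path escapes user sandbox: {model_path}")
    return candidate
```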
Who Should Use This
- Builders of long-running coding agents (à la Claude Code) where remembering user conventions across days/weeks matters
- Customer support / CRM bots that need per-user memory without re-fetching CRM state every turn
- Research or writing assistants where “what did we decide about X last week” is a common question
- Multi-agent systems where a supervisor agent coordinates by reading shared memory
Skip it if your use case is single-turn (classification, transformation, one-shot extraction) — you’ll just add latency and complexity.
FAQ
Can I use the memory tool without the Agent SDK?
Yes — the memory tool is a standard Claude API tool type. You can register it via the raw Messages API by including {"type": "memory_20250818", "name": "memory"} in the tools array. The Agent SDK just gives you ergonomic helpers and a reference filesystem backend.
How big should memory files be?
Aim for each file to be small and focused — a few hundred tokens per file. The model reads them whole, so a 50KB profile defeats the efficiency goal. Split by topic.
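A cheap guard can flag files before they outgrow that budget. This uses a rough heuristic of ~4 characters per token for English prose; the function names and the 500-token default are ours:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 chars/token for English prose)."""
    return len(text) // 4


def is_bloated(text: str, budget: int = 500) -> bool:
    """True when a memory file has outgrown its per-file token budget
    and should be split by topic."""
    return estimate_tokens(text) > budget
```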
Does memory work with streaming?
Yes. Tool calls interleave with streaming output; when the model invokes view or create, you pause streaming, execute the backend call, and resume with the result. See our Claude structured output and streaming guide for streaming patterns.
What about privacy and user data?
Memory is only as secure as your backend. If you’re storing personal data, scope file paths by user ID, encrypt at rest, and add an explicit deletion endpoint. The model has no concept of “this is PII” — that’s your responsibility.
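With per-user directories, the deletion endpoint can be as simple as removing the user's subtree. A sketch assuming the filesystem layout from earlier, not an SDK feature:

```python
import shutil
from pathlib import Path

MEMORY_ROOT = Path("./agent_memory")


def delete_user_memory(user_id: str) -> bool:
    """Remove every memory file for one user; True if anything was deleted."""
    user_root = MEMORY_ROOT / user_id
    if user_root.exists():
        shutil.rmtree(user_root)  # irreversible: pair with an audit log in production
        return True
    return False
```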
Does this replace vector databases for RAG?
No. Memory is for agent-authored state — notes the model decides to keep. RAG is for your corpus — documents you control the ingestion of. They solve different problems and often coexist. For vector-store selection see our pgvector vs Pinecone comparison.
Bottom Line
The Claude Agent SDK memory tool is the cleanest answer to the “my agent forgets everything” problem — but only if you treat memory hygiene as a first-class design concern. Combine it with prompt caching for cost, scope files narrowly for retrieval quality, and the model itself does most of the work.