grounding: platform.claude.com/docs/en/build-with-claude/prompt-caching

Prompt caching

Prompt caching lets a client mark a prefix of a prompt (a long system prompt, a set of tool definitions, or a large shared document) as cacheable, so repeated requests reusing that same prefix are billed and processed more cheaply and with lower latency than reprocessing it from scratch each time.

How it helps

On a cache hit, the model does not need to reprocess the cached prefix's tokens the way it would a fully fresh prompt -- only the new, non-cached portion of the prompt is processed at full cost/latency. This pays off most when the same large prefix is reused across many requests in a short time window, such as a multi-turn agent session or a batch of requests over the same document.

Expiry

Each cached prefix has a limited time-to-live. If requests reusing that prefix keep arriving before it expires, subsequent requests keep getting the caching benefit; if too much time passes between requests, the cache entry expires and the next request pays full cost to reprocess and re-cache it.

Source: platform.claude.com/docs/en/build-with-claude/prompt-caching, mirrored locally in this repo per CLAUDE.md.

subagentcontext

Prompt caching

How it helps

Expiry