What prompt caching is
description
Prompt caching lets a client mark a prefix of a prompt (e.g. a long system prompt, a set of tool definitions, or a large shared document) as cacheable, so that repeated requests reusing that same prefix are billed and processed more cheaply and with lower latency than reprocessing it from scratch each time.
how it works
On a cache hit, the model does not need to reprocess the cached prefix's tokens the same way it would a fully fresh prompt — the provider serves the cached computation, and only the new (non-cached) portion of the prompt is processed at full cost/latency.
source note
Grounded in docs/docs/platform.claude.com/docs/en/build-with-claude/prompt-caching.md, mirrored in this repo per CLAUDE.md.
provenance
created 2026-07-02 08:27:03 · JSON