8 concepts, live from D1
Concepts
Every real context-management concept catalogued here, grouped by category unless filtered.
Context Windows
| concept | description |
| What a context window is |
The context window is the total amount of text (measured in tokens) a model can consider at once for a given request — the system prompt, conversation history, tool definitions, tool results, and any attached documents all share this one budget. |
| Why context windows are finite |
Context windows are finite because attention computation over a transformer scales with sequence length, and serving very long contexts costs more compute, memory, and latency per request. |
| Managing long conversations within a window |
Long-running agent sessions (like a Cowork session running for many turns) accumulate tool calls, tool results, and conversation turns that can approach the context window's limit well before the conversation is actually finished. |
| Tokens are not the same as words |
Context window and prompt-caching limits are measured in tokens, not words or characters — a token is typically a sub-word unit, so the same block of English text might be roughly 1.3–1.5x as many tokens as words, and code, non-English text, or unusual formatting can tokenize less efficiently. |
Prompt Caching
| concept | description |
| What prompt caching is |
Prompt caching lets a client mark a prefix of a prompt (e.g. a long system prompt, a set of tool definitions, or a large shared document) as cacheable, so that repeated requests reusing that same prefix are billed and processed more cheaply and with lower latency than reprocessing it from scratch each time. |
| When prompt caching is most useful |
Prompt caching pays off most when the same large prefix (system prompt, tool schema, long reference document, or few-shot examples) is reused across many requests in a short time window, such as a multi-turn agent session or a batch of requests over the same document. |
Context Editing / Compaction
| concept | description |
| Trade-offs of aggressive context editing |
Summarizing or dropping older context saves budget but risks losing detail the model might need later — a fact mentioned once early in a session and then discarded can resurface as a gap several turns later if it was compacted away too aggressively. |
| What context editing / compaction is |
Context editing (sometimes called compaction) is the general practice of programmatically trimming, summarizing, or removing older or less-relevant content from a conversation's context so that a long-running session can continue without exceeding the context window. |
Machine-readable version: GET /api/context-concepts