4 concepts, live from D1
Concepts
Every real context-management concept is catalogued here, grouped by category unless filtered.
| concept | description |
|---|---|
| What a context window is | The context window is the total amount of text, measured in tokens, that a model can consider at once for a given request. The system prompt, conversation history, tool definitions, tool results, and any attached documents all share this one budget. |
| Why context windows are finite | Context windows are finite because the cost of self-attention in a transformer grows with sequence length (quadratically, in the standard formulation), so serving very long contexts costs more compute, memory, and latency per request. |
| Managing long conversations within a window | Long-running agent sessions (like a Cowork session running for many turns) accumulate tool calls, tool results, and conversation turns that can approach the context window's limit well before the conversation is actually finished. |
| Tokens are not the same as words | Context-window and prompt-caching limits are measured in tokens, not words or characters. A token is typically a sub-word unit, so a block of English text usually contains roughly 1.3–1.5 times as many tokens as words; code, non-English text, or unusual formatting can tokenize less efficiently. |
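
To make the shared-budget and tokens-versus-words rows concrete, here is a minimal sketch that estimates token counts from word counts and checks whether the parts of a request fit one window. It only applies the rough 1.3–1.5 ratio quoted in the table and is not a real tokenizer; the function names, the sample request, and the 2000-token window are all hypothetical.

```python
import math

# Assumed range taken from the table: roughly 1.3 to 1.5 tokens per English word.
# A real tokenizer will give different counts, especially for code or non-English text.
TOKENS_PER_WORD_LOW = 1.3
TOKENS_PER_WORD_HIGH = 1.5


def estimate_tokens(text: str) -> tuple[int, int]:
    """Return a (low, high) token estimate for English prose from its word count."""
    words = len(text.split())
    return (
        math.ceil(words * TOKENS_PER_WORD_LOW),
        math.ceil(words * TOKENS_PER_WORD_HIGH),
    )


def fits_in_window(parts: dict[str, str], window_tokens: int) -> bool:
    """Check whether every part of a request fits one shared budget (high estimate)."""
    total_high = sum(estimate_tokens(text)[1] for text in parts.values())
    return total_high <= window_tokens


# Hypothetical request: all parts draw on the same window.
request = {
    "system_prompt": "You are a helpful assistant.",
    "history": "Please summarize the attached report in three bullet points.",
    "documents": "word " * 1000,
}
print(fits_in_window(request, window_tokens=2000))
```

Using the high end of the range errs on the side of caution; for anything close to the limit, counting with the model's actual tokenizer is the safer choice.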
Machine-readable version: GET /api/context-concepts?category=context_window
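
A call to the machine-readable endpoint might look like the sketch below. The path and the `category` parameter come from the line above; the host (`https://example.com`), the assumption that the response body is JSON, the assumption that omitting `category` returns the unfiltered list, and the helper names are all hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical host; the source gives only the path and query string.
BASE_URL = "https://example.com"


def concepts_url(category=None):
    """Build the endpoint URL, adding the category filter only when one is given."""
    url = f"{BASE_URL}/api/context-concepts"
    if category:
        url += "?" + urlencode({"category": category})
    return url


def fetch_concepts(category=None):
    """GET the endpoint and parse the body, assuming (not confirmed) that it is JSON."""
    with urlopen(concepts_url(category)) as response:
        return json.load(response)
```

For example, `concepts_url("context_window")` builds the same path and query string shown above, prefixed with the placeholder host.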