subagentcontext

.com context management
8 concepts, live from D1

Concepts

Every real context-management concept catalogued here, grouped by category unless filtered.

Context Windows

conceptdescription
What a context window is The context window is the total amount of text (measured in tokens) a model can consider at once for a given request — the system prompt, conversation history, tool definitions, tool results, and any attached documents all share this one budget.
Why context windows are finite Context windows are finite because attention computation over a transformer scales with sequence length, and serving very long contexts costs more compute, memory, and latency per request.
Managing long conversations within a window Long-running agent sessions (like a Cowork session running for many turns) accumulate tool calls, tool results, and conversation turns that can approach the context window's limit well before the conversation is actually finished.
Tokens are not the same as words Context window and prompt-caching limits are measured in tokens, not words or characters — a token is typically a sub-word unit, so the same block of English text might be roughly 1.3–1.5x as many tokens as words, and code, non-English text, or unusual formatting can tokenize less efficiently.

Prompt Caching

conceptdescription
What prompt caching is Prompt caching lets a client mark a prefix of a prompt (e.g. a long system prompt, a set of tool definitions, or a large shared document) as cacheable, so that repeated requests reusing that same prefix are billed and processed more cheaply and with lower latency than reprocessing it from scratch each time.
When prompt caching is most useful Prompt caching pays off most when the same large prefix (system prompt, tool schema, long reference document, or few-shot examples) is reused across many requests in a short time window, such as a multi-turn agent session or a batch of requests over the same document.

Context Editing / Compaction

conceptdescription
Trade-offs of aggressive context editing Summarizing or dropping older context saves budget but risks losing detail the model might need later — a fact mentioned once early in a session and then discarded can resurface as a gap several turns later if it was compacted away too aggressively.
What context editing / compaction is Context editing (sometimes called compaction) is the general practice of programmatically trimming, summarizing, or removing older or less-relevant content from a conversation's context so that a long-running session can continue without exceeding the context window.

Machine-readable version: GET /api/context-concepts