concept: context windows

Context windows

The context window is the total amount of text (measured in tokens, not words or characters) a model can consider at once for a given request. System prompt, conversation history, tool definitions, tool results, and any attached documents all draw from this one shared budget.

Context windows are finite because attention computation scales with sequence length -- a larger window costs more compute, memory, and latency per request, which is part of why providers publish a fixed maximum window size per model rather than an unbounded one.

Managing long sessions

Long-running agent sessions accumulate tool calls, tool results, and turns that can approach the window's limit well before the actual task is finished. Common strategies: summarize or compact older turns, drop large intermediate tool outputs once no longer needed, and read large files incrementally (by offset/range) rather than loading them wholesale.

This site deliberately avoids stating a precise current token-limit number for any specific model/tier, since that figure was not independently re-verified this session and can vary and change over time -- see the API for the full set of rows and their source notes.

subagentcontext

Context windows

Managing long sessions