Context Window Composition
The context window is shared between input and output. Input includes: system prompt, conversation messages (user + assistant turns), tool definitions, and tool results. Output is the model's response. Larger context windows (200K tokens for Claude) accommodate more input but also cost more.
In agentic loops, the context grows with each iteration because each tool call and result is added to the conversation history. Without management, context exhaustion is inevitable for long-running agents.
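A minimal sketch of this growth pattern, using an illustrative `run_tool` stand-in (not a real API) and a simplified message schema:

```python
# Hypothetical tool runner; in a real agent this would execute a tool
# and return its result string.
def run_tool(name: str) -> str:
    return f"result of {name}"

messages = [{"role": "user", "content": "Find and summarize the report."}]

for step in range(3):
    # Each iteration appends an assistant tool call and its result,
    # so the history (and its token count) grows monotonically.
    messages.append({"role": "assistant", "content": f"<tool call {step}>"})
    messages.append({"role": "user", "content": run_tool(f"search_{step}")})

print(len(messages))  # 1 original message + 2 per iteration = 7
```

Three iterations turn one message into seven; fifty iterations of a real agent, each carrying large tool results, exhaust a 200K window quickly.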
Context Management Strategies
Key strategies include:
- Conversation summarization: periodically compress the conversation history.
- Sliding window: keep only the most recent N turns.
- Selective pruning: remove tool results that are no longer relevant.
- Context-aware prompting: include only information relevant to the current step.
- Output-aware budgeting: reserve tokens for the model's response.
The best strategy depends on the application. Summarization preserves important information but loses detail. Sliding window is simple but may lose critical early context. Selective pruning is most flexible but requires understanding what is relevant.
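The sliding-window trade-off above can be softened by pinning the earliest messages (e.g. the original task statement) while windowing the rest. A sketch, with `sliding_window` and its parameters as illustrative names:

```python
def sliding_window(messages: list, max_turns: int = 10, keep_first: int = 1) -> list:
    """Keep the first `keep_first` messages (e.g. the original task)
    plus the `max_turns` most recent messages; drop the middle."""
    if len(messages) <= keep_first + max_turns:
        return messages  # already within the window, nothing to drop
    return messages[:keep_first] + messages[-max_turns:]
```

Pinning the first message mitigates the main weakness of a pure sliding window: losing the critical early context that defines the task.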
Token Counting
Token counting is essential for context management. Use the token counting API or client library to measure conversation size before sending requests. This prevents context overflow errors and allows proactive management.
A common pattern: before each API call, count the total input tokens (system prompt + history + tool definitions), compare the count against the model's limit minus a reserved output buffer, and trigger context management (summarization or pruning) if the threshold is exceeded.
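A sketch of that pre-call check. The chars-divided-by-4 estimator is a crude heuristic standing in for the real token counting API, and `needs_compaction` is an illustrative name, not an SDK function:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 chars per token). In production, use the
    # token counting API for accurate counts.
    return max(1, len(text) // 4)

def needs_compaction(system: str, messages: list, tools_json: str,
                     limit: int = 200_000, output_buffer: int = 8_000) -> bool:
    """Return True when the input would eat into the reserved output buffer."""
    total = estimate_tokens(system) + estimate_tokens(tools_json)
    total += sum(estimate_tokens(m["content"]) for m in messages)
    return total > limit - output_buffer
```

Note that tool definitions are counted alongside the system prompt and history; omitting them is a common budgeting mistake.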
Key Concept
Budget for Output, Not Just Input
When managing context, remember that the model needs room to generate its response. If you fill the context window with input, the model's response will be truncated. Always reserve a buffer for output tokens — typically 4K-8K tokens for complex responses, more for code generation. Available input budget = context window - system prompt tokens - tool definition tokens - reserved output tokens. (Note that in the API, `max_tokens` caps the output, not the whole window; budget against the context window size.)
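The budget arithmetic as a worked example (function name and the sample token counts are illustrative):

```python
def input_budget(context_window: int, system_tokens: int,
                 tool_tokens: int, reserved_output: int) -> int:
    """Tokens left for conversation history after fixed costs and output buffer."""
    return context_window - system_tokens - tool_tokens - reserved_output

# 200K window, 2K system prompt, 5K of tool definitions, 8K output buffer
print(input_budget(200_000, 2_000, 5_000, 8_000))  # 185000
```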
Exam Traps
Ignoring tool definitions in context budget
Tool definitions consume input tokens. With many tools, this can be significant. The exam may test whether you account for tool tokens in context calculations.
Summarizing too aggressively
Over-summarization loses important details. The model may make incorrect decisions based on incomplete summaries. Balance compression with information retention.
Not reserving output buffer
Filling the entire context window with input leaves no room for the response. Always reserve tokens for model output.
Check Your Understanding
An agentic loop has been running for 50 iterations. The context is at 180K tokens (limit: 200K). The agent needs to continue working. What is the best approach?
Build Exercise
Implement Context Management
What you'll learn
- Count tokens in conversation context
- Implement conversation summarization
- Build a sliding window with important turn retention
- Test context management under load
Create a function that counts the total tokens in a conversation (system prompt + messages + tools). Test with conversations of various sizes.
WHY: Token counting is the foundation of context management — you cannot manage what you do not measure.
YOU SHOULD SEE: Accurate token counts for conversations of different sizes.
Implement a summarization function: when context exceeds 80% of the limit, summarize all but the last 5 messages into a single summary message.
WHY: Proactive summarization prevents context overflow before it happens.
YOU SHOULD SEE: The conversation is compressed while retaining recent context and a summary of earlier content.
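One way to structure this step, with a placeholder `summarize` (in practice you would ask the model to produce the summary) and an illustrative `compact_if_needed`:

```python
def summarize(messages: list) -> dict:
    # Placeholder: a real implementation would call the model to
    # compress these messages into a faithful summary.
    return {"role": "user",
            "content": f"[Summary of {len(messages)} earlier messages]"}

def compact_if_needed(messages: list, current_tokens: int,
                      limit: int = 200_000, keep_recent: int = 5) -> list:
    """Above 80% of the limit, collapse all but the last `keep_recent`
    messages into a single summary message."""
    if current_tokens <= int(limit * 0.8) or len(messages) <= keep_recent:
        return messages
    return [summarize(messages[:-keep_recent])] + messages[-keep_recent:]
```

Triggering at 80% (rather than at the hard limit) leaves headroom for the summarization call itself and the next tool result.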
Implement selective pruning: remove large tool results that are no longer relevant (e.g., search results that have already been processed).
WHY: Selective pruning is more efficient than full summarization because it targets specific bloat.
YOU SHOULD SEE: Large, irrelevant tool results are replaced with brief summaries.
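A sketch of this step under a simplified message schema: the `is_tool_result` flag and `prune_tool_results` name are assumptions for illustration (real tool results are structured content blocks, not flat strings):

```python
def prune_tool_results(messages: list, max_result_chars: int = 2_000) -> list:
    """Replace large, already-processed tool results with brief placeholders."""
    pruned = []
    for m in messages:
        if m.get("is_tool_result") and len(m["content"]) > max_result_chars:
            # Keep a stub so the model still knows the tool ran.
            pruned.append({**m, "content":
                           f"[Pruned tool result: {len(m['content'])} chars, "
                           "already processed]"})
        else:
            pruned.append(m)
    return pruned
```

Because pruning targets only the bloated entries, the rest of the conversation is preserved verbatim, unlike summarization.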
Test your context management by running a long agentic loop (50+ iterations) and verifying the context stays within bounds while the agent remains effective.
WHY: End-to-end testing validates that context management works without breaking agent behavior.
YOU SHOULD SEE: The agent runs 50+ iterations without context overflow, maintaining task progress.
Sources
- Context Window — Anthropic Documentation
- Token Counting — Anthropic Documentation