Context Growth in Conversations
Each turn in a conversation adds tokens to the context. After 20+ turns, the accumulated context can become significant — especially if turns include tool calls, code blocks, or detailed responses. The cost per turn increases as the conversation grows because all previous turns are re-sent as input.
This means the input for turn N includes all previous turns. In a 50-turn conversation where each turn averages 200 tokens, roughly 10,000 tokens of accumulated history are sent with the 50th turn. The total cost across all turns is therefore quadratic in the number of turns.
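The arithmetic can be checked with a short script. This is a sketch that assumes a flat 200 tokens added per turn; real turns vary widely.

```python
# Cumulative input tokens when every turn adds T tokens of history.
# Turn n re-sends the (n - 1) previous turns plus its own T new tokens,
# so turn n's input is roughly n * T tokens.
T = 200       # average tokens added per turn (assumption)
turns = 50

per_turn_input = [n * T for n in range(1, turns + 1)]

print(per_turn_input[-1])   # input sent with turn 50: 10,000 tokens
print(sum(per_turn_input))  # total input across all 50 turns: 255,000 tokens
```

Note that the total (255,000 tokens) is more than 25x the final turn's input — the quadratic sum, not the last turn, dominates the bill.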
Conversation Summarization
When a conversation approaches the context limit, summarize older turns into a compact summary. The summary replaces the detailed history, preserving key information while reducing token count.
Effective summarization preserves: decisions made, facts established, user preferences expressed, and current task status. It discards: verbose explanations, failed attempts, and routine confirmations.
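One way to encode this preserve/discard guidance is directly in the summarization prompt. The function below is an illustrative sketch, not a tested prompt; the wording and function name are assumptions.

```python
# Build a summarization prompt that tells the model what to keep and what
# to drop. The PRESERVE/DISCARD categories mirror the guidance above.
def build_summary_prompt(turns: list[str]) -> str:
    history = "\n".join(turns)
    return (
        "Summarize the conversation below for use as replacement context.\n"
        "PRESERVE: decisions made, facts established, user preferences, "
        "and the current task status.\n"
        "DISCARD: verbose explanations, failed attempts, and routine "
        "confirmations.\n\n"
        f"Conversation:\n{history}"
    )

prompt = build_summary_prompt(["user: use tabs, not spaces", "assistant: noted"])
```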
Claude Code uses this approach with its /compact command, which summarizes the conversation history to free up context space.
Conversation Continuity
For conversations that span sessions (user returns later), you need a persistence strategy. Options include: storing the full conversation history (expensive in tokens when resumed), storing a summary (efficient but lossy), or storing key facts and decisions as structured data (most efficient for retrieval).
The best approach depends on what information the continued conversation needs. A customer support bot might only need the ticket status and last action. A coding assistant might need a summary of changes made and current task state.
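The third option — key facts and decisions as structured data — might look like the sketch below. The field names are illustrative; a real schema depends on what the continued conversation actually needs.

```python
# Persisting conversation state as structured data rather than raw turns.
import json
from dataclasses import dataclass, asdict, field

@dataclass
class ConversationState:
    summary: str                                        # compact summary of older turns
    facts: dict[str, str] = field(default_factory=dict)  # facts established
    decisions: list[str] = field(default_factory=list)   # decisions made
    task_status: str = ""                                # current task status

state = ConversationState(
    summary="User is refactoring the auth module.",
    facts={"language": "Python"},
    decisions=["split auth into auth/session modules"],
    task_status="2 of 5 files updated",
)

# Round-trip through JSON, as you would when saving between sessions.
serialized = json.dumps(asdict(state))
restored = ConversationState(**json.loads(serialized))
```

Structured fields are also cheap to query selectively — a support bot can load only `task_status` without paying for the summary.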
Key Concept
Conversation Cost Is Quadratic, Not Linear
The total cost of a conversation grows quadratically with the number of turns, not linearly. Each turn sends all previous turns as context. Turn 1 sends ~T tokens, turn 2 sends ~2T, turn 3 sends ~3T. The total cost is proportional to N^2/2. This means a 100-turn conversation costs not 100x a single turn, but roughly 5000x. Proactive context management transforms this quadratic growth into approximately linear growth.
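The quadratic-versus-linear claim can be demonstrated by comparing totals with and without compaction. This sketch assumes a fixed 200 tokens per turn and a summary that compresses all prior history down to 500 tokens; the threshold is arbitrary.

```python
# Total input tokens over N turns, with and without compaction.
T, S, N = 200, 500, 100    # tokens per turn, summary size, turn count (assumptions)
THRESHOLD = 8000           # compact when history exceeds this many tokens

def total_tokens(compact: bool) -> int:
    history, total = 0, 0
    for _ in range(N):
        history += T            # this turn adds T new tokens
        total += history        # the entire history is re-sent as input
        if compact and history > THRESHOLD:
            history = S         # replace history with a compact summary
    return total

print(total_tokens(compact=False))  # 1,010,000 — about 5050x one turn's tokens
print(total_tokens(compact=True))   # far smaller: growth is bounded per cycle
```

Without compaction the total is T * N(N+1)/2 = 1,010,000 tokens for 100 turns, matching the "roughly 5000x" figure; with compaction, each turn's input stays bounded, so the total grows approximately linearly.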
Exam Traps
Assuming conversation cost is linear
Because each turn re-sends all previous turns, cost grows quadratically. The exam may test whether you understand this and can calculate costs accordingly.
Summarizing too early or too late
Summarizing too early loses useful detail. Summarizing too late risks context overflow. The optimal trigger is typically 60-70% of the context limit.
Not preserving critical conversation state during summarization
If summarization drops critical facts (user preferences, decisions made), the model will behave inconsistently after summarization.
Check Your Understanding
A coding assistant conversation has reached 150K tokens (200K limit). The user is in the middle of a multi-file refactoring task. What should happen?
Build Exercise
Build Long Conversation Management
What you'll learn
- Implement conversation summarization
- Measure conversation cost growth
- Build conversation persistence
- Test continuity after summarization
Create a conversation simulator that generates 50 turns and tracks the token count and cost at each turn. Plot the growth curve.
WHY: Visualizing the quadratic cost growth motivates context management implementation.
YOU SHOULD SEE: A curve showing accelerating cost growth as turns increase.
Implement a summarization trigger: when context exceeds 70% of the limit, summarize all but the last 5 turns into a compact summary.
WHY: Automatic summarization prevents context overflow in long conversations.
YOU SHOULD SEE: The context size drops significantly after summarization, then grows again until the next trigger.
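The trigger logic for this step can be sketched as below. The token estimate is a crude `len // 4` heuristic rather than a real tokenizer, and `summarize` is a placeholder for an actual model call; all names are assumptions.

```python
# Summarization trigger: compact when estimated context exceeds 70% of the
# limit, keeping the last 5 turns verbatim.
CONTEXT_LIMIT = 200_000
TRIGGER = 0.7         # summarize at 70% of the limit
KEEP_RECENT = 5       # always keep the last 5 turns verbatim

def estimate_tokens(text: str) -> int:
    return len(text) // 4        # rough heuristic, not a real tokenizer

def summarize(turns: list[str]) -> str:
    # Placeholder: a real implementation would call a model here.
    return f"[summary of {len(turns)} earlier turns]"

def maybe_compact(turns: list[str]) -> list[str]:
    total = sum(estimate_tokens(t) for t in turns)
    if total <= CONTEXT_LIMIT * TRIGGER or len(turns) <= KEEP_RECENT:
        return turns
    older, recent = turns[:-KEEP_RECENT], turns[-KEEP_RECENT:]
    return [summarize(older)] + recent
```

After compaction the list is one summary entry plus the five most recent turns, so context drops sharply and then resumes growing until the next trigger.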
Verify conversation quality after summarization: ask the model about information from summarized turns and check that it responds correctly.
WHY: Summarization must preserve important information for the conversation to remain coherent.
YOU SHOULD SEE: The model correctly references information from summarized turns.
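A full quality check means querying the model about summarized content, but a cheap deterministic pre-check — does the summary still mention each critical fact? — catches gross omissions first. This helper and its names are illustrative.

```python
# Pre-check before re-querying the model: which critical facts did the
# summary drop entirely? (Substring matching is a crude proxy.)
def summary_covers(summary: str, critical_facts: list[str]) -> list[str]:
    """Return the facts that the summary fails to mention."""
    return [f for f in critical_facts if f.lower() not in summary.lower()]

missing = summary_covers(
    "User prefers tabs; we decided to split the auth module.",
    ["tabs", "auth module", "PostgreSQL"],
)
print(missing)   # prints ['PostgreSQL'] — that fact was dropped
```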
Implement conversation persistence: save the conversation state (summary + recent turns) to disk and restore it in a new session.
WHY: Persistence enables conversations that span sessions without losing context.
YOU SHOULD SEE: A restored conversation that continues seamlessly from where it left off.
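The persistence step reduces to a small save/restore round-trip. This sketch writes summary plus recent turns to a JSON file; the file path and structure are assumptions.

```python
# Save conversation state (summary + recent turns) to disk and restore it
# in a later session.
import json
from pathlib import Path

STATE_FILE = Path("conversation_state.json")   # path is an assumption

def save_state(summary: str, recent_turns: list[str]) -> None:
    STATE_FILE.write_text(
        json.dumps({"summary": summary, "recent": recent_turns})
    )

def load_state() -> tuple[str, list[str]]:
    data = json.loads(STATE_FILE.read_text())
    return data["summary"], data["recent"]

save_state("[summary of 40 earlier turns]", ["user: continue the refactor"])
summary, recent = load_state()
```

On restore, the summary is sent as context ahead of the recent turns, so the new session starts with both the compressed history and the verbatim tail.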
Sources
- Context Windows — Anthropic Documentation
- Claude Code Compact — Anthropic Documentation