Task 5.3

Long Conversations

Long conversations — those with many turns or large amounts of exchanged content — present unique challenges for Claude-based applications. As conversations grow, they consume more context, cost more per turn, and may exceed the context window.

Context Growth in Conversations

Each turn in a conversation adds tokens to the context. After 20+ turns, the accumulated context can become significant — especially if turns include tool calls, code blocks, or detailed responses. The cost per turn increases as the conversation grows because all previous turns are re-sent as input.

This means the cost of turn N includes re-sending all previous turns as input. In a 50-turn conversation where each turn averages 200 tokens, the 50th turn carries roughly 9,800 tokens of history (49 prior turns × 200 tokens). The total input cost across all turns is therefore quadratic in the number of turns.
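As an illustrative check of this arithmetic (the figures mirror the example above; token counts in real conversations vary per turn):

```python
# Total input tokens across an N-turn conversation where each turn
# adds roughly T tokens of history: T + 2T + ... + NT = T * N(N+1)/2.

def total_input_tokens(n_turns: int, tokens_per_turn: int) -> int:
    """Closed-form sum of the history re-sent at each turn."""
    return tokens_per_turn * n_turns * (n_turns + 1) // 2

# History sent with the 50th turn (the 49 prior turns):
print(49 * 200)                     # 9800

# Total input tokens across all 50 turns:
print(total_input_tokens(50, 200))  # 255000
```

Note that 255,000 total input tokens is more than 25× the 9,800-token history of the final turn alone, which is the quadratic effect in miniature.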

Conversation Summarization

When a conversation approaches the context limit, summarize older turns into a compact summary. The summary replaces the detailed history, preserving key information while reducing token count.

Effective summarization preserves: decisions made, facts established, user preferences expressed, and current task status. It discards: verbose explanations, failed attempts, and routine confirmations.
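A minimal sketch of this replacement step, assuming the history is a list of role/content message dicts. The `summarize_fn` parameter stands in for a real model call that would be prompted to keep decisions, established facts, user preferences, and task status while dropping verbose detail:

```python
# Sketch of conversation compaction, assuming history is a list of
# {"role": ..., "content": ...} message dicts. summarize_fn is a
# stand-in for a model call; the placeholder below is not a real summary.

def compact_history(messages, keep_last=5, summarize_fn=None):
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    if summarize_fn is None:
        # Placeholder; a real implementation summarizes with the model.
        text = f"[{len(older)} earlier turns omitted]"
    else:
        text = summarize_fn(older)
    summary = {"role": "user", "content": f"Conversation summary: {text}"}
    return [summary] + recent

msgs = [{"role": "user", "content": f"turn {i}"} for i in range(20)]
compacted = compact_history(msgs)
print(len(compacted))  # 6: one summary message plus the last 5 turns
```

Keeping the most recent turns verbatim preserves the immediate working context; only the older turns are collapsed.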

Claude Code uses this approach with its /compact command, which summarizes the conversation history to free up context space.

Conversation Continuity

For conversations that span sessions (user returns later), you need a persistence strategy. Options include: storing the full conversation history (expensive in tokens when resumed), storing a summary (efficient but lossy), or storing key facts and decisions as structured data (most efficient for retrieval).

The best approach depends on what information the continued conversation needs. A customer support bot might only need the ticket status and last action. A coding assistant might need a summary of changes made and current task state.
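A sketch of the structured-data option, with illustrative field names (not a prescribed schema) and serialization to a string for simplicity; a real system would write to a file or database keyed by session ID:

```python
import json

# Sketch of cross-session persistence as structured state: a summary
# plus key facts and the most recent turns. The field names and values
# here are illustrative assumptions.

state = {
    "summary": "User is mid-refactor of the auth module; 3 of 5 files done.",
    "facts": {"preferred_style": "type-hinted Python", "ticket": "AUTH-123"},
    "recent_turns": [{"role": "user", "content": "Continue with login.py"}],
}

def save_state(state: dict) -> str:
    # In practice, persist this blob to storage keyed by session ID.
    return json.dumps(state)

def restore_state(blob: str) -> dict:
    return json.loads(blob)

restored = restore_state(save_state(state))
print(restored["facts"]["ticket"])  # AUTH-123
```

On resume, the restored summary and facts can be injected as the opening context of the new session, avoiding re-sending the full transcript.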

Key Concept

Conversation Cost Is Quadratic, Not Linear

The total cost of a conversation grows quadratically with the number of turns, not linearly, because each turn re-sends all previous turns as input. Turn 1 sends ~T tokens, turn 2 sends ~2T, turn 3 sends ~3T, so the total input is proportional to N^2/2. A 100-turn conversation therefore costs not 100x a single turn, but roughly 5,000x in input tokens. Proactive context management transforms this quadratic growth into approximately linear growth.

Exam Traps

EXAM TRAP

Assuming conversation cost is linear

Because each turn re-sends all previous turns, cost grows quadratically. The exam may test whether you understand this and can calculate costs accordingly.

EXAM TRAP

Summarizing too early or too late

Summarizing too early loses useful detail. Summarizing too late risks context overflow. The optimal trigger is typically 60-70% of the context limit.

EXAM TRAP

Not preserving critical conversation state during summarization

If summarization drops critical facts (user preferences, decisions made), the model will behave inconsistently after summarization.

Check Your Understanding

A coding assistant conversation has reached 150K tokens (200K limit). The user is in the middle of a multi-file refactoring task. What should happen?

Build Exercise

Build Long Conversation Management

Intermediate · 45 minutes

What you'll learn

  • Implement conversation summarization
  • Measure conversation cost growth
  • Build conversation persistence
  • Test continuity after summarization

  1. Create a conversation simulator that generates 50 turns and tracks the token count and cost at each turn. Plot the growth curve.

    WHY: Visualizing the quadratic cost growth motivates context management implementation.

    YOU SHOULD SEE: A curve showing accelerating cost growth as turns increase.

  2. Implement a summarization trigger: when context exceeds 70% of the limit, summarize all but the last 5 turns into a compact summary.

    WHY: Automatic summarization prevents context overflow in long conversations.

    YOU SHOULD SEE: The context size drops significantly after summarization, then grows again until the next trigger.

  3. Verify conversation quality after summarization: ask the model about information from summarized turns and check that it responds correctly.

    WHY: Summarization must preserve important information for the conversation to remain coherent.

    YOU SHOULD SEE: The model correctly references information from summarized turns.

  4. Implement conversation persistence: save the conversation state (summary + recent turns) to disk and restore it in a new session.

    WHY: Persistence enables conversations that span sessions without losing context.

    YOU SHOULD SEE: A restored conversation that continues seamlessly from where it left off.
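One possible skeleton for steps 1 and 2, combining the simulator and the 70% trigger. All of the numbers (context limit, tokens per turn, summary size) are illustrative assumptions, and the `curve` list is what you would plot:

```python
# Simulate a 50-turn conversation: track cumulative input tokens and
# compact when the context passes 70% of the limit. Illustrative numbers.

CONTEXT_LIMIT = 200_000
TRIGGER = 0.70
KEEP_LAST = 5
TOKENS_PER_TURN = 4_000   # deliberately large so the trigger fires
SUMMARY_TOKENS = 500      # assumed size of the generated summary

def run_simulation(n_turns: int = 50):
    history = 0           # tokens currently in context
    cumulative = 0        # total input tokens sent across all turns
    compactions = 0
    curve = []            # per-turn context size, for plotting
    for _ in range(n_turns):
        history += TOKENS_PER_TURN
        cumulative += history          # whole history re-sent as input
        if history > TRIGGER * CONTEXT_LIMIT:
            # Replace all but the last KEEP_LAST turns with a summary.
            history = SUMMARY_TOKENS + KEEP_LAST * TOKENS_PER_TURN
            compactions += 1
        curve.append(history)
    return cumulative, compactions, curve

cumulative, compactions, curve = run_simulation()
print(compactions)  # 1: the trigger fires once, at turn 36 of 50
```

Plotting `curve` shows the sawtooth shape described in step 2: steady growth, a sharp drop at the compaction, then growth again.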
