Domain 515% weight

Quick Reference: Domain 5Context Management & Reliability

Context Window Budget

The context window is a shared budget: system prompt + tool definitions + conversation history + response must all fit. Plan token budgets explicitly.

ComponentTypical SizeOptimization
System prompt500-2000 tokensKeep focused; move dynamic content to user messages
Tool definitions100-500 per toolMinimize description verbosity; limit tool count
Conversation historyGrows over timeSummarize, truncate, or use sliding window
Expected responseReserve 1000-4000Set max_tokens to cap this allocation

Prompt Caching Strategy

Place cache_control breakpoints after stable content that repeats across requests. Up to 4 breakpoints allowed. Cache TTL is 5 minutes, refreshed on each hit.

Breakpoint PlacementEffectiveness
After system promptHigh — system prompt rarely changes
After tool definitionsHigh — tools are static per session
After few-shot examplesMedium — stable but may vary by task
After conversation historyLow — changes every turn

Cost: 1.25x write cost for initial cache creation. 0.1x read cost on cache hits. Breakeven at ~1.4 subsequent reads.

Long Conversation Strategies

StrategyWhen to UseKey Characteristic
SummarizationGeneral long conversationsReplace old messages with a summary
Sliding windowRecent context matters mostKeep last N messages, drop oldest
Selective retentionKey facts scattered throughoutKeep important messages, summarize rest
Anti-PatternWhy It Fails
Never truncatingEventually hits context limit, request fails
Aggressive truncationLoses critical context, Claude contradicts earlier statements
Summarizing too lateFirst failure is user-visible — summarize proactively

Rate Limiting and Retries

Status CodeMeaningAction
429Rate limitedRetry with exponential backoff + jitter
500Server errorRetry (transient)
529API overloadedRetry with longer backoff
400Bad requestDo not retry — fix the request
401UnauthorizedDo not retry — fix authentication

Exponential backoff formula: delay = min(base * 2^attempt + random_jitter, max_delay). Always add jitter to prevent thundering herd.

Production Reliability Patterns

PatternWhen to UseKey Characteristic
Retry with backoffTransient failures (429, 500)Automatic recovery from temporary issues
Circuit breakerSustained failuresFail fast after threshold, avoid hammering a down service
Fallback modelPrimary model unavailableSwitch to smaller/cheaper model for degraded service
Graceful degradationNon-critical features failReturn cached or simplified response
Health checksContinuous monitoringDetect issues before users do

Key metrics to monitor: time-to-first-token, total latency, input/output token counts, error rate by status code, cost per request, and tool call frequency.