API Error Handling
Claude API errors fall into several categories. Rate limit errors (429) require backoff and retry. Overload errors (529) indicate temporary capacity issues. Authentication errors (401) are configuration problems. Context length errors mean your conversation exceeds the model's context window.
For rate limits and overload, implement exponential backoff with jitter. For context length errors, implement conversation summarization or truncation. For authentication errors, fail fast — retrying won't help.
Tool Failure Recovery
When a tool call fails, the agent needs to know about the failure so it can adapt. Return a clear error message as the tool result rather than throwing an exception that breaks the loop. The model can then decide whether to retry, try an alternative approach, or inform the user.
Common tool failure patterns: external API timeouts (implement tool-level timeouts), invalid parameters (validate before execution), permission errors (check before calling), and resource not found errors. Each should return a structured error that the model can act on.
Graceful Degradation
When components fail, the system should degrade gracefully rather than crash entirely. If a search tool is unavailable, the agent can still answer from its knowledge. If a database is down, the agent can queue operations for later. If the primary model is rate-limited, fall back to a secondary model.
Graceful degradation requires designing the system with fallback paths from the start. Each critical dependency should have an alternative or a way to continue with reduced functionality.
Key Concept
Feed Errors Back to the Model
When a tool call fails, return the error as a tool_result rather than crashing the loop. Claude can interpret error messages and adapt — retrying with different parameters, trying alternative tools, or explaining the issue to the user. Hiding errors from the model removes its ability to self-correct.
Exam Traps
Retrying all errors with the same strategy
Different errors need different handling. Rate limits need backoff; auth errors need configuration fixes; context length errors need conversation management. The exam tests whether you can match error types to recovery strategies.
Crashing the loop on tool errors
Tool errors should be returned to the model as error results, not thrown as exceptions. The model can often recover by trying a different approach.
Infinite retry loops
Always set a maximum retry count. Exponential backoff without a maximum can delay responses indefinitely.
Check Your Understanding
An agent is using a web search tool that returns a 429 (rate limit) error. What is the correct recovery strategy?
Build Exercise
Build Resilient API Calls
What you'll learn
- Implement exponential backoff with jitter
- Handle different error types appropriately
- Return tool errors to the model
- Add circuit breaker pattern for persistent failures
Create a retry wrapper function that implements exponential backoff with jitter for API calls.
WHY: Exponential backoff is the foundation of resilient API communication.
YOU SHOULD SEE: The wrapper retries with increasing delays: ~1s, ~2s, ~4s.
Add error classification: retry on 429 and 529, fail fast on 401 and 400, and summarize context on context length errors.
WHY: Different errors need different strategies. Retrying an auth error wastes time.
YOU SHOULD SEE: Rate limits are retried; auth errors fail immediately with clear messages.
Create a tool wrapper that catches errors and returns them as structured tool results instead of throwing.
WHY: The model needs to see errors to adapt its strategy.
YOU SHOULD SEE: When a tool fails, the model receives an error message and can decide what to do next.
Implement a simple circuit breaker: after 5 consecutive failures for a tool, disable it for 60 seconds before allowing retries.
WHY: Circuit breakers prevent repeated calls to a broken service, reducing load and improving recovery time.
YOU SHOULD SEE: After 5 failures, the tool returns 'temporarily unavailable' without making the actual call.
Sources
- Error Handling— Anthropic Documentation
- Rate Limits— Anthropic Documentation