Task 4.4

Few-Shot & Examples

Few-shot learning uses examples in the prompt to demonstrate the desired behavior. Instead of describing what you want in abstract terms, you show Claude concrete examples of input-output pairs.

Example Selection

Choose examples that cover the range of expected inputs and edge cases. Include at least one 'typical' example, one 'edge case' example, and one example showing how to handle invalid or unexpected input.

The examples should be representative of real data. Synthetic examples that are too clean or too simple may not prepare the model for real-world messiness. Use actual data samples when possible.
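As a sketch, a few-shot set for a hypothetical product-review classifier might pair one typical case, one edge case, and one invalid input (all names and labels below are illustrative, not from the source):

```python
# Hypothetical few-shot set for a review classifier.
# Each entry is an (input, output) pair covering a distinct case type.
EXAMPLES = [
    # Typical case: clear sentiment
    ("The battery lasts all day and setup was painless.", "positive"),
    # Edge case: mixed sentiment in one review
    ("Great screen, but it died after two weeks.", "mixed"),
    # Invalid/unexpected input: not a review at all
    ("asdf http://spam.example 1234", "not_a_review"),
]

def render_examples(examples):
    """Format (input, output) pairs into a consistent prompt block."""
    return "\n\n".join(
        f"Input: {text}\nOutput: {label}" for text, label in examples
    )

print(render_examples(EXAMPLES))
```

Because the invalid-input example appears in the set, the model has a defined label to fall back on instead of improvising when it sees junk.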

Example Structure

Structure examples clearly with labeled input and output sections. Use consistent formatting across all examples. For complex outputs, show the complete expected output including all fields and formatting.
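For instance, if the task returns structured JSON, each example should show the complete output with every field present, including fields that happen to be empty. A minimal sketch (the task and field names are hypothetical):

```python
import json

# Hypothetical extraction task: the example shows the COMPLETE output,
# including a field that is null, so the expected format is unambiguous.
example = {
    "input": "Order #1042 shipped to Berlin on 2024-03-01.",
    "output": {
        "order_id": "1042",
        "city": "Berlin",
        "ship_date": "2024-03-01",
        "tracking_number": None,  # show absent fields explicitly
    },
}

block = (
    f"Input: {example['input']}\n"
    f"Output: {json.dumps(example['output'])}"
)
print(block)
```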

Examples can be included in the system prompt or in the user messages. Placing them in the system prompt with prompt caching is more token-efficient when the same examples are used across many conversations.
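With the Anthropic Messages API, a system prompt given as a list of content blocks can mark the examples block with `cache_control` so repeated requests reuse it. The sketch below only builds the request body as a dict and makes no API call; the model name is illustrative:

```python
# Sketch of a Messages API request body with few-shot examples in a
# cached system block. Builds the dict only; no request is sent.
FEW_SHOT = (
    "Classify the sentiment of each review.\n\n"
    "Input: Works perfectly.\nOutput: positive\n\n"
    "Input: Broke on day one.\nOutput: negative"
)

request_body = {
    "model": "claude-sonnet-4-5",  # illustrative model name
    "max_tokens": 64,
    "system": [
        {
            "type": "text",
            "text": FEW_SHOT,
            # Cache breakpoint: later requests with the same prefix
            # reuse the examples instead of reprocessing them.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Input: Decent, nothing special."}
    ],
}

print(len(request_body["system"]))
```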

Number of Examples

More examples generally improve consistency, but each example consumes context tokens. For most tasks, 3-5 examples provide a good balance. Use fewer examples (1-2) when context space is limited or when the task is simple. Use more examples (5-10) when the task is complex or when you need very consistent output formatting.

The quality of examples matters more than quantity. Three well-chosen examples that cover different cases outperform ten similar examples.

Key Concept

Show, Don't Tell

Examples communicate expected behavior more effectively than abstract instructions. If you want Claude to format output a certain way, show it. If you want it to handle edge cases a certain way, show it. When an instruction could be interpreted multiple ways, an example removes ambiguity. The combination of clear instructions AND examples is the most effective approach.

Exam Traps

Using only similar examples

If all examples are similar, the model may overfit to that pattern and fail on different inputs. Include diverse examples covering different cases.

Too many examples wasting context

Each example consumes input tokens. In agentic loops where the system prompt is sent with every iteration, excessive examples compound token costs.

Not including edge case examples

If you don't show how to handle edge cases, the model will improvise. Edge case examples are often the most valuable.

Check Your Understanding

You are building a data extraction pipeline. The model needs to extract dates from text in various formats (MM/DD/YYYY, 'January 5th, 2024', 'next Tuesday'). How should you structure your few-shot examples?
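One reasonable answer, sketched below: include one example per format family, normalize all outputs to a single format, pin down relative dates with a stated reference date, and add a no-date case. The specific strings and the "next Tuesday" convention are illustrative choices the examples exist to pin down:

```python
# Hypothetical few-shot set for date extraction: one example per format
# family, all normalized to YYYY-MM-DD.
DATE_EXAMPLES = [
    ("Invoice due 03/15/2024.", "2024-03-15"),          # MM/DD/YYYY
    ("We met on January 5th, 2024.", "2024-01-05"),     # written-out
    ("Reference date: 2024-06-03. See you next Tuesday.",
     "2024-06-11"),                                     # relative date
    ("No schedule has been set yet.", "none"),          # no date present
]

prompt = "Extract the date as YYYY-MM-DD, or 'none' if absent.\n\n"
prompt += "\n\n".join(f"Input: {t}\nOutput: {d}" for t, d in DATE_EXAMPLES)
print(prompt)
```

Note how the relative-date example resolves an ambiguity an instruction alone would leave open: it shows that "next Tuesday" means the Tuesday of the following week, not the nearest one.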

Build Exercise

Optimize Few-Shot Examples

Beginner · 30 minutes

What you'll learn

  • Select effective few-shot examples
  • Structure examples for clarity
  • Measure the impact of examples on output quality
  • Balance example count vs. context usage
Steps

  1. Choose a classification task (e.g., sentiment analysis, intent detection). Create 3 diverse examples covering positive, negative, and ambiguous cases.

    WHY: Diverse examples teach the model to handle the full range of inputs.

    YOU SHOULD SEE: Three examples that clearly demonstrate the expected classification for different types of input.

  2. Test with 10 sample inputs: compare output quality with 0 examples, 1 example, 3 examples, and 5 examples.

    WHY: Empirical testing reveals the optimal number of examples for your specific task.

    YOU SHOULD SEE: Quality improves from 0 to 3 examples, with diminishing returns beyond that.

  3. Add an edge case example (e.g., mixed sentiment, sarcasm) and retest. Measure the impact on edge case handling.

    WHY: Edge case examples have outsized impact on handling unusual inputs.

    YOU SHOULD SEE: Better handling of edge cases after adding the specific example.

  4. Move examples to the system prompt and test with prompt caching. Compare token usage vs. examples in user messages.

    WHY: System prompt examples with caching reduce per-request token costs.

    YOU SHOULD SEE: Lower per-request costs when examples are in the cached system prompt.
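The step-2 comparison can be scaffolded roughly like this. `classify` is a stub standing in for a real model call (you would replace it with an Anthropic API request); the example set and labels are illustrative:

```python
# Sketch of the step-2 harness: score accuracy on labeled inputs while
# varying how many few-shot examples the prompt includes.
EXAMPLES = [
    ("I love this.", "positive"),
    ("Terrible experience.", "negative"),
    ("It's fine, I guess.", "ambiguous"),
    ("Arrived fast but stopped working.", "ambiguous"),
    ("Best purchase all year.", "positive"),
]

def build_prompt(n_examples, query):
    """Build a prompt containing the first n_examples shots."""
    shots = "\n\n".join(
        f"Input: {t}\nOutput: {l}" for t, l in EXAMPLES[:n_examples]
    )
    header = "Classify sentiment as positive, negative, or ambiguous.\n\n"
    middle = shots + "\n\n" if shots else ""
    return header + middle + f"Input: {query}\nOutput:"

def accuracy(n_examples, labeled_inputs, classify):
    """Fraction of labeled inputs classified correctly with n examples."""
    hits = sum(
        classify(build_prompt(n_examples, text)) == label
        for text, label in labeled_inputs
    )
    return hits / len(labeled_inputs)

for n in (0, 1, 3, 5):
    print(n, len(build_prompt(n, "Solid but pricey.")))
```

Running `accuracy` at 0, 1, 3, and 5 examples over your 10 sample inputs gives the quality-versus-token-cost curve the exercise asks for.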
