Task 4.5

Prompt Optimisation

Prompt optimisation is the iterative process of improving prompt performance — accuracy, consistency, speed, and cost. It involves systematic testing, measurement, and refinement.

Evaluation Metrics

Before optimising, define what 'good' means. Common metrics include: accuracy (does the output match expected results?), consistency (does the same input produce similar outputs?), format compliance (does the output follow the specified format?), token efficiency (how many tokens are used per request?), and latency (how long does the response take?).

Create a test set of 20-50 representative inputs with expected outputs. Run your prompt against this set and measure baseline performance before making changes.
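A baseline run can be sketched as a small evaluation harness. Everything here is illustrative: `call_model` is a stub standing in for a real API call, and `rough_token_count` uses a crude chars-per-token heuristic rather than a real tokenizer.

```python
# Minimal evaluation harness: run a prompt over a test set and report
# baseline accuracy and token usage before making any changes.

def call_model(prompt: str, test_input: str) -> str:
    # Placeholder: substitute a real API call (e.g. via the Anthropic SDK).
    return test_input.upper()  # stand-in "model" that uppercases its input

def rough_token_count(text: str) -> int:
    # Crude approximation: ~1 token per 4 characters of English text.
    return max(1, len(text) // 4)

def evaluate(prompt: str, test_set: list[tuple[str, str]]) -> dict:
    correct, tokens = 0, 0
    for test_input, expected in test_set:
        output = call_model(prompt, test_input)
        correct += (output == expected)
        tokens += rough_token_count(prompt + test_input) + rough_token_count(output)
    return {
        "accuracy": correct / len(test_set),
        "avg_tokens": tokens / len(test_set),
    }

test_set = [("hello", "HELLO"), ("world", "WORLD"), ("claude", "Claude")]
baseline = evaluate("Uppercase the input.", test_set)
print(baseline)  # accuracy is 2/3 here: the third expected output differs
```

Record these numbers; every later variant is judged against them.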

Optimisation Techniques

Key techniques include: simplifying instructions (shorter prompts are often more effective than longer ones), improving examples (better examples have more impact than better instructions), using structured output (reduces formatting errors), adjusting temperature (lower for consistency, higher for creativity), and model selection (using smaller models for simpler tasks).

Always change one thing at a time and measure the impact. Multiple simultaneous changes make it impossible to attribute improvements.
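The one-change-at-a-time discipline can be sketched as a comparison loop where each variant differs from the baseline in exactly one dimension. The `score` function and its numbers are placeholders; in practice it would run your test set and return measured accuracy.

```python
# Each variant changes exactly one variable, so any metric shift
# is attributable to that single change.

def score(prompt: str, temperature: float, model: str) -> float:
    # Placeholder scoring: in practice, run the test set and return accuracy.
    return 0.9 - 0.05 * temperature - (0.02 if model == "small" else 0.0)

baseline = {"prompt": "Summarise the text.", "temperature": 0.7, "model": "large"}
variants = [
    {**baseline, "prompt": "Summarise the text in two sentences."},  # prompt only
    {**baseline, "temperature": 0.0},                                # temperature only
    {**baseline, "model": "small"},                                  # model only
]

base_score = score(**baseline)
for v in variants:
    changed = [k for k in v if v[k] != baseline[k]][0]
    print(f"changed {changed}: delta {score(**v) - base_score:+.3f}")
```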

Cost and Latency Optimisation

Token usage directly affects cost and latency. Reduce input tokens by: trimming verbose prompts, using references instead of full documents, caching repeated content (system prompts), and filtering irrelevant context. Reduce output tokens by: specifying concise formats, limiting response length, and using structured output.
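As a rough sketch of measuring the savings from trimming a verbose prompt, the snippet below compares approximate token counts before and after. The 4-characters-per-token heuristic is an assumption, not a real tokenizer; use your provider's token-counting tools for accurate figures.

```python
# Estimate input-token savings from trimming a verbose prompt.

def rough_tokens(text: str) -> int:
    # Crude heuristic: ~1 token per 4 characters of English text.
    return max(1, len(text) // 4)

verbose = (
    "I would like you to please carefully read the following document and "
    "then provide me with a summary of its key points, making sure to be "
    "thorough but also concise in your response."
)
trimmed = "Summarise the key points of the document below. Be concise."

savings = rough_tokens(verbose) - rough_tokens(trimmed)
pct = 100 * savings / rough_tokens(verbose)
print(f"saved ~{savings} tokens (~{pct:.0f}%) per request")
```

The same trimmed prompt often performs as well or better, which is why shortening is usually the first technique to try.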

For latency-sensitive applications, consider streaming (partial results as they generate), prompt caching (faster repeated prompts), and model selection (smaller models are faster).

Key Concept

Measure Before Optimising

Prompt optimisation without measurement is guesswork. Before changing anything, establish baseline metrics with a test set. After each change, re-measure to verify improvement. Changes that feel like improvements can actually degrade performance on cases you haven't tested. Systematic measurement is the difference between engineering and trial-and-error.

Exam Traps

EXAM TRAP

Optimising without a test set

Without a test set, you cannot objectively measure whether changes improve performance. The exam expects you to know that systematic evaluation is necessary.

EXAM TRAP

Making multiple changes simultaneously

If you change the prompt, the temperature, and the model at the same time, you cannot tell which change caused the improvement or degradation.

EXAM TRAP

Over-optimising for one metric at the expense of others

Reducing token count may hurt accuracy. Increasing accuracy may increase latency. The exam tests whether you can balance competing metrics.

Check Your Understanding

Your Claude-powered summarisation system has 90% accuracy but each request costs $0.05 and takes 8 seconds. The business wants to reduce cost to $0.02 while maintaining at least 85% accuracy. What is the most effective approach?

Build Exercise

Optimise a Prompt for Cost and Quality

Intermediate · 45 minutes

What you'll learn

  • Create evaluation test sets
  • Measure baseline prompt performance
  • Apply optimisation techniques systematically
  • Balance quality vs. cost tradeoffs

  1. Choose a prompt for a specific task and create a test set of 10 inputs with expected outputs. Measure baseline accuracy and token usage.

    WHY: Baseline measurement is the foundation of systematic optimisation.

    YOU SHOULD SEE: Accuracy score and average token usage for the baseline prompt.

  2. Try three prompt variants: a shorter version, a version with better examples, and a version with structured output. Measure each against the test set.

    WHY: Comparing variants identifies which techniques have the most impact for your specific task.

    YOU SHOULD SEE: Performance metrics for each variant, showing tradeoffs between accuracy and cost.

  3. Select the best variant and try it with a smaller model (e.g., Haiku instead of Sonnet). Measure the quality difference.

    WHY: Model selection is often the biggest lever for cost reduction. The question is whether the quality tradeoff is acceptable.

    YOU SHOULD SEE: A cost comparison with quality metrics for each model.

  4. Implement prompt caching for the optimised prompt and measure the cost impact over 10 sequential requests.

    WHY: Caching reduces the per-request cost of system prompt tokens, which compounds over many requests.

    YOU SHOULD SEE: Lower costs on requests 2-10 compared to request 1, due to cache hits.
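The cost impact in step 4 can be estimated with a back-of-the-envelope model like the one below. The per-token price, the cache-read discount, and the cache-write premium are all illustrative assumptions; substitute your provider's current rates before drawing conclusions.

```python
# Back-of-the-envelope cost model for prompt caching over 10 requests.

PRICE_PER_INPUT_TOKEN = 3e-6   # assumed: $3 per million input tokens
CACHE_READ_DISCOUNT = 0.1      # assumed: cached tokens billed at 10%
CACHE_WRITE_PREMIUM = 1.25     # assumed: first write costs 25% extra

system_tokens = 2000   # large, stable system prompt (cacheable)
user_tokens = 200      # varies per request (not cacheable)

def request_cost(n: int) -> float:
    if n == 1:  # first request writes the cache
        cached = system_tokens * PRICE_PER_INPUT_TOKEN * CACHE_WRITE_PREMIUM
    else:       # later requests read from the cache
        cached = system_tokens * PRICE_PER_INPUT_TOKEN * CACHE_READ_DISCOUNT
    return cached + user_tokens * PRICE_PER_INPUT_TOKEN

costs = [request_cost(n) for n in range(1, 11)]
uncached_total = 10 * (system_tokens + user_tokens) * PRICE_PER_INPUT_TOKEN
print(f"cached total: ${sum(costs):.4f} vs uncached: ${uncached_total:.4f}")
```

Note the pattern the exercise predicts: request 1 costs slightly more than an uncached request, and requests 2-10 cost much less, so caching pays off as request volume grows.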
