Effective Context Engineering for AI Agents
Anthropic's context engineering guide covers strategies for optimizing AI agent performance through deliberate token management — moving beyond simple prompt engineering to optimize the entire information ecosystem, including system instructions, tools, external data, and message history.
Source: www.anthropic.com
Table of Contents
Summary
Anthropic’s context engineering guide covers strategies for optimizing AI agent performance through deliberate token management. It goes beyond simple prompt engineering to optimize the entire information ecosystem — including system instructions, tools, external data, and message history.
The core insight is: “Find the smallest set of high-signal tokens that maximizes the probability of achieving the desired outcome.” This requires understanding the unique constraints of LLMs (context rot, finite attention budget) and applying the Goldilocks Principle — designing context that is neither too specific nor too vague.
Key Concepts
1. Context Engineering vs Prompt Engineering
- Context Engineering: Optimizes the entire information ecosystem (system prompt, tools, external data, message history)
- Prompt Engineering: Focuses on writing effective prompts
- Difference: Context engineering is the strategic curation of every token provided to an LLM
2. Context Rot
- Phenomenon: Model performance degrades as token count increases
- Causes:
- “Attention budget” dilution due to n² pairwise token relationships
- Shorter sequences appear more frequently in training data
- Limited experience processing long contexts
- Implication: Careful token selection is necessary regardless of context window size
3. Goldilocks Principle
Too Specific:
- Hardcoded logic, excessive if-then rules
- Brittle and difficult to maintain
Too Vague:
- Assumes shared context
- Inconsistent execution
Just Right:
- Specific enough to guide behavior effectively
- Flexible enough to provide strong heuristics
Practical Applications
Use Case 1: Tool Design for AI Agents
Best Practices:
- Minimize tool overlap
- Design tasks to be self-contained and error-robust
- Use clear, descriptive parameter naming
- Avoid excessive tool sets (prevents decision ambiguity)
Why it matters: More tools expand the agent’s decision space and waste context
Use Case 2: Dynamic Context Retrieval
Strategy: Instead of pre-loading all potentially relevant data, maintain lightweight references (file paths, queries, URLs) and load them JIT (Just-In-Time) as needed
Implementation (Claude Code approach):
- Include CLAUDE.md as baseline context
- Use Grep/Glob tools for runtime discovery
- Avoids stale indexing issues
Analogy to human cognition: We don’t memorize everything — we retrieve when needed
Use Case 3: Long-Horizon Tasks
A. Compaction
- Summarize conversation history: Preserve architecture decisions and open issues
- Remove redundant output: Start with maximum recall, then optimize for precision
- Trade-off: Cost of compaction vs cost of maintaining full context
B. Structured Note-Taking
- Approach: Maintain external memory files (NOTES.md, to-do lists)
- Case study: Claude playing Pokémon
- Tracked goals and strategies across thousands of steps
- Progressed without context resets
- Benefit: Maintains state independent of context window
C. Sub-Agent Architectures
- Structure:
- Coordinator agent (overall orchestration)
- Specialized sub-agents (focused tasks)
- Process:
- Sub-agent works with a clean context window
- Returns a compressed summary (1,000–2,000 tokens)
- Coordinator decides next steps
- Benefits: Context isolation, specialization
Code Examples
Example 1: Information Architecture with XML
<background_information>
You are an AI agent helping developers deploy applications.
The user typically works with Docker and Kubernetes.
</background_information>
<instructions>
1. Analyze the user's request
2. Use search tools to find relevant files
3. Provide code examples in the user's preferred language
</instructions>
<tool_guidance>
- Use grep_search for keyword searches
- Use glob_pattern for file discovery
- Minimize redundant tool calls
</tool_guidance>
<output_description>
Provide concise, actionable responses with:
- Clear explanations
- Code snippets
- File paths in format: file_path:line_number
</output_description>
Structuring with XML tags helps the LLM clearly understand the role of each section and process information efficiently.
Example 2: Few-Shot Prompting (Curated Examples)
## Example Interactions
### Example 1: File Search
User: "Where is the authentication logic?"
Assistant: Authentication is handled in src/auth/handler.py:45
### Example 2: Code Explanation
User: "How does caching work?"
Assistant: The caching mechanism uses Redis with 5-minute TTL.
See src/cache/redis_client.py:23 for implementation.
Rather than trying to cover every edge case, select a diverse set of representative examples. For LLMs, examples are worth more than a thousand words of explanation.
Example 3: Dynamic Context Retrieval Pattern
# ❌ Bad: Pre-load all potentially relevant files
context = {
"file1": read_file("src/app.py"),
"file2": read_file("src/config.py"),
"file3": read_file("src/utils.py"),
# ... dozens of files
}
# ✅ Good: Maintain lightweight references, load JIT
references = {
"app": "src/app.py",
"config": "src/config.py",
"utils": "src/utils.py"
}
# Agent uses tools to load only what's needed
# grep_search("authentication") → finds src/auth/handler.py
# read_file("src/auth/handler.py") → loads specific file
Loading files only when needed makes efficient use of the context window.
Before/After Comparison
Before (Inefficient Context Management)
System Prompt:
You are a helpful assistant. Help the user with whatever they need.
Tools:
- read_file, write_file, search_file, find_file, grep_file,
list_files, count_lines, get_metadata, check_syntax,
format_code, lint_code, run_tests, ...
(15+ overlapping tools)
Problems:
- Vague system prompt
- Overlapping tool functionality
- Agent decision confusion
After (Effective Context Engineering)
System Prompt:
You are a code analysis assistant. When users ask about code:
1. Use glob_pattern to find relevant files
2. Use grep_search for keyword searches
3. Use read_file to examine specific files
4. Provide file paths in format: file_path:line_number
Tools:
- read_file: Read file contents (parameters: file_path, offset, limit)
- glob_pattern: Find files by pattern (parameters: pattern, path)
- grep_search: Search for keywords (parameters: pattern, path, output_mode)
Improvements:
- Specific, clear instructions
- Minimal, non-overlapping tools
- Explicit output format guidance
Limitations & Gotchas
⚠️ Context Window Size ≠ Optimal Performance
- Even with a large context window, fewer tokens can be more effective
- “Find the smallest set of high-signal tokens”
⚠️ Compaction Trade-offs
- Compaction itself consumes tokens (summarization cost)
- Compare compaction cost vs full context retention cost
⚠️ Few-Shot Example Curation
- Do not enumerate edge cases (causes bloat)
- Select diverse, representative examples
💡 Tip: Start with Maximum Recall, Optimize Precision
- Include more information initially
- Progressively remove what is unnecessary
💡 Tip: Human Cognition as Model
- Just as humans don’t memorize everything
- AI should use lightweight references + JIT loading
References
Next Steps
- Apply Goldilocks Principle to current AI agent prompts
- Audit tool designs for overlap and redundancy
- Implement dynamic context retrieval pattern
- Create sub-agent architecture for long-horizon tasks
- Measure context efficiency (tokens used vs outcome quality)
Notes:
Why this guide matters for current projects:
- Even while learning infrastructure in Phase 0a, understanding AI agent design principles is important
- Apply these principles when building real AI agents in Week 0 (Python) and Phases 1–5
- Especially useful as a reference when designing agents like prompt_reviewer, code_critic, and assignment_generator
Lessons from Anthropic’s engineering team:
- Claude Code is a real-world example of these principles in action
- CLAUDE.md as baseline + Grep/Glob runtime discovery = hybrid approach
- External memory files used for long-horizon tasks (playing Pokémon)
Connection to Production-Ready Mindset:
- Context optimization = cost optimization (tokens = money)
- Observability: context usage monitoring is necessary (trackable via LGTM stack)
- Include context efficiency when measuring agent performance