Effective Context Engineering for AI Agents

Summary

Anthropic’s context engineering guide covers strategies for optimizing AI agent performance through deliberate token management. It goes beyond simple prompt engineering to optimize the entire information ecosystem — including system instructions, tools, external data, and message history.

The core insight is: “Find the smallest set of high-signal tokens that maximizes the probability of achieving the desired outcome.” This requires understanding the unique constraints of LLMs (context rot, finite attention budget) and applying the Goldilocks Principle — designing context that is neither too specific nor too vague.

Key Concepts

1. Context Engineering vs Prompt Engineering

Context Engineering: Optimizes the entire information ecosystem (system prompt, tools, external data, message history)
Prompt Engineering: Focuses on writing effective prompts
Difference: Context engineering is the strategic curation of every token provided to an LLM

2. Context Rot

Phenomenon: Model performance degrades as token count increases
Causes:
- “Attention budget” dilution due to n² pairwise token relationships
- Shorter sequences appear more frequently in training data
- Limited experience processing long contexts
Implication: Careful token selection is necessary regardless of context window size

3. Goldilocks Principle

Too Specific:

Hardcoded logic, excessive if-then rules
Brittle and difficult to maintain

Too Vague:

Assumes shared context
Inconsistent execution

Just Right:

Specific enough to guide behavior effectively
Flexible enough to provide strong heuristics

Practical Applications

Use Case 1: Tool Design for AI Agents

Best Practices:

Minimize tool overlap
Design tasks to be self-contained and error-robust
Use clear, descriptive parameter naming
Avoid excessive tool sets (prevents decision ambiguity)

Why it matters: More tools expand the agent’s decision space and waste context

Use Case 2: Dynamic Context Retrieval

Strategy: Instead of pre-loading all potentially relevant data, maintain lightweight references (file paths, queries, URLs) and load them JIT (Just-In-Time) as needed

Implementation (Claude Code approach):

Include CLAUDE.md as baseline context
Use Grep/Glob tools for runtime discovery
Avoids stale indexing issues

Analogy to human cognition: We don’t memorize everything — we retrieve when needed

Use Case 3: Long-Horizon Tasks

A. Compaction

Summarize conversation history: Preserve architecture decisions and open issues
Remove redundant output: Start with maximum recall, then optimize for precision
Trade-off: Cost of compaction vs cost of maintaining full context

B. Structured Note-Taking

Approach: Maintain external memory files (NOTES.md, to-do lists)
Case study: Claude playing Pokémon
- Tracked goals and strategies across thousands of steps
- Progressed without context resets
Benefit: Maintains state independent of context window

C. Sub-Agent Architectures

Structure:
- Coordinator agent (overall orchestration)
- Specialized sub-agents (focused tasks)
Process:
1. Sub-agent works with a clean context window
2. Returns a compressed summary (1,000–2,000 tokens)
3. Coordinator decides next steps
Benefits: Context isolation, specialization

Code Examples

Example 1: Information Architecture with XML

<background_information>
  You are an AI agent helping developers deploy applications.
  The user typically works with Docker and Kubernetes.
</background_information>

<instructions>
  1. Analyze the user's request
  2. Use search tools to find relevant files
  3. Provide code examples in the user's preferred language
</instructions>

<tool_guidance>
  - Use grep_search for keyword searches
  - Use glob_pattern for file discovery
  - Minimize redundant tool calls
</tool_guidance>

<output_description>
  Provide concise, actionable responses with:
  - Clear explanations
  - Code snippets
  - File paths in format: file_path:line_number
</output_description>

Structuring with XML tags helps the LLM clearly understand the role of each section and process information efficiently.

Example 2: Few-Shot Prompting (Curated Examples)

## Example Interactions

### Example 1: File Search
User: "Where is the authentication logic?"
Assistant: Authentication is handled in src/auth/handler.py:45

### Example 2: Code Explanation
User: "How does caching work?"
Assistant: The caching mechanism uses Redis with 5-minute TTL.
See src/cache/redis_client.py:23 for implementation.

Rather than trying to cover every edge case, select a diverse set of representative examples. For LLMs, examples are worth more than a thousand words of explanation.

Example 3: Dynamic Context Retrieval Pattern

# ❌ Bad: Pre-load all potentially relevant files
context = {
    "file1": read_file("src/app.py"),
    "file2": read_file("src/config.py"),
    "file3": read_file("src/utils.py"),
    # ... dozens of files
}

# ✅ Good: Maintain lightweight references, load JIT
references = {
    "app": "src/app.py",
    "config": "src/config.py",
    "utils": "src/utils.py"
}

# Agent uses tools to load only what's needed
# grep_search("authentication") → finds src/auth/handler.py
# read_file("src/auth/handler.py") → loads specific file

Loading files only when needed makes efficient use of the context window.

Before/After Comparison

Before (Inefficient Context Management)

System Prompt:
You are a helpful assistant. Help the user with whatever they need.

Tools:
- read_file, write_file, search_file, find_file, grep_file,
  list_files, count_lines, get_metadata, check_syntax,
  format_code, lint_code, run_tests, ...
  (15+ overlapping tools)

Problems:

Vague system prompt
Overlapping tool functionality
Agent decision confusion

After (Effective Context Engineering)

System Prompt:
You are a code analysis assistant. When users ask about code:
1. Use glob_pattern to find relevant files
2. Use grep_search for keyword searches
3. Use read_file to examine specific files
4. Provide file paths in format: file_path:line_number

Tools:
- read_file: Read file contents (parameters: file_path, offset, limit)
- glob_pattern: Find files by pattern (parameters: pattern, path)
- grep_search: Search for keywords (parameters: pattern, path, output_mode)

Improvements:

Specific, clear instructions
Minimal, non-overlapping tools
Explicit output format guidance

Limitations & Gotchas

⚠️ Context Window Size ≠ Optimal Performance

Even with a large context window, fewer tokens can be more effective
“Find the smallest set of high-signal tokens”

⚠️ Compaction Trade-offs

Compaction itself consumes tokens (summarization cost)
Compare compaction cost vs full context retention cost

⚠️ Few-Shot Example Curation

Do not enumerate edge cases (causes bloat)
Select diverse, representative examples

💡 Tip: Start with Maximum Recall, Optimize Precision

Include more information initially
Progressively remove what is unnecessary

💡 Tip: Human Cognition as Model

Just as humans don’t memorize everything
AI should use lightweight references + JIT loading

References

Next Steps

Apply Goldilocks Principle to current AI agent prompts
Audit tool designs for overlap and redundancy
Implement dynamic context retrieval pattern
Create sub-agent architecture for long-horizon tasks
Measure context efficiency (tokens used vs outcome quality)

Notes:

Why this guide matters for current projects:

Even while learning infrastructure in Phase 0a, understanding AI agent design principles is important
Apply these principles when building real AI agents in Week 0 (Python) and Phases 1–5
Especially useful as a reference when designing agents like prompt_reviewer, code_critic, and assignment_generator

Lessons from Anthropic’s engineering team:

Claude Code is a real-world example of these principles in action
CLAUDE.md as baseline + Grep/Glob runtime discovery = hybrid approach
External memory files used for long-horizon tasks (playing Pokémon)

Connection to Production-Ready Mindset:

Context optimization = cost optimization (tokens = money)
Observability: context usage monitoring is necessary (trackable via LGTM stack)
Include context efficiency when measuring agent performance