What Is Context Engineering? The Skill That Replaced Prompt Engineering
Table of Contents
- The Shift Nobody Expected
- The Definition
- Why Prompt Engineering Wasn’t Enough
- Why It Matters Now
- The Core Principle
- The Evolution: A Timeline
- Real-World Impact
- What’s Next
The Shift Nobody Expected
In early 2025, “prompt engineering” was the hottest skill in AI. By mid-2025, the industry’s most influential voices were already declaring it insufficient.
On June 18, 2025, Shopify CEO Tobi Lutke posted: “I really like the term ‘context engineering’ over prompt engineering.” One week later, Andrej Karpathy — former Director of AI at Tesla and co-founder of OpenAI — endorsed the shift with a post that crystallized the idea:
“+1 for ‘context engineering’… the delicate art and science of filling the context window with just the right information for the next step.”
By June 27, Simon Willison — one of the most respected voices in applied AI — wrote a blog post declaring that “context engineering” would stick as a term, conceding that prompt engineering had always had an image problem.
Three months later, Anthropic’s engineering team published what became the definitive industry reference: “Effective Context Engineering for AI Agents.” The post accumulated over 500,000 views and established the formal framework the industry now uses.
What happened? Why did the entire AI industry pivot from prompt engineering to context engineering in a matter of months?
The Definition
Anthropic’s engineering team defines context engineering as:
“The set of strategies for curating and maintaining the optimal set of tokens during LLM inference.”
But to understand what that means, you need to understand what “context” actually is. When you interact with an AI model, everything it sees in a single request is the context. This includes:
- The system prompt — Instructions that define behavior, personality, and constraints
- Conversation history — All previous messages in the session
- Tool definitions — Schemas for every tool the model can use
- Tool outputs — Results from tool calls (file reads, API responses, search results)
- Retrieved documents — Content fetched via RAG or other retrieval systems
- The user’s current message — The latest input
- The model’s own reasoning — Chain-of-thought or extended thinking tokens
Context engineering is the discipline of managing all of this — deciding what goes in, what stays out, when information should be added, when it should be removed, and how it should be structured.
Karpathy’s Seven Components
Andrej Karpathy identified seven specific components that context engineering encompasses:
- Task instructions and goals
- Relevant context and background information
- Available tools and API references
- Conversation history
- Relevant examples (few-shot)
- Output format constraints
- Chain-of-thought structure
He likened the LLM to a CPU and the context window to RAM — the model can only reason about what is loaded into its working memory at the moment of inference.
Philipp Schmid’s Framework
Philipp Schmid, a prominent AI engineer, expanded this into a seven-component framework:
- System Prompt — The agent’s persistent instructions
- User Prompt — The current request
- Conversation State — The evolving history
- Retrieved Knowledge — Information pulled from external sources
- Tool Definitions — What the agent can do
- Structured Output — Format constraints for responses
- Long-term Memory — Persistent knowledge across sessions
Both frameworks emphasize the same core insight: the context is a system, not just a prompt. Engineering that system is what produces reliable AI behavior.
Why Prompt Engineering Wasn’t Enough
Prompt engineering focuses on crafting better instructions — how you ask the model to do something. It treats the model as a stateless function: input goes in, output comes out.
This worked reasonably well for simple, single-turn interactions. “Summarize this article.” “Write a function that sorts an array.” “Translate this paragraph to Spanish.”
But as AI systems became more complex — multi-turn conversations, tool-using agents, RAG pipelines, multi-step workflows — the prompt became a small fraction of what the model actually processes. In a production agentic system:
- The system prompt might be 2,000 tokens
- Tool definitions might consume 17,000-55,000 tokens
- Conversation history might accumulate 50,000+ tokens
- Tool outputs might inject 100,000+ tokens
- Retrieved documents might add 10,000-50,000 tokens
The user’s actual prompt — the thing prompt engineering focuses on — might represent less than 1% of what the model sees. Optimizing that 1% while ignoring the other 99% is like tuning the radio while the engine is on fire.
The Three Key Differences
| Dimension | Prompt Engineering | Context Engineering |
|---|---|---|
| Scope | The instruction text | Everything the model sees |
| Timeframe | Single turn | Across the entire session lifecycle |
| Focus | How to ask | What information to provide, when, and how much |
Prompt engineering is a subset of context engineering. Writing good prompts still matters — but it is one component of a much larger system.
What Practitioners Say
The Hacker News discussion on context engineering revealed a practical consensus:
- Experienced developers noted that effective context limits are often ~10K tokens, far below advertised maximums
- Multi-agent architectures were recommended specifically to overcome single-context limitations
- Skeptics called it “buzzword inflation,” but even skeptics agreed that managing what goes into context is fundamentally different from just writing better prompts
The OpenAI community forum produced an even more provocative take: that context engineering itself will eventually be superseded by “automated workflow architecture” where systems automatically decide what context to assemble. But for now, context engineering is the skill that separates effective AI developers from those who hit walls they don’t understand.
Why It Matters Now
Three converging trends made context engineering essential in 2025-2026:
1. The Rise of Agentic AI
AI agents — systems that take multi-step actions using tools — consume context at an extraordinary rate. A single coding agent session can burn through 100,000+ tokens across 20 tool calls, with each step resending all previous context. Without deliberate context management, agents degrade, stall, or silently lose critical information mid-task.
Gartner reported a 1,445% surge in multi-agent AI inquiries in 2025-2026, signaling massive enterprise adoption of agentic systems. Every one of these systems faces the context management challenge.
2. Context Windows Got Bigger — But Not Better
Context windows grew from 4K tokens (GPT-3.5) to 1M+ tokens (Gemini 2.5 Pro, GPT-4.1) in under three years. The marketing message was simple: more context means better results.
The research tells a different story. Chroma’s study of 18 state-of-the-art models found that performance degrades at every length increment, not just near the limit. The NoLiMa benchmark found that at 32K tokens, 11 of 12 tested models dropped below 50% of their short-context performance. Stanford’s “Lost in the Middle” paper showed 30%+ accuracy drops when relevant information sits in the middle of long contexts.
Bigger windows created an illusion of unlimited memory. Context engineering is the antidote — the recognition that more tokens is not the same as better tokens.
3. Cost and Latency Scale With Context
Every token you send costs money. Every token adds latency. In agentic workflows where the entire context is resent with every API call, costs compound quadratically with the number of steps. A 20-step agent workflow doesn’t cost 20x a single step — it costs the sum of an arithmetic series as each step includes all previous context.
Production teams reported that unoptimized agentic systems can cost $255,000+ annually from context mismanagement alone. Context engineering directly addresses the economics of AI deployment.
The Core Principle
Everything in context engineering flows from a single principle, articulated by Anthropic:
“Find the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome.”
This is elegant in its simplicity and radical in its implications:
- Every token that doesn’t serve the goal is not neutral — it is actively harmful. Irrelevant information dilutes the model’s attention and degrades performance.
- Context is a curated resource, not a dumping ground. The best systems are not the ones that load the most information — they are the ones that load the right information.
- Removal is as important as addition. Knowing when to take information out of context is as valuable as knowing when to put it in.
This principle governs everything that follows in this series: from understanding tokens and context windows (Part 2), to analyzing where tokens actually go in agentic systems (Part 3), to the specific techniques for efficient context usage (Part 4), to the practical playbook for building context-efficient applications (Part 5).
The Evolution: A Timeline
| Period | Paradigm | Focus |
|---|---|---|
| 2022-2023 | Prompt Engineering | Writing better instructions for single-turn interactions |
| 2023-2024 | Advanced Prompting | Chain-of-thought, few-shot, system prompts, RAG pipelines |
| 2025 | Context Engineering | Managing the full context lifecycle: what goes in, when, and what gets removed |
| 2026+ | Automated Context | Systems that dynamically assemble optimal context without manual engineering |
We are in the transition from manual context engineering to partially automated approaches. Tools like Claude Code already implement automatic compaction, progressive tool disclosure, and context-aware token budgeting. But understanding the underlying principles remains essential — you cannot effectively evaluate or debug automated systems without understanding what they are trying to optimize.
Real-World Impact
The shift to context engineering is not theoretical. Case studies demonstrate measurable results:
- Five Sigma Insurance: Reduced AI errors by 80% through systematic context engineering — structuring what information the model received at each decision point
- Block/Square: Integrated MCP (Model Context Protocol) to standardize how context flows between tools and agents
- Microsoft: Achieved 26% productivity gains through context-aware AI workflows
- Academic SaaS platform: Two developers produced 220,000+ lines of code in 15 weeks using systematic context engineering practices
- Healthcare virtual assistants: Context engineering enabled compliant, accurate medical information delivery by precisely controlling what knowledge was available at each interaction point
These results come not from better prompts, but from better systems for managing what the model sees.
What’s Next
This series covers context engineering from foundations to practice:
- Part 2: Tokens and Context Windows Explained — What tokens actually are, how they work, context window sizes and pricing across providers
- Part 3: Where Do All the Tokens Go? — The anatomy of context consumption in agentic AI, and the five failure modes that degrade performance
- Part 4: Techniques for Efficient Context Usage — Prompt caching, compression, RAG, chunking, pruning, sub-agents, and architectural patterns
- Part 5: The Practitioner’s Playbook — Decision frameworks, monitoring, common mistakes, and a quick-start checklist