Context Window Management with MCP

Mar 10, 2026

The Hidden Cost of MCP Servers
How MCP Tools Consume Tokens
Tool Search: Claude Code’s Answer to Context Bloat
Deferred Tools: The Technical Mechanism
How Many MCP Servers Is Too Many?
Practical Strategies for Context Efficiency
MCP Server Authors: Optimizing for Context
Quick Reference: Context Management Settings
What is Next

The Hidden Cost of MCP Servers

You added twelve MCP servers to Claude Code. GitHub, Sentry, Playwright, a database connector, Slack, Notion, a few more. Each one sounded useful. Then you noticed Claude Code struggling: responses felt less sharp, conversations ran out of room faster, and the model occasionally forgot earlier parts of your discussion.

The cause: every MCP tool definition gets loaded into the context window. And with 200,000 tokens available on Claude Sonnet 4, those tool definitions can quietly consume 30-40% of your usable space before you type a single prompt.

Understanding this cost, and the strategies Claude Code provides to manage it, is the difference between an effective MCP setup and one that fights itself.

How MCP Tools Consume Tokens

Every MCP server exposes one or more tools, and each tool has a definition that includes its name, description, and parameter schema. These definitions are loaded into the context window so the model knows what tools are available.

Real-World Token Measurements

Research by developer Scott Spence measured actual token consumption across MCP servers:

| Configuration | Tools | Tokens consumed | % of 200K window |
|---|---|---|---|
| Single server (mcp-omnisearch, original) | 20 | 14,214 | 7.1% |
| Single server (mcp-omnisearch, optimized) | 8 | 5,663 | 2.8% |
| All MCP servers enabled simultaneously | Many | 82,000 | 41% |
| GitHub MCP server alone | 91 | ~46,000 | 23% |

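Across the mcp-omnisearch measurements, the cost averages roughly 710 tokens per tool definition (14,214 tokens / 20 tools). Averages vary by server: the GitHub server works out to about 505 tokens per tool (46,000 / 91). Using 710 as a working figure, that means: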

10 tools = ~7,000 tokens consumed
50 tools = ~35,000 tokens consumed
100 tools = ~70,000 tokens consumed (35% of the window, gone)
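The arithmetic above can be reproduced in a few lines. This is only a rough estimate: the 710-token average and the 200,000-token window are the figures quoted in this article, and the function name `estimate_tool_overhead` is made up for illustration.

```python
# Rough estimate of context consumed by MCP tool definitions.
# Both constants are taken from the figures in this article, not measured here.
AVG_TOKENS_PER_TOOL = 710   # approximate average quoted above
CONTEXT_WINDOW = 200_000    # window size quoted for Claude Sonnet 4


def estimate_tool_overhead(tool_count: int) -> tuple[int, float]:
    """Return (estimated tokens, percent of the context window)."""
    tokens = tool_count * AVG_TOKENS_PER_TOOL
    percent = 100 * tokens / CONTEXT_WINDOW
    return tokens, percent


for count in (10, 50, 100):
    tokens, percent = estimate_tool_overhead(count)
    print(f"{count:>3} tools ~ {tokens:,} tokens ({percent:.1f}% of window)")
```

Real definitions vary widely in size, so running `/context` in Claude Code remains the better source of truth for your own setup.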

When your MCP setup hits 82,000 tokens, you have used 41% of the context window on tool definitions alone, leaving only about 118,000 tokens for your actual conversation, code, file contents, and model responses.

Tool Search: Claude Code’s Answer to Context Bloat

Claude Code includes a feature called Tool Search that changes how MCP tools are loaded. Instead of loading all tool definitions upfront, Tool Search creates a lightweight search index and fetches tool details on demand.

How It Works

When Tool Search is active, MCP tools are deferred: their full definitions are not loaded into context
Claude uses a search tool to discover relevant MCP tools when it needs them
Only the 3-5 tools Claude actually needs for the current task are loaded
From your perspective, MCP tools work exactly the same way

The Numbers

The context savings are dramatic:

| Metric | Without Tool Search | With Tool Search | Reduction |
|---|---|---|---|
| Initial context for MCP tools | ~46,000 tokens (GitHub alone) | ~500 tokens (search index) | 99% |
| Typical multi-server setup | ~72,000 tokens | ~8,700 tokens | 88% |
| Percentage of 200K window used | 36% | 4.4% | 88% fewer tokens |
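The reduction column is simply (before - after) / before. A minimal sketch of that calculation, using the token counts from the table; the helper name `reduction_percent` is an assumption for illustration.

```python
# Percentage reduction between two token counts, using figures from the table above.

def reduction_percent(before: int, after: int) -> float:
    """Share of the original token cost that is saved, as a percentage."""
    return 100 * (before - after) / before


github_only = reduction_percent(46_000, 500)     # GitHub server alone
multi_server = reduction_percent(72_000, 8_700)  # typical multi-server setup

print(f"GitHub alone: {github_only:.1f}% fewer tokens")
print(f"Multi-server: {multi_server:.1f}% fewer tokens")
```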

When It Activates

Tool Search runs in auto mode by default. It activates when your MCP tool definitions would consume more than 10% of the context window (approximately 20,000 tokens). If you have only a few tools, they load normally without Tool Search.

Configuration Options

Control Tool Search behavior with the ENABLE_TOOL_SEARCH environment variable:

| Value | Behavior |
|---|---|
| `auto` | Activates when MCP tools exceed 10% of context (default) |
| `auto:5` | Activates at a custom 5% threshold |
| `true` | Always enabled, regardless of tool count |
| `false` | Disabled; all MCP tools loaded upfront |

# Use a lower threshold (activate sooner)
ENABLE_TOOL_SEARCH=auto:5 claude

# Always use tool search
ENABLE_TOOL_SEARCH=true claude

# Disable entirely (load all tools upfront)
ENABLE_TOOL_SEARCH=false claude

You can also set this in your settings.json env field for persistence.
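To make the four values concrete, here is a small model of the decision they describe. It is a sketch of the documented behavior only, not Claude Code's actual implementation; the function name `tool_search_active` and the fixed 200,000-token window are assumptions.

```python
# Illustrative model of the ENABLE_TOOL_SEARCH values described above.
# This is not Claude Code's implementation; it only mirrors the documented behavior.
CONTEXT_WINDOW = 200_000
DEFAULT_THRESHOLD_PERCENT = 10.0


def tool_search_active(mode: str, tool_tokens: int,
                       context_window: int = CONTEXT_WINDOW) -> bool:
    """Decide whether tools would be deferred for a given setting and tool cost."""
    if mode == "true":
        return True
    if mode == "false":
        return False
    if mode == "auto":
        threshold = DEFAULT_THRESHOLD_PERCENT
    elif mode.startswith("auto:"):
        threshold = float(mode.split(":", 1)[1])
    else:
        raise ValueError(f"unrecognized mode: {mode!r}")
    return tool_tokens > context_window * threshold / 100


print(tool_search_active("auto", 14_214))    # below the ~20,000-token threshold
print(tool_search_active("auto", 46_000))    # above it
print(tool_search_active("auto:5", 14_214))  # 5% of 200,000 is 10,000 tokens
```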

Model Requirements

Tool Search requires models that support tool_reference blocks:

Supported: Sonnet 4 and later, Opus 4 and later
Not supported: Haiku models

Deferred Tools: The Technical Mechanism

Under the hood, Tool Search uses a concept called deferred loading. Tools marked with "defer_loading": true stay out of context until discovered through search.

For developers working with the Claude API directly, a deferred tool definition looks like this:

{
  "name": "github_create_issue",
  "description": "Creates a new issue in a GitHub repository",
  "defer_loading": true,
  "input_schema": { ... }
}

Key rules for deferred tools:

The search tool itself must never have defer_loading: true
Keep 3-5 frequently-used tools as non-deferred (always loaded)
At least one tool must always be loaded in context
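The two hard rules above are easy to check mechanically. The following is a hypothetical checker, assuming tool definitions are plain dicts shaped like the JSON example in this section; the function name and the `tool_search` placeholder name are not part of any real API.

```python
# Hypothetical checker for the deferred-tool rules listed above.
# Tool definitions are plain dicts shaped like the JSON example in this section.

def check_deferred_rules(tools: list[dict], search_tool_name: str) -> list[str]:
    """Return a list of rule violations (empty when the configuration looks fine)."""
    problems = []
    always_loaded = [t for t in tools if not t.get("defer_loading", False)]

    # Rule: at least one tool must always be loaded in context.
    if not always_loaded:
        problems.append("no tool is always loaded")

    # Rule: the search tool itself must never be deferred.
    for tool in tools:
        if tool["name"] == search_tool_name and tool.get("defer_loading", False):
            problems.append(f"search tool {search_tool_name!r} is deferred")

    return problems


tools = [
    {"name": "tool_search", "defer_loading": False},  # placeholder search tool name
    {"name": "github_create_issue", "defer_loading": True},
]
print(check_deferred_rules(tools, "tool_search"))  # prints []
```

The 3-5 always-loaded tools guideline is a recommendation rather than a hard rule, so the sketch leaves it out.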

For batch configuration of an entire MCP server:

{
  "type": "mcp_toolset",
  "mcp_server_name": "github-server",
  "default_config": { "defer_loading": true },
  "configs": {
    "search_issues": { "defer_loading": false }
  }
}

This sets all GitHub tools to deferred by default, except search_issues which stays always-loaded.
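The default-plus-override behavior described here can be sketched as a lookup where the per-tool entry wins. This mirrors the description above rather than the API's real implementation; `is_deferred` and the `create_issue` tool name are hypothetical.

```python
# Sketch of how a toolset's default and per-tool settings combine.
# Mirrors the behavior described above; not the API's actual implementation.

def is_deferred(toolset: dict, tool_name: str) -> bool:
    """Per-tool config wins; otherwise fall back to default_config."""
    default = toolset.get("default_config", {}).get("defer_loading", False)
    override = toolset.get("configs", {}).get(tool_name, {})
    return override.get("defer_loading", default)


github_toolset = {
    "type": "mcp_toolset",
    "mcp_server_name": "github-server",
    "default_config": {"defer_loading": True},
    "configs": {"search_issues": {"defer_loading": False}},
}

print(is_deferred(github_toolset, "search_issues"))  # False: explicitly always-loaded
print(is_deferred(github_toolset, "create_issue"))   # True: inherits the default
```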

How Many MCP Servers Is Too Many?

There is no single answer, but research and community experience converge on practical guidelines:

The Rule of Thumb

20-30 MCP servers: reasonable to configure globally (in your user scope)
Under 10 servers: the number to keep enabled per project
Under 80 tools: the most to have actively loaded at any time

Why More Is Not Always Better

Research indicates that LLMs start to struggle once roughly 10-20 or more tools are active. The failure modes include:

Confusion between similar tools: When two tools do similar things (e.g., web_search and tavily_search), the model wastes tokens deciding which to use
Poor tool selection: With too many options, the model picks suboptimal tools more often
Hallucinated tools: Occasionally, the model invents tool names that do not exist

With Tool Search enabled, you can configure more servers globally because they are not all loaded at once. But even with Tool Search, having 50+ tools available means more search overhead per task.

Practical Strategies for Context Efficiency

Strategy 1: Per-Project Server Selection

Only enable the servers relevant to your current project. Use scopes to manage this:

# Global utilities (user scope: always available)
claude mcp add --scope user --transport http notion https://mcp.notion.com/mcp
claude mcp add --scope user --transport http github https://api.githubcopilot.com/mcp/

# Project-specific (project scope: shared with team)
claude mcp add --scope project --transport stdio db -- npx -y @bytebase/dbhub \
  --dsn "postgresql://readonly:pass@prod.db.com:5432/app"

Strategy 2: Monitor Token Usage

Use the /context command inside Claude Code to see a breakdown of token usage. This shows you exactly how much space MCP tool definitions are consuming.

Strategy 3: Consolidate Overlapping Servers

If you have multiple search tools (Brave, Tavily, Perplexity), pick one. If you have both a general filesystem MCP and Desktop Commander, decide which you need. Overlapping tools waste context and confuse the model.

Strategy 4: Use Tool Search Proactively

If you know you need many MCP servers, force Tool Search on:

ENABLE_TOOL_SEARCH=true claude

This ensures deferred loading even if your tool definitions are below the 10% threshold.

Strategy 5: Manage MCP Output Size

Large MCP tool outputs also consume context. Claude Code warns at 10,000 tokens and caps at 25,000 tokens by default.

# Increase for data-heavy workflows
export MAX_MCP_OUTPUT_TOKENS=50000
claude

For servers that regularly produce large outputs (database queries, log analysis), consider configuring the server to paginate or filter responses.
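One way a server might paginate is to pack rows into a response until a token budget is reached, then return a cursor for the next call. The sketch below is hypothetical: the helper names are invented, and the 4-characters-per-token ratio is a crude assumption rather than a real tokenizer.

```python
# Hypothetical server-side helper: return rows one page at a time so a single
# tool result stays under a token budget. The 4-characters-per-token ratio is a
# crude assumption, not a real tokenizer.
import json

CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN + 1


def paginate_rows(rows: list[dict], max_tokens: int, start: int = 0) -> dict:
    """Pack rows from `start` until the budget is reached; report where to resume."""
    page, used, index = [], 0, start
    while index < len(rows):
        cost = estimate_tokens(json.dumps(rows[index]))
        if page and used + cost > max_tokens:
            break  # budget reached; always emit at least one row per page
        page.append(rows[index])
        used += cost
        index += 1
    next_start = index if index < len(rows) else None
    return {"rows": page, "next_start": next_start}


rows = [{"id": i, "message": "x" * 40} for i in range(100)]
first = paginate_rows(rows, max_tokens=200)
print(len(first["rows"]), first["next_start"])
```

Returning `next_start` lets the model request the following page only when it needs more data, instead of paying for the full result upfront.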

MCP Server Authors: Optimizing for Context

If you are building MCP servers, your design decisions directly impact how many tokens each tool consumes:

Write Concise Descriptions

A verbose 87-token description like “Search the web using Tavily search engine. This tool provides comprehensive web search capabilities including factual lookups, academic research with citations, and real-time information retrieval…” can be reduced to a 12-token version: “Search using Tavily. Best for factual/academic topics with citations.”

Consolidate Related Tools

Instead of exposing four separate search tools, expose one tool with a provider parameter:

{
  "name": "web_search",
  "description": "Search the web using the specified provider",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": { "type": "string" },
      "provider": { "type": "string", "enum": ["tavily", "brave", "perplexity"] }
    }
  }
}

This reduces four tool definitions (~2,800 tokens) to one (~710 tokens).
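On the server side, a consolidated tool usually comes down to a dispatch on the `provider` argument. A minimal sketch, assuming placeholder backend functions; a real server would call each provider's API instead of returning a formatted string.

```python
# Sketch of a handler behind a single consolidated web_search tool.
# The provider functions are placeholders; a real server would call each API.
from typing import Callable


def search_tavily(query: str) -> str:
    return f"[tavily] results for {query!r}"


def search_brave(query: str) -> str:
    return f"[brave] results for {query!r}"


def search_perplexity(query: str) -> str:
    return f"[perplexity] results for {query!r}"


PROVIDERS: dict[str, Callable[[str], str]] = {
    "tavily": search_tavily,
    "brave": search_brave,
    "perplexity": search_perplexity,
}


def web_search(query: str, provider: str) -> str:
    """Route one tool call to the backend named by `provider`."""
    try:
        backend = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return backend(query)


print(web_search("context window limits", "brave"))
```

The model sees one definition and one schema, while adding a provider only means adding an entry to the dispatch table and the `enum`.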

Add Server Instructions for Tool Search

When Tool Search is enabled, the server’s instructions field helps Claude know when to search for your tools:

"instructions": "This server provides database querying and schema inspection tools. Search for these tools when the user asks about database operations, SQL queries, or schema analysis."

Quick Reference: Context Management Settings

| Setting | Environment variable | Default | Purpose |
|---|---|---|---|
| Tool Search mode | `ENABLE_TOOL_SEARCH` | `auto` | Controls when deferred loading activates |
| Tool Search threshold | `ENABLE_TOOL_SEARCH=auto:N` | `auto:10` | Percentage of context that triggers Tool Search |
| MCP output warning | (none) | 10,000 tokens | Warns when tool output is large |
| MCP output max | `MAX_MCP_OUTPUT_TOKENS` | 25,000 tokens | Caps tool output size |
| Startup timeout | `MCP_TIMEOUT` | Varies | Time allowed for server startup |

What is Next

Your context window is under control. Now the question becomes: which MCP servers are actually worth enabling? The answer depends heavily on what kind of work you do.

Part 3 provides curated recommendations organized by developer role.
