MCP Security: The Enterprise Guide to Securing Model Context Protocol

Written by Tom Horyn | Jun 25, 2026 3:19:57 PM

MCP security is the practice of governing Model Context Protocol servers at the deployment boundary: understanding how MCP expands an AI agent's attack surface, protecting against manipulation by untrusted servers, controlling which tools an agent is allowed to know about, enforcing the permitted action class of every tool call at runtime, and managing credentials so they do not accumulate in places agents can read them. As MCP becomes the standard protocol for connecting AI agents to enterprise tools and data sources, the threat model it introduces is distinct from traditional API security and requires a different set of controls.

What Is MCP and How Does It Work?

Model Context Protocol is an open standard, published by Anthropic in late 2024, that defines how AI agents communicate with external tools and data sources. Before MCP, every agent-to-tool integration required custom API development: a specific client written for a specific service. MCP standardized the connection layer, which is why adoption accelerated quickly across agent frameworks and developer tooling.

The basic flow works as follows. An MCP server exposes a set of capabilities as named tools. An MCP client, which is the agent, connects to the server and requests the tool list. The server responds with the full list of tools it offers, including their names, descriptions, and parameter schemas. The agent passes that tool list to the model. The model uses it to decide what actions to take, generates tool-call requests, and the agent executes them against the MCP server.

MCP servers come in two transport types that have different security implications.

Network transport means the MCP server runs as a separate process accessible over HTTP or WebSocket. The agent connects to it over the network. Traffic between the agent and the MCP server is visible to network monitoring tools.

Stdio transport means the MCP server runs as a subprocess launched directly by the agent, communicating over standard input and standard output. This is the transport type used by Claude Desktop, Cursor, Windsurf, and most developer-facing MCP tooling, all of which launch MCP servers as local subprocesses per their published architecture documentation. Traffic exchanged over stdio never appears on the network. A security control limited to network traffic observation cannot see it.

This transport distinction matters for every security control discussed in this piece. Controls that only intercept network-transport MCP traffic have a visibility gap covering a significant portion of developer-environment MCP activity.

How MCP Expands an AI Agent's Attack Surface

MCP expands an agent's attack surface in three ways that traditional API security models do not anticipate.

The tool list becomes part of the model's context. Traditional API calls are programmer-directed: a developer writes code that calls a specific endpoint. MCP tool calls are model-directed: the model decides which tools to call based on what it sees in the tool list and what instructions it has received. This means the tool list itself is an input to the model's reasoning, not just a capability catalog. A tool list that contains destructive capabilities is advertising those capabilities to the model.

The attack path through MCP is multi-step and partially observable. A typical MCP-enabled attack path looks like this: an agent receives a prompt injection embedded in a retrieved document or web page; the injected instruction directs the agent to use a specific tool; the agent sees that tool in its tool list; the agent calls the tool; the tool accesses an external system or executes a sensitive action. The injection and the sensitive action may be separated by several intermediate steps, each of which looks legitimate in isolation. Traditional security controls that evaluate individual API calls in isolation do not reconstruct this chain.

MCP servers are not all trusted systems. Enterprise deployments of MCP increasingly pull in servers from third parties: open-source repositories, community-maintained GitHub projects, commercial integrations from vendors the security team has not vetted. Each of those servers is an input to the model's reasoning context.

What Happens When the MCP Server Itself Cannot Be Trusted?

The dominant MCP security conversation in the industry right now is not about what happens when a trusted MCP server exposes too many tools. It is about what happens when the MCP server itself cannot be trusted.

Tool-description manipulation. An MCP server's tool list includes not just tool names but natural-language descriptions of what each tool does and how to use it. Those descriptions go into the model's context. A malicious MCP server can craft tool descriptions that embed instructions designed to alter the model's behavior: descriptions that instruct the model to include certain information in its responses, exfiltrate data through specific parameters, or take actions outside the user's intent. This is a form of prompt injection that does not require a compromised document or web page. It is built into the server's advertised capability descriptions.

The research community has demonstrated this pattern under the name "tool poisoning." An MCP server that appears to offer a legitimate capability can include hidden instructions in its tool descriptions that redirect model behavior. The model reads the description as part of understanding how to use the tool. The instruction embedded in the description executes as part of that reasoning.

Third-party and open-source MCP server risk. Developer adoption of MCP has outpaced security review. Developers routinely install MCP servers from GitHub repositories, community lists, and third-party vendors without formal security assessment. Each installed MCP server has access to the agent's context, can advertise tools with manipulated descriptions, and can receive tool calls that may include sensitive data from the agent's working context. The supply-chain risk profile is similar to npm packages or VS Code extensions: widely installed, minimally reviewed, with significant access to the development environment.

A useful framing for enterprise security teams: every MCP server a developer installs is granting that server the ability to influence what the model does during that agent's session. That is a materially different access level than installing a command-line tool.

Permission sprawl across MCP servers. As agent workflows grow more complex, agents connect to multiple MCP servers simultaneously. Each server grants a set of permissions. The effective permission set of the agent is the union of all permissions across all connected servers, which may be far broader than any single server's scope would suggest. An agent that connects to a file-system server, a database server, a calendar server, and an email server has read/write access to a substantial portion of an enterprise's data estate, even if each individual server's permissions seemed reasonable in isolation.

Rug-pull and update risk. A legitimate MCP server that developers have installed and trusted can change its behavior in a subsequent update. Tool descriptions can be modified, new tools can be added, and behaviors can be altered through an update that installs without explicit security review. The trust established at installation time does not persist across updates without re-evaluation.

The Four Core MCP Security Problems

With the threat model established, the specific technical security problems MCP introduces fall into four categories.

1. Tool-list exposure to the model

When an MCP server advertises its full tool list, the model receives all of it, including tools the agent has no legitimate need for. An MCP server hosting a customer relationship management integration might expose tools including ticket-read, knowledge-base-search, refund-issue, account-delete, and bulk-export. A customer support agent needs ticket-read and knowledge-base-search. Under normal operation it uses the appropriate tools. Under a prompt injection, a jailbreak, or a manipulated tool description from another connected server, the model may propose using the destructive tools.

The model can only propose calling tools it knows about. Controlling what the model learns about is therefore a security primitive, not a secondary concern.

2. Action class ambiguity in tool execution

MCP tools do not always make their action class obvious from their names. A tool called "execute-sql" accepts both read queries and write queries. A tool called "file-operations" may support read, write, and delete. When a task is declared as read-only analysis, that declaration has no runtime enforcement unless something is inspecting the actual operation in every tool call.

An agent conducting data analysis that constructs a cleanup query as a natural next step, or an agent summarizing documents that generates a save operation, may produce out-of-scope operations through ordinary reasoning rather than through compromise. The declared intent of the task and the actual operations executed against the MCP server can diverge silently.

3. Credential exposure in stdio deployments

In stdio MCP deployments, the configuration that tells the agent which MCP servers to launch and how to authenticate to them typically lives in a configuration file on the endpoint: a JSON file in the user's home directory, an environment variable, or a local config directory. Those files often contain credentials in plain text: API keys, database connection strings, service account tokens.

Any process running on the endpoint under the same user context can read those files. An agent operating in that environment, whether operating as designed or operating under a prompt injection, has access to those credentials as a side effect of running on the same machine. The credentials do not need to be the target of an attack. They simply need to be present.

4. Stdio transport visibility gap

As described above, a significant portion of MCP activity in developer environments uses stdio transport and generates no network traffic. Security controls that only observe network traffic cannot see this activity: not which tools are advertised, not which tool calls are made, not what data passes between the agent and the server. This visibility gap affects discovery, monitoring, and enforcement equally.

Security Controls for MCP Deployments

The threat model above maps to a specific set of controls. Each control addresses one or more of the four problems.

Govern which MCP servers agents are permitted to connect to. Before the question of what a trusted MCP server does, there is the question of which servers are permitted. Enterprise deployments should maintain an allowlist of approved MCP servers, with each server having undergone security review before inclusion. This applies to both internal MCP servers and third-party servers. An MCP server from an unreviewed GitHub repository should not be reachable from a production agent without explicit approval.

At the network layer, enclave boundaries control which MCP servers an agent can reach by making non-permitted servers unreachable rather than blocked. An agent inside a project enclave can connect only to the MCP servers configured for that enclave. MCP servers outside the enclave are not network-reachable, regardless of what the agent is instructed to do.

Filter the tool list before it reaches the model. At the session layer, a proxy between the agent and the MCP server intercepts the tool list advertisement before the agent passes it to the model. The proxy rewrites the list to include only the tools the agent is authorized to use for its specific task in its specific project context. The model receives the filtered list. It cannot propose a tool it was never shown.

This control addresses both the over-exposure problem for trusted MCP servers and part of the tool-description manipulation problem for untrusted servers: if a manipulated tool description is in a tool that has been filtered out, the model never receives it.

Inspect the actual operation in every tool call. A proxy inspecting tool calls at the session layer classifies the actual operation in each call, not only the declared intent of the task. SQL statements are parsed and classified as reads or writes. File operations are classified by method. HTTP API calls are classified by verb and path. If the operation falls outside the declared action class for the task, the call is blocked before it reaches the MCP server.

This control does not require understanding the semantic meaning of the operation. It requires classifying the operation type, which is deterministic for SQL, HTTP methods, and file system operations.

Apply data-class policies to tool call payloads and responses. The content that flows between an agent and an MCP server, both in tool call parameters and in tool responses, should be subject to data-class inspection. Sensitive data that should not leave a project context should be caught before it is transmitted to an MCP server. Sensitive data that an MCP server returns should be evaluated before it enters the agent's context, particularly for MCP servers that retrieve content from external or uncontrolled sources.

Manage MCP server credentials at the session layer, not the endpoint. Credentials for MCP servers should terminate at the session proxy rather than residing in endpoint configuration files. The agent's local configuration references a substitute identifier. The proxy holds the real credential and injects it inline on outbound calls to the MCP server. The endpoint configuration file contains no live credentials.

Extend coverage to stdio-transport MCP servers. An endpoint sensor can route stdio-transport subprocess traffic through the session proxy at the OS level, without requiring changes to the MCP server or the agent framework. The MCP server still launches as a subprocess. The communication still uses stdio. But it passes through the policy enforcement layer, which means tool-list filtering, action class inspection, and data-class policies apply to stdio traffic exactly as they apply to network traffic.

How to Evaluate MCP Security Controls: Five Questions

These questions separate controls that hold under adversarial conditions from controls designed only for normal operation.

1. Does tool-list filtering happen before or after the model receives the tool list?

Filtering that happens after the model has already received the full tool list does not prevent the model from knowing what tools exist. It can block calls, but the model has already been exposed to the full tool-description context, including any manipulated descriptions. Effective filtering rewrites the advertisement before it reaches the model.

2. Does the control cover untrusted MCP server scenarios, not just misconfigured trusted ones?

Ask whether the vendor's threat model includes tool-description manipulation, third-party MCP server risk, and supply-chain risk, or whether it treats all MCP servers as trusted systems that simply need better access controls. The two threat models require different controls.

3. Does the control cover stdio-transport MCP servers?

Ask vendors directly whether their control covers stdio-transport MCP servers. A control limited to network-transport MCP traffic has a visibility gap covering a significant portion of developer-tooling deployments. If stdio coverage requires changes to the MCP server or the agent framework, the coverage is partial and will not extend to servers installed by developers without IT involvement.

4. Does action class enforcement inspect the actual operation or the declared intent?

Ask vendors to demonstrate enforcement on a concrete case: an agent declared read-only that submits an UPDATE statement through an execute-sql tool. What happens? If the answer is that the policy says read-only but the tool call executes before enforcement fires, the control does not address the action class problem.

5. Where do MCP server credentials live in the deployment, and where are session logs stored?

If credentials reside in endpoint configuration files, the exposure risk described above applies. If session logs reside in vendor-hosted infrastructure, data residency questions apply to every piece of sensitive information the agent processed during its sessions. For regulated environments, both credentials and logs should reside on infrastructure the organization controls.

MCP Security and the Broader AI Agent Governance Architecture

MCP governance is one layer in a broader AI agent security architecture. The controls described in this piece address what happens at the MCP server boundary. They depend on and complement controls at the network layer, which govern which MCP servers an agent can reach in the first place, and at the lifecycle governance layer, which covers agent discovery, authorization, monitoring, and decommissioning.

The relationship between these layers is additive. A network-layer enclave boundary controls reachability: which MCP servers an agent can connect to. Tool-list filtering controls model-context exposure: which tools the model learns about from the servers it can connect to. Action class enforcement controls execution scope: which operations the agent can perform through the tools it knows about. Each layer addresses a distinct residual that the prior layer leaves open.

For a full treatment of the enclave architecture and its relationship to MCP governance, see Zero Trust Architecture for Agentic AI in 2026.

Frequently Asked Questions

What is MCP security?

MCP security is the practice of governing Model Context Protocol servers at the deployment boundary. It covers four primary concerns: controlling which MCP servers agents are permitted to connect to, filtering the tool list the model receives, enforcing the permitted action class of every tool call at runtime, and managing credentials so they do not reside in endpoint configuration files where agents can access them.

Can MCP servers manipulate AI agent behavior?

Yes. An MCP server can craft tool descriptions that embed instructions designed to alter the model's behavior. Because tool descriptions go into the model's context as part of the tool list, manipulated descriptions are a form of prompt injection that does not require a compromised document or external content. This pattern, sometimes called tool poisoning, has been demonstrated in published security research. It means that an MCP server from an untrusted source is not just a misconfigured access control problem: it is a potential instruction-injection vector.

What is the difference between MCP network transport and stdio transport?

Network transport means the MCP server runs as a separate process accessible over HTTP or WebSocket, generating network traffic that monitoring tools can observe. Stdio transport means the MCP server runs as a subprocess of the agent, communicating through stdin and stdout, generating no network traffic. Claude Desktop, Cursor, Windsurf, and most developer-facing MCP tools use stdio transport for local server deployments. Security controls limited to network traffic observation cannot see stdio-transport MCP activity.

What is the MCP tool-list problem?

The MCP tool-list problem is that AI models can only propose actions using tools they know about. When an MCP server advertises its full tool list to an agent, the model receives all of it, including destructive or sensitive tools the agent has no legitimate need for. Under a prompt injection or a jailbreak, the model may propose using those tools. Tool-list filtering rewrites the advertisement before it reaches the model, so the model never learns about tools outside its permitted scope.

Can prompt-layer guardrails replace MCP tool-list filtering?

No. A prompt-layer guardrail instructs the model not to use certain tools. It does not prevent the model from knowing those tools exist, and a sufficiently crafted prompt can override the instruction. Tool-list filtering prevents the model from learning about tools outside its permitted scope. The two controls are complementary, not substitutes. A guardrail addresses what the model should do. Tool-list filtering addresses what the model knows about.

What are the biggest MCP security risks for enterprise deployments?

The four primary risks are: tool-description manipulation by untrusted or compromised MCP servers, which can redirect model behavior through injected instructions in tool descriptions; permission sprawl from multiple connected MCP servers, where the union of permissions across all servers exceeds what any single server's scope would imply; stdio-transport visibility gaps, where network monitoring tools cannot observe activity between agents and locally-running MCP servers; and credential exposure, where MCP server authentication credentials stored in endpoint configuration files are accessible to any process running under the same user context.

Sources

Anthropic. Model Context Protocol Specification. 2024.https://modelcontextprotocol.io
Anthropic. Claude Desktop MCP Configuration Documentation. 2025. https://docs.anthropic.com/en/docs/claude-desktop/mcp
Cursor. MCP Server Configuration Documentation. 2025. https://docs.cursor.com/context/model-context-protocol
Gravitee. 2026 State of AI Agent Security Report. 2026.
Cloud Security Alliance and Aembit. Non-Human Identity and AI Agent Security Survey. 2026.
National Institute of Standards and Technology. SP 800-207: Zero Trust Architecture. 2020. https://doi.org/10.6028/NIST.SP.800-207
National Institute of Standards and Technology. AI 100-1: Artificial Intelligence Risk Management Framework. 2023. https://doi.org/10.6028/NIST.AI.100-1

View full post