May 2, 2026
MCP, Honestly: What It Is and What It Is Not
What the Model Context Protocol actually is, what it gets right, where it leaks, and why the local-first version is the cleaner story.
Contents (9)
TL;DR. MCP is JSON-RPC 2.0 over stdio or streamable HTTP, with a handshake that negotiates capabilities and a sampling primitive that lets a server call back into the host's model. That last detail is what makes it more than function calling. It is the protocol that lets tools and models be coroutines, which is the actual operating-system abstraction underneath.
What it is, in 60 seconds
Latest spec: 2025-11-25 (the spec uses the date as the version). Wire format: JSON-RPC 2.0. Transports: stdio (subprocess, stdin/stdout) and streamable HTTP (which replaced the original SSE-only transport in early 2025).
Three roles: a host (Claude Desktop, Cursor, Zed, Claude Code), a client (one inside the host per server), and a server (the thing exposing capabilities).
Three server primitives:
tools(model-controlled functions, the headline feature)resources(readable context: files, DB rows, URIs, with optional subscription)prompts(user-controlled templates that show up as slash commands)
Three client primitives:
sampling(the server asks the host's LLM to do an inference, recursive)roots(the server asks the client what filesystem boundaries it can operate in)elicitation(the server asks the user a question mid-execution)
The official spec compares MCP to LSP. Before LSP, every editor implemented every language: N×M problem. After LSP, N+M. MCP offers the same deal for AI hosts and tool sources. Forget USB-C. LSP is the right reference.
The deeper analogy is device drivers. Each MCP server is a driver exposing a uniform interface for some piece of the world (filesystem, git, Postgres, browser). The host is a kernel arbitrating between drivers and processes. The LLM is userspace making syscalls. The roots primitive negotiates filesystem boundaries (mount points). The tasks capability (added 2025-11-25) is process management. The recursive sampling primitive is the syscall going back into the kernel from the driver. Genuinely OS-shaped.
The handshake
Most posts skip this. Here is what an MCP client actually exchanges with a server:
// 1. Client → server: initialize
{"jsonrpc":"2.0","id":1,"method":"initialize",
"params":{"protocolVersion":"2025-11-25",
"capabilities":{
"roots":{"listChanged":true},
"sampling":{},
"elicitation":{}},
"clientInfo":{"name":"my-host","version":"1.0.0"}}}
// 2. Server → client: capabilities mirror
{"jsonrpc":"2.0","id":1,"result":{
"protocolVersion":"2025-11-25",
"capabilities":{
"tools":{"listChanged":true},
"resources":{"subscribe":true,"listChanged":true},
"prompts":{"listChanged":true},
"logging":{}},
"serverInfo":{"name":"my-server","version":"1.0.0"}}}
// 3. Client → server: ack
{"jsonrpc":"2.0","method":"notifications/initialized"}
Capabilities are negotiated bilaterally. listChanged notifications mean the catalog is live: the server can push updates and the host invalidates its cached tool list. A function-calling API can't do this.
A real config most users hand-edit:
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/code"]
},
"git": {
"command": "uvx",
"args": ["mcp-server-git", "--repository", "/Users/me/code/site"]
},
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
},
"github": {
"type": "streamable_http",
"url": "https://api.githubcopilot.com/mcp/",
"headers": { "Authorization": "Bearer ${GITHUB_PAT}" }
}
}
}
Four stdio servers and one streamable-HTTP server in the same file. Mix-and-match is the whole pitch.
Ecosystem state, May 2026
- Reference servers (
github.com/modelcontextprotocol/servers): seven left in the canonical list.everything,fetch,filesystem,git,memory,sequential-thinking,time. Notably moved out (now inservers-archived): GitHub, GitLab, Slack, Postgres, SQLite, Sentry, Redis, Puppeteer, Google Drive, Google Maps, Brave Search. Superseded by first-party servers from those vendors. Sign of maturity. - First-party heavyweights: GitHub MCP (~30k stars, 80+ tools), Playwright MCP (~32k stars, accessibility-tree-driven, no screenshots).
- Bundling:
.mcpb(formerly.dxt) is a zip with a manifest. Like.vsixfor VS Code, single-click install for end users. - SDKs: official in TypeScript, Python (FastMCP is canonical), Go, Rust, Java, Kotlin, C#, Swift, Ruby. Ten lines of FastMCP gets you a working tool.
- Clients: Claude Desktop, Claude Code, Cursor, Cline, Continue.dev, Zed, VS Code Copilot. All first-class. Ollama is the conspicuous exception.
The 4-legged identity problem
OAuth was designed for a triangle. User, app, IdP. Three legs.
Add an agent and an MCP server to that triangle and you get a square. User → agent → MCP → API. The API has no idea who triggered the request. The MCP server is calling on behalf of an agent calling on behalf of a user, but the auth header at the API only sees the MCP server's credentials. Identity gets lost mid-chain.
The right answer is token exchange (RFC 8693): the agent exchanges its user token for an audience-scoped token before each downstream call. Limited IdP support. Keycloak does it. Microsoft doesn't. Okta only via claims.
The MCP spec codified four RFCs that make this work end-to-end:
- RFC 9728 for resource metadata (which auth server to talk to)
- RFC 8414 for AS metadata (endpoints, capabilities, sign-in keys)
- RFC 7591 for dynamic client registration (the agent registers itself)
- RFC 8693 for token exchange (audience-scoped tokens at every hop)
In production you stand up a broker layer that handles the dance, caches tokens, integrates with secret-management, writes an audit log of every hop. A week of infrastructure work. Real engineering.
The local-first collapse
On a single-machine local setup, all of that reduces to filesystem permissions and environment variables. The MCP spec itself carves an exception for stdio:
Implementations using a STDIO transport SHOULD NOT follow this specification, and instead retrieve credentials from the environment.
Translation: if it is all on your machine, don't bother with OAuth. Use the env. Use file permissions. The protocol gets out of your way. The harness is sovereign even when the model isn't.
Now the awkward part. Ollama does not natively speak MCP. Ollama implements its own tool-calling API, OpenAI-shaped, returning tool_calls on /api/chat. To use MCP servers with Ollama you need a bridge. The community standard is jonigl/mcp-client-for-ollama:
- Bridge reads a standard
mcpServersconfig (same shape as Claude Desktop's). - Calls
tools/liston each MCP server, flattens into Ollama'stoolsarray. - Query goes to Ollama with tools. If
tool_callsreturns, bridge calls the appropriatetools/call. Results piped back. Ollama synthesizes. - Supports stdio, SSE, streamable HTTP.
Works. Run it daily. But there is a real gap: the host application is doing work that should arguably be in Ollama itself. Native MCP in Ollama or llama.cpp is the highest-value PR nobody has merged yet.
When local MCP works, the threat model collapses. A local model in a network-restricted container with only stdio MCP servers can write all the code it wants and exfiltrate exactly nothing. The lethal-trifecta attack (private data, untrusted content, exfiltration) only works if all three legs exist. Cut the network leg. Done.
Honest critiques
- Security is opt-in. Most spec guidance uses SHOULD, not MUST. Treat the SHOULDs as MUSTs. Hosts that implement bare-minimum spec are buyable.
- Tool poisoning. A malicious server can ship tool descriptions with hidden instructions the model reads as commands. Mitigation is "trust your servers," which is fine in principle.
- Confused-deputy and token-passthrough were the most-exploited footguns in 2025. The spec now forbids passthrough. Easy to violate accidentally.
- One-click install is arbitrary code execution. A malicious startup command in a manifest is unrestricted shell access. Most clients show the command. Most users click through.
- The MCP plateau. More servers, same output quality. The agent doesn't know which tool to call when. Curated context beats more servers (Brandon Waselnuk, Unblocked, AI Dev SF).
- Token cost of MCP itself. GitHub's official server exposes 80+ tools. Every turn costs context tokens. Use the
--toolsetsflag to scope. Or use theghCLI directly: the agent already knows it from training data and it costs zero context. - No standard tracing. Multi-hop topologies are hellish to debug. Will get fixed. Hasn't yet.
What to build with it today
In order:
- Local stdio servers, no network. Filesystem, git, memory. Configure them in Claude Code or Cursor. Watch the agent navigate your repo.
- A custom server in FastMCP. Pick one tedious internal task (export a CSV, run a deploy, look up billing status). Write a 30-line MCP server. The act of naming the tool is the unlock.
- The Ollama bridge. Install
jonigl/mcp-client-for-ollama. Run the same workflow against local Qwen 3.6 27B or Gemma 4 26B MoE. Lose some intelligence. Cut daily token spend to zero. For 80% of tasks (routing, classifying, tool-calling), the local model is fine. - Sandboxing.
docker/cagentfor containerized execution. Network egress disabled by default. Allowlist what you actually need. - A second client. Run the same servers from Cursor and Claude Code. Anything that breaks across clients is something you depended on that you shouldn't have.
What I'm watching
- Native MCP in Ollama or llama.cpp. Whoever lands this PR first removes the bridge-client tax for local users.
- Agent-to-agent protocols. MCP is for an agent talking to its tools. Google's A2A is closer for agent-to-agent, but no registry yet, no multi-peer.
- Skills as the new package manager. Anthropic's October 2025 skills launch was, per one panel I sat through, a bigger deal than MCP. A folder with a SKILL.md and some scripts; loaded on demand for a few tokens.
The takeaway
MCP is not magic. It is a wire format with a handshake and good defaults. What makes it interesting is that it is the first wire format that treats the LLM as a first-class actor in a distributed system, not a black box behind a single API call. The implementation is uneven. The shape is right.
Run it locally before you bother with the cloud version. Write at least one server for something boring you do every week. The protocol gets out of the way once you stop fighting it.
Local-First AI
If this was useful, the weekly notes go deeper. No drip sequences, no upsells.
n8n templates, cost teardowns, and what is actually working in 2026. No drip sequences, no upsells. Reply to opt out.