MCP Architecture Explained: Clients, Servers, Transports, and Tools

Most introductions to Model Context Protocol (MCP) jump straight to "here is how to build a server." That works for hello-world demos, but it leaves you without a real mental model of what is happening under the hood — which makes debugging painful and design decisions guesswork.

This article slows down and walks through the complete MCP architecture: the layers, the actors, the message lifecycle, and the wire-level details. By the end you will be able to read an MCP trace and know exactly what is happening at every step.

The four-layer mental model #

MCP is best understood as four stacked layers:

Layer     | What it does                          | Example
Host      | The user-facing application           | Claude Desktop, Cursor, an internal AI agent
Client    | The MCP runtime embedded in the host  | The MCP library inside Claude Desktop
Transport | The wire: moves JSON-RPC messages     | stdio, Streamable HTTP
Server    | The capability provider               | A weather MCP server, a Postgres MCP server

The host is the only layer the human sees. Everything below it is plumbing.

Host vs Client — a common point of confusion #

These two terms get used interchangeably, but the spec treats them as distinct:

  • A host is the application (Claude Desktop is one host).
  • A client is one connection the host opens to one server.

A single host typically opens many clients — one per configured MCP server. So Claude Desktop with three MCP servers configured has three active clients running inside it, each talking to a different server.

┌──────────────────────── Host (Claude Desktop) ────────────────────────┐
│                                                                       │
│   ┌─ Client A ─┐    ┌─ Client B ─┐    ┌─ Client C ─┐                  │
│   │ stdio ⇄    │    │ HTTP ⇄     │    │ stdio ⇄    │                  │
│   └────────────┘    └────────────┘    └────────────┘                  │
│         │                  │                  │                       │
└─────────┼──────────────────┼──────────────────┼───────────────────────┘
          │                  │                  │
     ┌────▼─────┐      ┌─────▼─────┐      ┌─────▼─────┐
     │ Postgres │      │  GitHub   │      │  Slack    │
     │  Server  │      │  Server   │      │  Server   │
     └──────────┘      └───────────┘      └───────────┘
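
The one-client-per-server relationship in the diagram can be sketched as a small data structure. This is only an illustration: `HostApp`, `ClientConnection`, and `ServerConfig` are hypothetical names, not types from the MCP SDK, and no real connection is opened.

```typescript
// Hypothetical types for illustration; not the SDK's API.
type TransportKind = "stdio" | "http";

interface ServerConfig {
  name: string; // e.g. "postgres"
  transport: TransportKind;
}

// One client = one connection to exactly one server.
class ClientConnection {
  readonly server: ServerConfig;

  constructor(server: ServerConfig) {
    this.server = server;
  }
}

// The host owns one client per configured server.
class HostApp {
  private readonly clients = new Map<string, ClientConnection>();

  constructor(configs: ServerConfig[]) {
    for (const config of configs) {
      this.clients.set(config.name, new ClientConnection(config));
    }
  }

  clientFor(serverName: string): ClientConnection | undefined {
    return this.clients.get(serverName);
  }

  get clientCount(): number {
    return this.clients.size;
  }
}

// Mirrors the diagram: three configured servers, three clients.
const host = new HostApp([
  { name: "postgres", transport: "stdio" },
  { name: "github", transport: "http" },
  { name: "slack", transport: "stdio" },
]);
```

Keeping clients in a map keyed by server name makes the isolation visible: a tool call for GitHub can only ever travel over the GitHub client.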

The three transports #

MCP messages are JSON-RPC 2.0 — a tiny, widely-supported protocol. What changes between transports is how those messages are physically moved.

1. stdio #

The server runs as a subprocess of the host. Messages flow over the subprocess's stdin/stdout streams, one JSON object per line.

Use it when:

  • The server runs on the same machine as the host
  • You want zero network configuration
  • The user already has the runtime the server needs (Node, Python, and so on) installed
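
The "one JSON object per line" framing is simple enough to sketch by hand. The helper names below (`encodeLine`, `LineDecoder`) are made up for this example; a real client would feed the decoder from the subprocess's stdout, which can deliver a message split across several chunks.

```typescript
type JsonRpcMessage = { jsonrpc: "2.0"; [key: string]: unknown };

// Serialize one message as a single line. JSON.stringify without an indent
// argument never emits raw newlines, so "\n" is an unambiguous delimiter.
function encodeLine(message: JsonRpcMessage): string {
  return JSON.stringify(message) + "\n";
}

// Accumulates raw chunks and returns every message completed so far.
class LineDecoder {
  private buffer = "";

  push(chunk: string): JsonRpcMessage[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line ("" if the chunk ended on "\n").
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as JsonRpcMessage);
  }
}
```

Buffering the trailing partial line is the important design choice: a stream read can end mid-message, and parsing at that point would throw.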

2. HTTP + SSE (legacy) #

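The server is a standalone HTTP service. Client → server messages use plain HTTP POST. Server → client messages, responses as well as notifications, arrive as Server-Sent Events (SSE) on a separate persistent connection. Newer spec revisions deprecate this transport in favor of Streamable HTTP, so it matters mainly when you need to interoperate with older clients and servers.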

Use it when:

  • The server is remote (different machine, different network)
  • Multiple hosts need to connect to the same server

3. Streamable HTTP (the modern remote transport) #

A refinement introduced in the 2025-03-26 spec revision. It combines request/response and streaming into a single endpoint: the client POSTs each message, and the server answers with either a plain JSON response or an SSE stream. Simpler to deploy than the dual-channel HTTP+SSE pattern, and the recommended transport for new remote servers.
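
Because a Streamable HTTP response can come back either as a single JSON document or as an SSE stream, a client has to branch on the response's content type. The sketch below shows that branch under simplifying assumptions: it works on a fully buffered body rather than a live stream, it ignores SSE `id:` and `retry:` fields, and the function names are invented for illustration.

```typescript
// Extract the data payloads from an SSE body. Events are separated by a blank
// line; multiple "data:" lines within one event are joined with "\n".
function parseSseData(body: string): string[] {
  const payloads: string[] = [];
  for (const event of body.split(/\r?\n\r?\n/)) {
    const dataLines = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")); // drop "data:" and one space
    if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
  }
  return payloads;
}

// Normalize either response style into a list of JSON-RPC messages.
function decodeHttpResponse(contentType: string, body: string): unknown[] {
  if (contentType.startsWith("text/event-stream")) {
    return parseSseData(body).map((data) => JSON.parse(data) as unknown);
  }
  if (contentType.startsWith("application/json")) {
    return [JSON.parse(body) as unknown];
  }
  throw new Error(`Unexpected content type: ${contentType}`);
}
```

Normalizing both shapes into one list lets the rest of the client treat a streamed reply (progress notifications followed by the final result) the same way as a one-shot JSON reply.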

Anatomy of an MCP session #

Every MCP session goes through the same five phases. Walking through them with real messages is the fastest way to internalize how the protocol works.

Phase 1: Initialize #

The client says "hello, here is what I support." The server responds with what it supports.

Client → Server:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-06-18",
    "capabilities": { "sampling": {}, "roots": { "listChanged": true } },
    "clientInfo": { "name": "my-host", "version": "1.0.0" }
  }
}

Server → Client:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-06-18",
    "capabilities": {
      "tools": { "listChanged": true },
      "resources": {},
      "prompts": {}
    },
    "serverInfo": { "name": "weather-server", "version": "1.0.0" }
  }
}

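Note the capability negotiation. Each side declares what it supports, and for the rest of the session each side may use only the features the other has declared. Here the client may call tools, resources, and prompts methods, and the server may send sampling requests and ask for the client's roots.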
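
One way to picture the negotiation is as a pair of guards that each side consults before sending a feature-specific message. The sketch below is an assumption-laden illustration: the interfaces only cover the capabilities shown in the handshake above, and `clientMayCall` and `serverMayRequestSampling` are hypothetical helpers, not SDK functions. Methods that are not tied to a feature (such as initialize itself) are outside its scope.

```typescript
interface ClientCaps {
  sampling?: object;
  roots?: { listChanged?: boolean };
}

interface ServerCaps {
  tools?: { listChanged?: boolean };
  resources?: object;
  prompts?: object;
}

const SERVER_FEATURES = ["tools", "resources", "prompts"] as const;
type ServerFeature = (typeof SERVER_FEATURES)[number];

function isServerFeature(name: string): name is ServerFeature {
  return SERVER_FEATURES.some((feature) => feature === name);
}

// A feature-scoped method such as "tools/call" is allowed only if the
// server declared the matching capability ("tools") during initialize.
function clientMayCall(method: string, server: ServerCaps): boolean {
  const feature = method.split("/")[0] ?? "";
  return isServerFeature(feature) && server[feature] !== undefined;
}

// The reverse direction: the server may ask for sampling only if the
// client declared it.
function serverMayRequestSampling(client: ClientCaps): boolean {
  return client.sampling !== undefined;
}
```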

Phase 2: Initialized notification #

The client follows up with a fire-and-forget notification that the handshake is done. No id field — this is a one-way notification, not a request.

{ "jsonrpc": "2.0", "method": "notifications/initialized" }

Phase 3: Discovery #

Now the client asks the server what it offers:

{ "jsonrpc": "2.0", "id": 2, "method": "tools/list" }

Response:

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [{
      "name": "get_weather",
      "description": "Get current weather for any city",
      "inputSchema": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }]
  }
}

The host hands these tools to the LLM as available actions. Similar resources/list and prompts/list calls fetch the other two primitives.
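
A client does not need to probe blindly during discovery: the capabilities from the initialize response already say which list calls are worth making. A minimal sketch, assuming the capability shape shown earlier and using the hypothetical helpers `discoveryMethods` and `discoveryRequests`:

```typescript
interface DeclaredCapabilities {
  tools?: object;
  resources?: object;
  prompts?: object;
}

// Return the list methods for the primitives the server declared.
function discoveryMethods(caps: DeclaredCapabilities): string[] {
  const methods: string[] = [];
  if (caps.tools) methods.push("tools/list");
  if (caps.resources) methods.push("resources/list");
  if (caps.prompts) methods.push("prompts/list");
  return methods;
}

// Build the JSON-RPC requests, numbering ids upward from firstId.
function discoveryRequests(caps: DeclaredCapabilities, firstId: number) {
  return discoveryMethods(caps).map((method, index) => ({
    jsonrpc: "2.0" as const,
    id: firstId + index,
    method,
  }));
}
```

For the weather server above, which declared all three primitives, `discoveryRequests(caps, 2)` yields three requests with ids 2, 3, and 4.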

Phase 4: Invocation #

The LLM decides to call get_weather. The host forwards the call:

{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "get_weather",
    "arguments": { "city": "Chennai" }
  }
}

The server executes and returns:

{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [{ "type": "text", "text": "It is 31°C in Chennai, partly cloudy." }],
    "isError": false
  }
}

The host feeds the text back into the LLM's context. The conversation continues.
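
How the result becomes LLM context is up to the host. The sketch below shows one plausible approach, assuming the result shape from the response above; `toolResultToContext` and the bracketed label format are invented for illustration. Note that `isError: true` arrives inside a normal JSON-RPC result, so the host can still pass the failure text to the model rather than treating it as a protocol error.

```typescript
interface ContentBlock {
  type: string;
  text?: string;
}

interface ToolCallResult {
  content: ContentBlock[];
  isError?: boolean;
}

// Flatten the text blocks of a tool result into one string for the LLM.
// Non-text blocks are skipped in this sketch.
function toolResultToContext(toolName: string, result: ToolCallResult): string {
  const text = result.content
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text)
    .join("\n");
  const label = result.isError ? "error" : "result";
  return `[${toolName} ${label}] ${text}`;
}
```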

Phase 5: Shutdown #

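The client closes the transport. For stdio that means closing the server's stdin and waiting for the process to exit, terminating it if it does not. For HTTP that means closing any open connections; a Streamable HTTP client that holds a session ID can also end the session explicitly with an HTTP DELETE. There is no explicit "goodbye" JSON-RPC message in the protocol.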

Requests, responses, and notifications #

JSON-RPC 2.0 has three message kinds; MCP uses all three:

Kind         | Has id?            | Expects reply? | MCP example
Request      | Yes                | Yes            | tools/list, tools/call
Response     | Same id as request | No             | The result or error
Notification | No                 | No             | notifications/initialized, notifications/progress

Notifications are how either side pushes events without a request-response roundtrip. Servers use them for progress updates, resource changes, and logging; clients use them for messages such as notifications/initialized.
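
The three kinds can be told apart purely from which fields are present, and the shared id is what lets a client match a response to the request that caused it. The following is a minimal sketch of both ideas; `classify` and `PendingRequests` are hypothetical names, and the callback style is just one of several ways to do the matching.

```typescript
type MessageId = string | number;

interface WireMessage {
  jsonrpc: "2.0";
  id?: MessageId;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string };
}

type MessageKind = "request" | "response" | "notification";

// A method with an id is a request; a method without one is a notification;
// anything without a method is a response.
function classify(msg: WireMessage): MessageKind {
  if (msg.method !== undefined) {
    return msg.id !== undefined ? "request" : "notification";
  }
  return "response";
}

// Tracks outstanding requests so each response reaches the right caller.
class PendingRequests {
  private nextId = 1;
  private readonly waiting = new Map<MessageId, (msg: WireMessage) => void>();

  create(method: string, params: unknown, onReply: (msg: WireMessage) => void): WireMessage {
    const id = this.nextId++;
    this.waiting.set(id, onReply);
    return { jsonrpc: "2.0", id, method, params };
  }

  // Returns true if the message was a response to a known request.
  settle(msg: WireMessage): boolean {
    if (classify(msg) !== "response" || msg.id === undefined) return false;
    const onReply = this.waiting.get(msg.id);
    if (onReply === undefined) return false;
    this.waiting.delete(msg.id);
    onReply(msg);
    return true;
  }
}
```

Matching by id rather than by arrival order matters because a server may answer a slow tools/call after a later, faster request.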

A minimal echo server in code #

Here is the smallest possible MCP server that exposes one tool. It captures everything the architecture demands in under 30 lines:

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

// Name and version are reported to the client as serverInfo during initialize.
const server = new McpServer({
  name: 'echo-server',
  version: '1.0.0',
});

// Register one tool: name, description, input schema (zod), and handler.
server.tool(
  'echo',
  'Echoes back whatever message you send',
  { message: z.string().describe('Anything you want echoed back') },
  async ({ message }) => ({
    content: [{ type: 'text', text: `You said: ${message}` }],
  })
);

// Serve over stdio. Top-level await requires running this file as an ES module.
await server.connect(new StdioServerTransport());

The SDK handles the initialize handshake, capability negotiation, tools/list response, and tools/call dispatch automatically. You provide the business logic; the SDK provides the wire protocol.

Tools, resources, prompts — the three capabilities #

A server can expose any combination of three primitives:

  • Tools — actions the LLM can invoke (tools/list, tools/call)
  • Resources — data the LLM can read (resources/list, resources/read, resources/subscribe)
  • Prompts — pre-baked templates the user can invoke (prompts/list, prompts/get)

A server that only offers tools is the most common. A server that also offers resources (e.g., a file system or database server) lets the host attach context to a conversation. A server that offers prompts gives users one-click access to standardized workflows.

Beyond the basics #

A few advanced architectural features that are useful to know exist:

  • Sampling — a server can ask the client's LLM to generate text on its behalf (reverse direction). Powerful for agentic patterns.
  • Roots — the client tells the server which filesystem paths it should consider "in scope."
  • Cancellation — long-running calls can be cancelled via notifications/cancelled.
  • Progress — long-running calls can emit notifications/progress so the host can show a spinner.

You rarely need these on day one, but they are why MCP feels mature compared to ad-hoc tool integrations.
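
To make the last two items concrete, here is a small sketch of consuming a progress notification and building a cancellation. It assumes the params carry `progressToken`, `progress`, and an optional `total` for progress, and `requestId` with an optional `reason` for cancellation; the helper names and the display format are illustrative only.

```typescript
interface ProgressParams {
  progressToken: string | number;
  progress: number;
  total?: number; // may be omitted when the server cannot know it
}

// Turn a notifications/progress payload into a short display string.
function formatProgress(params: ProgressParams): string {
  if (params.total !== undefined && params.total > 0) {
    const percent = Math.min(100, Math.round((params.progress / params.total) * 100));
    return `${percent}%`;
  }
  return `${params.progress} done`; // total unknown: show the raw counter
}

// Build the notification that asks the other side to abandon a request.
// It has no id of its own because notifications never get a reply.
function cancelNotification(requestId: string | number, reason?: string) {
  return {
    jsonrpc: "2.0" as const,
    method: "notifications/cancelled",
    params: reason === undefined ? { requestId } : { requestId, reason },
  };
}
```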

Conclusion #

MCP is conceptually small: four layers, three transports, three primitives, and a JSON-RPC 2.0 message format. The complexity comes from the interactions between them — capability negotiation, progress streaming, multi-server hosts, sampling — but the foundation is approachable.

If you understand the initialize → list → call → shutdown cycle, you understand MCP. Everything else is detail you can pick up as you need it.

Try it yourself #

After connecting your weather MCP server to Claude Desktop, here is what a discovery-and-invoke conversation looks like:

You: What is the weather in Chennai right now?
Claude (used get_weather): It is currently 31°C in Chennai, partly cloudy with light winds from the southeast. Humidity is around 68% and the feels-like temperature is closer to 34°C.

What just happened underneath: the client sent initialize, the server responded with its capabilities, the client sent notifications/initialized, the client called tools/list and received the get_weather schema, the LLM chose get_weather so the client sent tools/call with { "city": "Chennai" }, and the server fetched the upstream weather API and returned a text result. Seven JSON-RPC messages total: three request/response pairs plus one notification.
