MCP Stateless Mode: What It Means for Load-Balanced Deployments

The current MCP specification (2025-11-25) uses a session handshake. A client sends an initialize request, and the server responds with its capabilities. Over Streamable HTTP, the server can also assign a session ID in an Mcp-Session-Id response header. The client then includes that header on every subsequent request, which binds the request to the session. The server uses this ID to route requests to the right handler and to hold any server-managed state.
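To make the handshake concrete, here is a minimal sketch of the client side as plain data, with no network I/O. The client name, version, and session ID value are made-up placeholders. The sketch also omits the notifications/initialized message that the client sends after initialize. HTTP header names are case-insensitive, but the sketch uses a plain dict lookup for brevity.

```python
import json


def initialize_request(request_id: int) -> dict:
    """JSON-RPC initialize request the client sends first."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-11-25",
            "capabilities": {},
            # Placeholder client identity for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }


def followup_headers(init_response_headers: dict) -> dict:
    """Headers for requests after initialize: echo the server-assigned session ID."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        "MCP-Protocol-Version": "2025-11-25",
    }
    session_id = init_response_headers.get("Mcp-Session-Id")
    if session_id is not None:  # the server is not obliged to assign one
        headers["Mcp-Session-Id"] = session_id
    return headers


# Pretend the server answered initialize with this (made-up) session ID.
headers = followup_headers({"Mcp-Session-Id": "example-session-123"})
body = json.dumps({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
print(headers["Mcp-Session-Id"])  # example-session-123
```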

The 2026-07-28 RC removes all of this. There is no initialize handshake and no Mcp-Session-Id, so every request is self-contained. This is a significant protocol change. Operators who are not testing stateless mode now risk being caught unprepared when the RC is finalized.

Why sessions exist in the first place

The session handshake serves several purposes. It lets the server advertise capabilities once and have the client remember them. It lets the server maintain per-client state — useful for servers that do multi-step operations or track subscriptions. And it enables server-to-client push: once a session is established over a persistent connection, the server can send messages the client did not request, which is how sampling and elicitation work.

For a single-server deployment, sessions are mostly fine. The session ID routes to the right in-process handler and everything stays on one machine. The problem appears when you scale out.

The horizontal scaling problem with sessions

In a load-balanced deployment, multiple gateway instances sit behind a load balancer. When a client establishes a session on instance A, subsequent requests carrying that session ID must reach instance A — or instance A must share session state with instances B and C via a shared store. This is the sticky session problem.

Sticky sessions work, but they create operational constraints. An instance crash drops every session it holds. Rolling deploys have to be sequenced carefully so that live sessions are not cut off. The load balancer needs session-aware routing. In effect, you trade statelessness for capability.
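The sticky-session constraint and its failure mode fit in a few lines. The following is a toy model, not ToolHost's or any real load balancer's routing code. New sessions are assigned round-robin to live instances and then pinned there. The instance names are hypothetical, and there is no shared session store.

```python
import itertools


class StickyBalancer:
    """Toy balancer that pins each session to the instance that created it."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.live = set(self.instances)
        self._rr = itertools.cycle(self.instances)
        self.session_owner = {}  # session ID -> instance name

    def _next_live(self):
        for _ in range(len(self.instances)):
            candidate = next(self._rr)
            if candidate in self.live:
                return candidate
        raise RuntimeError("no live instances")

    def open_session(self, session_id):
        owner = self._next_live()
        self.session_owner[session_id] = owner
        return owner

    def route(self, session_id):
        owner = self.session_owner.get(session_id)
        if owner is None or owner not in self.live:
            # With no shared store, the session's state died with its instance.
            raise LookupError(f"session {session_id} lost")
        return owner

    def crash(self, instance):
        self.live.discard(instance)


lb = StickyBalancer(["A", "B", "C"])
lb.open_session("s1")  # pinned to A
lb.open_session("s2")  # pinned to B
lb.crash("A")
lb.route("s2")         # still B
# lb.route("s1") now raises LookupError: the client must start a new session.
```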

The 2026-07-28 RC sidesteps this entirely by making all requests stateless. No session ID means no routing constraint. Any instance can handle any request. Standard load balancing works without modification.
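By contrast, a stateless request can go to any live instance, so plain round-robin routing is enough and a crash costs nothing beyond in-flight requests. The sketch below is a toy model with hypothetical instance names.

```python
import itertools


class StatelessBalancer:
    """Toy round-robin balancer: any live instance may serve any request."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.live = set(self.instances)
        self._rr = itertools.cycle(self.instances)

    def route(self):
        # No routing key: skip dead instances and take the next live one.
        for _ in range(len(self.instances)):
            candidate = next(self._rr)
            if candidate in self.live:
                return candidate
        raise RuntimeError("no live instances")

    def crash(self, instance):
        self.live.discard(instance)


lb = StatelessBalancer(["A", "B", "C"])
lb.crash("A")
served = [lb.route() for _ in range(4)]  # ['B', 'C', 'B', 'C']
```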

What you give up in stateless mode

The capability loss is real and worth stating clearly: server-to-client push does not work in stateless mode.

In stateful mode, the server can send unsolicited messages to the client over the established session connection. This is the mechanism that underlies sampling (the server asking the client/model to generate a completion) and elicitation (the server asking the client for additional user input). Both require a live bidirectional channel tied to a session.

In stateless mode, each request is handled independently. There is no session the server can attach an outbound message to, and nothing guarantees that the client's reply would reach the instance that is waiting for it. If a tool implementation needs to request a model completion mid-execution, or needs to prompt the user for clarification, stateless mode cannot support that.
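To see why push is tied to a session, consider a server-initiated request. When a tool asks for sampling, the instance records a pending request and waits. The client's answer comes back later as a separate message, which must reach that same instance. The following is a toy model of the bookkeeping. The method name sampling/createMessage and its messages/maxTokens parameters come from MCP. The Instance class, the tool name, and the result value are hypothetical.

```python
import itertools


class Instance:
    """Toy gateway instance tracking server-to-client requests it is waiting on."""

    _ids = itertools.count(1)  # shared JSON-RPC id counter for the sketch

    def __init__(self, name):
        self.name = name
        self.pending = {}  # (session ID, request id) -> tool call awaiting an answer

    def request_sampling(self, session_id, tool_call):
        req_id = next(self._ids)
        self.pending[(session_id, req_id)] = tool_call
        return {"jsonrpc": "2.0", "id": req_id, "method": "sampling/createMessage",
                "params": {"messages": [], "maxTokens": 100}}

    def deliver_response(self, session_id, response):
        key = (session_id, response["id"])
        if key not in self.pending:
            return None  # this instance never issued that request
        tool_call = self.pending.pop(key)
        return f"{tool_call} resumed with {response['result']}"


a, b = Instance("A"), Instance("B")
req = a.request_sampling("s1", "summarize_tool")
reply = {"jsonrpc": "2.0", "id": req["id"], "result": "completion"}
b.deliver_response("s1", reply)  # None: wrong instance, the tool stays stuck on A
a.deliver_response("s1", reply)  # 'summarize_tool resumed with completion'
```

Without a session ID there is no key to correlate the reply, and without session-aware routing there is no guarantee the reply lands on A.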

For most tool workloads — single-turn, request-response — this is not a limitation. For more sophisticated agentic patterns that depend on sampling or elicitation, you need stateful mode.
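One way to apply this rule is to audit a tool inventory and pick the mode from it. The sketch below assumes a hypothetical ToolProfile record with uses_sampling and uses_elicitation flags. ToolHost may describe tools differently, and the flags would have to come from your own knowledge of each tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolProfile:
    # Hypothetical inventory record; not a ToolHost data type.
    name: str
    uses_sampling: bool = False
    uses_elicitation: bool = False


def required_mode(tools):
    """Return ('stateful', [push-dependent tools]) or ('stateless', [])."""
    needs_push = [t.name for t in tools if t.uses_sampling or t.uses_elicitation]
    return ("stateful", needs_push) if needs_push else ("stateless", [])


inventory = [ToolProfile("search"), ToolProfile("draft_reply", uses_sampling=True)]
required_mode(inventory)  # ('stateful', ['draft_reply'])
```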

How ToolHost handles both modes

ToolHost supports both session modes, switchable per deployment. The configuration is straightforward:

{
  "session": {
    "mode": "stateless"
  }
}
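The following is a sketch of how a deployment script might read this setting. It is not ToolHost's own config loader. It assumes that the literal value for the other mode is "stateful", and that a missing session block means stateful mode, which matches the current default.

```python
import json

# Assumed literal values; only "stateless" appears in the ToolHost example.
VALID_MODES = {"stateful", "stateless"}


def session_mode(config_text: str) -> str:
    """Read session.mode from JSON config text, defaulting to stateful."""
    config = json.loads(config_text)
    mode = config.get("session", {}).get("mode", "stateful")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown session mode: {mode!r}")
    return mode


session_mode('{"session": {"mode": "stateless"}}')  # 'stateless'
```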

In stateless mode, the gateway does not require or validate Mcp-Session-Id. Each request is handled independently. The gateway assigns an internal request ID for tracing purposes, but this is not a session — it does not persist across requests and carries no routing semantics.
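The difference between a tracing ID and a session ID can be sketched in a handler. This is a hypothetical function, not ToolHost's request path. The ID is generated fresh for each request, attached to the response for log correlation, and never stored or used for routing. The dispatch callable stands in for whatever actually executes the tool.

```python
import uuid
from typing import Callable


def handle_stateless(headers: dict, body: dict, dispatch: Callable[[dict], dict]) -> dict:
    # Fresh per request and used only to correlate logs; it is never stored
    # and never used to select an instance or look up state.
    trace_id = str(uuid.uuid4())
    # No session lookup: an Mcp-Session-Id header, if the client still sends
    # one, is neither required nor validated. `headers` is deliberately unused.
    result = dispatch(body)
    return {"trace_id": trace_id, "result": result}


echo = lambda b: {"handled": b["method"]}
handle_stateless({}, {"method": "tools/list"}, echo)
handle_stateless({"Mcp-Session-Id": "stale"}, {"method": "tools/list"}, echo)
```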

One operational detail: per-session rate limit budgets (calls per session) do not apply in stateless mode because there is no session to accumulate against. Each request starts fresh. If you rely on session-scoped rate limits, you will need to switch to per-minute or per-hour limits when operating stateless.
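A per-minute limit needs a key that survives across requests. With no session available, the natural candidate is the caller's identity. This is a minimal fixed-window sketch. It assumes the gateway can derive a stable client key, such as an API key or authenticated principal, which is a hypothetical choice here. The injectable clock exists only to make the behavior testable.

```python
import time


class PerMinuteLimiter:
    """Fixed-window limiter keyed by a client identity rather than a session."""

    def __init__(self, limit_per_minute, clock=time.monotonic):
        self.limit = limit_per_minute
        self.clock = clock
        self.windows = {}  # client key -> (window index, calls in that window)

    def allow(self, client_key):
        window = int(self.clock() // 60)
        current, count = self.windows.get(client_key, (window, 0))
        if current != window:
            current, count = window, 0  # a new minute resets the budget
        if count >= self.limit:
            return False
        self.windows[client_key] = (current, count + 1)
        return True


now = [0.0]
limiter = PerMinuteLimiter(2, clock=lambda: now[0])
limiter.allow("key-123"), limiter.allow("key-123"), limiter.allow("key-123")  # True, True, False
now[0] = 60.0
limiter.allow("key-123")  # True: next window
```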

In stateful mode (the current default), the gateway manages Mcp-Session-Id assignment, routes subsequent requests by session ID, and maintains the session state needed for push messaging. This is the mode to use if your tools use sampling or elicitation.
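For comparison, here is a minimal in-process sketch of the stateful bookkeeping. It is a hypothetical class, not ToolHost's implementation. The gateway assigns an ID on initialize, requires the ID on later requests, and keeps per-session state such as a call count. That state is what a per-session budget accumulates against. The status codes in the comments follow the MCP Streamable HTTP guidance: 400 for a missing session ID and 404 for an unknown or terminated one.

```python
import secrets


class SessionStore:
    """In-process session table: assign IDs on initialize, require them afterwards."""

    def __init__(self):
        self.sessions = {}  # session ID -> per-session state

    def on_initialize(self) -> str:
        session_id = secrets.token_urlsafe(16)  # hard-to-guess, visible-ASCII ID
        self.sessions[session_id] = {"calls": 0}
        return session_id  # sent back in the Mcp-Session-Id response header

    def on_request(self, headers: dict) -> dict:
        session_id = headers.get("Mcp-Session-Id")
        if session_id is None:
            raise PermissionError("missing Mcp-Session-Id")  # HTTP 400
        state = self.sessions.get(session_id)
        if state is None:
            raise LookupError("unknown or expired session")  # HTTP 404
        state["calls"] += 1
        return state


store = SessionStore()
sid = store.on_initialize()
store.on_request({"Mcp-Session-Id": sid})  # {'calls': 1}
```

This table lives in one process, which is exactly why a second instance cannot serve the session without sticky routing or a shared store.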

What operators should do now

The 2026-07-28 RC is not final today, but the direction is clear. The protocol is moving to stateless as the default. Operators who are building on session-dependent features need to understand whether those features have a stateless equivalent or whether they will require architectural changes.

Concretely: test stateless mode now. Switch your staging deployment to "mode": "stateless" and run your tool workloads against it. If something breaks, you want to know today. If everything works, you are ready for the RC.
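A staging smoke test can be as simple as replaying representative tools/call requests with no session header and collecting anything that fails. The sketch below keeps transport out of the picture through a send callable. In practice this would POST to your staging endpoint. The tool names and the fake transport are hypothetical, and the RC's final wire format may require fields this sketch omits.

```python
from typing import Callable


def smoke_test_stateless(tool_calls, send: Callable[[dict, dict], dict]):
    """Send each tools/call with no Mcp-Session-Id and return the failures."""
    failures = []
    for i, (name, arguments) in enumerate(tool_calls, start=1):
        body = {"jsonrpc": "2.0", "id": i, "method": "tools/call",
                "params": {"name": name, "arguments": arguments}}
        headers = {"Content-Type": "application/json"}  # deliberately no session header
        reply = send(headers, body)
        # A JSON-RPC error or a tool result flagged isError both count as failures.
        if "error" in reply or reply.get("result", {}).get("isError"):
            failures.append((name, reply))
    return failures


def fake_send(headers, body):
    # Stand-in transport: pretend one tool depends on sampling and fails stateless.
    if body["params"]["name"] == "needs_sampling":
        return {"jsonrpc": "2.0", "id": body["id"],
                "error": {"code": -32000, "message": "sampling unavailable"}}
    return {"jsonrpc": "2.0", "id": body["id"],
            "result": {"content": [], "isError": False}}


smoke_test_stateless([("search", {}), ("needs_sampling", {})], fake_send)
```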

The session abstraction in ToolHost is designed as a swap, not a rewrite. Changing the mode setting does not require touching tool definitions, backend configurations, or policy rules. The protocol-layer change stays at the protocol layer.