Scaling

This guide explains how to scale Virtual MCP Server (vMCP) deployments.

Vertical scaling

Vertical scaling (increasing CPU/memory per instance) is the simplest approach and works for all use cases, including stateful backends.

To increase resources, configure podTemplateSpec in your VirtualMCPServer:

spec:
  podTemplateSpec:
    spec:
      containers:
        - name: vmcp
          resources:
            requests:
              cpu: '500m'
              memory: 512Mi
            limits:
              cpu: '1'
              memory: 1Gi

Vertical scaling is recommended as the starting point for most deployments.

Horizontal scaling

Horizontal scaling (adding more replicas) can improve availability and handle higher request volumes.

How to scale horizontally

The VirtualMCPServer CRD does not have a replicas field. Instead, the operator creates a Deployment named vmcp-<NAME> (where <NAME> is your VirtualMCPServer name) with 1 replica. It then preserves whatever replica count is set on that Deployment, so you can manage scaling separately, either manually or with an autoscaler.

Option 1: Manual scaling

kubectl scale deployment vmcp-<NAME> -n <NAMESPACE> --replicas=3

Option 2: Autoscaling with HPA

kubectl autoscale deployment vmcp-<NAME> -n <NAMESPACE> \
  --min=2 --max=5 --cpu-percent=70
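
If you prefer to keep the autoscaler in version control, the same policy can be written as a manifest. The following is a minimal sketch of a HorizontalPodAutoscaler that mirrors the kubectl autoscale command above; the VirtualMCPServer name my-vmcp and the namespace default are hypothetical placeholders, not values from this guide.

```yaml
# Sketch only: "my-vmcp" and "default" are hypothetical placeholders.
# Replace them with your VirtualMCPServer name and namespace.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vmcp-my-vmcp
  namespace: default
spec:
  # Target the Deployment that the operator created (vmcp-<NAME>).
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vmcp-my-vmcp
  minReplicas: 2
  maxReplicas: 5
  metrics:
    # Scale on average CPU utilization, measured against CPU requests.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

CPU utilization is calculated relative to the container's CPU request, so the vmcp container needs a CPU request set (as in the vertical scaling example) and the cluster needs a metrics source such as metrics-server for this to take effect.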

When horizontal scaling is challenging

Horizontal scaling works well for stateless backends (fetch, search, read-only operations) where sessions can be resumed on any instance.

However, stateful backends make horizontal scaling difficult:

  • Stateful backends (Playwright browser sessions, database connections, file system operations) require every request in a session to reach the same vMCP instance that established the session
  • Routing requests this way requires session affinity configuration, which may not work reliably through proxies
  • Session resumption on a different instance may not work reliably for stateful backends

For stateful backends, vertical scaling or dedicated vMCP instances per team/use case are recommended instead of horizontal scaling.
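
If you still want to experiment with horizontal scaling in front of a stateful backend, one common way to configure session affinity in Kubernetes is on the Service that routes traffic to the vMCP pods. The fragment below is a sketch under assumptions: this guide does not say which Service fronts the vMCP Deployment or whether the operator would preserve changes to it, so verify both before relying on it.

```yaml
# Fragment of a Kubernetes Service spec (sketch only).
# Assumption: you control the Service that fronts the vMCP pods.
spec:
  # Route all requests from the same client IP to the same pod.
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      # How long the affinity lasts, in seconds (10800 is the Kubernetes default).
      timeoutSeconds: 10800
```

ClientIP affinity keys on the source IP address that the Service sees. When clients connect through a proxy or ingress, many clients can share one source IP, which is one reason affinity may not behave reliably in that setup.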