Scale your VS fleet

Chapter 3 · about 10 minutes to read

You have a working Virtual Server and you know how to monitor it (Chapter 1). The next question is operational: when do I make this VS bigger, and when do I add a second one?

Prerequisites: at least one Virtual Server running, and a few hours of metrics history to look at.

The four sizing knobs

Every Virtual Server has four configuration fields you control at deploy time (from backend/app/models/virtual_server.py):

vCPUs — vcpus, integer, default 2. How many logical cores the container can use.
Memory — memory_mb, integer, default 2048. RAM cap in megabytes.
Disk — disk_gb, integer, default 100. Persistent disk in gigabytes.
GPUs — gpus, integer, default 0, with optional gpu_type. Set to 1+ to request GPU access on the target Compute Node.

There is no fixed "small / medium / large" enum; you compose the size from these four numbers. [VERIFY: confirm UI exposes free-form sliders vs preset T-shirt sizes for org admins at launch]
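As a rough illustration, the four fields and their defaults can be sketched as a small Python dataclass. The class name VSSize and the validation rules are assumptions made for this example. Only the field names and defaults come from the list above, and the real model in backend/app/models/virtual_server.py may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VSSize:
    """Hypothetical stand-in for the size fields of a Virtual Server."""

    vcpus: int = 2                  # logical cores the container can use
    memory_mb: int = 2048           # RAM cap in megabytes
    disk_gb: int = 100              # persistent disk in gigabytes
    gpus: int = 0                   # 0 means no GPU access is requested
    gpu_type: Optional[str] = None  # optional, only meaningful when gpus >= 1

    def __post_init__(self) -> None:
        # These checks are illustrative assumptions, not documented Zyra rules.
        if self.vcpus < 1 or self.memory_mb < 1 or self.disk_gb < 1:
            raise ValueError("vcpus, memory_mb and disk_gb must be positive")
        if self.gpus < 0:
            raise ValueError("gpus cannot be negative")
        if self.gpus == 0 and self.gpu_type is not None:
            raise ValueError("gpu_type is set but gpus is 0")


# A size is just a combination of the four numbers.
default_size = VSSize()
worker_size = VSSize(vcpus=4, memory_mb=8192)
```

Because there is no preset enum, any "small" or "large" label would simply be a name you give to one such combination.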

Scaling up vs scaling out

Two axes, and they answer different questions.

Scale up (vertical): make one VS bigger. Increase vcpus or memory_mb. Use this when one workload owns the VS and is CPU- or memory-bound; the workload is a single process that doesn't split well (a database, a model server, an interactive session); or you want one observable thing, not five. Trade-off: a VS can't be larger than its host device.

Scale out (horizontal): deploy more VSs, each modest, and distribute jobs across them. Use this when the workload is naturally parallel (batch jobs, queue workers, per-tenant containers); you want fault isolation; or you want to spread cost and risk across multiple Compute Nodes. Trade-off: you need a way to route work across the fleet.

[SCREENSHOT: Side-by-side dashboard view of one large VS vs five small VSs handling the same load]

Rule of thumb

Start small, and scale out before you scale up. Most Zyra workloads are better served by a horizontal pool than by one large VS, for three reasons. Compute Nodes are individual devices, so you can recruit five medium devices faster than one giant one. A horizontal pool tolerates a node going offline; a single fat VS doesn't. And cost scales roughly linearly with size, so five small VSs cost about the same as one large VS but give much better availability.

Scale up only when the workload genuinely can't be split, or the inter-process chatter inside one VS is cheaper than network hops between VSs.
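To put a number on the availability argument, here is a small worked example. It assumes each Compute Node is up independently with the same probability. The 95% uptime figure is purely illustrative and is not a Zyra measurement, and the function name pool_availability is made up for this sketch.

```python
from math import comb


def pool_availability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent nodes are up.

    p is the assumed uptime of a single node (binomial model).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.95  # illustrative per-node uptime, not a measured value

single_vs = pool_availability(1, 1, p)         # one large VS on one node
pool_mostly_up = pool_availability(5, 4, p)    # at least 4 of 5 small VSs up
pool_any_up = pool_availability(5, 1, p)       # at least 1 of 5 small VSs up

print(f"single VS available:       {single_vs:.4f}")       # 0.9500
print(f"pool at >= 80% capacity:   {pool_mostly_up:.4f}")  # 0.9774
print(f"pool with any VS running:  {pool_any_up:.7f}")
```

Under these assumptions the pool keeps at least 80% of its capacity more often than the single VS is up at all. Real nodes do not fail independently, so treat the result as a direction, not a guarantee.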

Distributing workload across a pool

Zyra doesn't ship an opinionated job dispatcher in MVP1 — pick the pattern that fits:

Queue-pulling workers. Each VS pulls jobs from Redis, RabbitMQ, or your own API. Add more VSs to absorb queue depth.
Round-robin load balancer. Put an HTTP load balancer in front of N VSs serving the same image.
Per-tenant isolation. One VS per customer or per project.

The pattern doesn't matter to Zyra — every VS is just a persistent container.
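The queue-pulling pattern can be sketched in a few lines. To keep the example self-contained, an in-process queue.Queue stands in for Redis or RabbitMQ and threads stand in for Virtual Servers. The names run_worker and vs-0, vs-1, vs-2 are hypothetical, and the squaring step is a placeholder for real work.

```python
import queue
import threading
from typing import List, Tuple


def run_worker(name: str, jobs: "queue.Queue[int]",
               results: List[Tuple[str, int]]) -> None:
    """Pull jobs until the queue is empty, as one VS in the pool would."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return  # nothing left to do; this worker exits
        results.append((name, job * job))  # placeholder for real work
        jobs.task_done()


# Fill the shared queue with 20 jobs.
jobs: "queue.Queue[int]" = queue.Queue()
for n in range(20):
    jobs.put(n)

# Three workers stand in for three VSs. Adding a worker absorbs more depth.
results: List[Tuple[str, int]] = []
workers = [
    threading.Thread(target=run_worker, args=(f"vs-{i}", jobs, results))
    for i in range(3)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(f"{len(results)} jobs processed")
```

The workers never talk to each other; the queue is the only coordination point, which is why the pool can grow or shrink without changes to the worker code.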

Resizing a running VS

In MVP1, resizing is not in-place. The flow is: stop the VS (stopping → stopped), edit the size fields, start it again (starting → running). Plan a maintenance window for stateful workloads. [VERIFY: confirm whether in-place resize is targeted for MVP2 or post-MVP1]
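The stop, edit, start sequence can be modelled as a small state machine. This is an in-memory sketch and not the Zyra API: the VirtualServer class and its stop, start, and resize methods are hypothetical, and only the status names and size fields come from this guide.

```python
# Status transitions taken from the flow described above.
TRANSITIONS = {
    "running": "stopping",
    "stopping": "stopped",
    "stopped": "starting",
    "starting": "running",
}
RESIZABLE_FIELDS = {"vcpus", "memory_mb", "disk_gb", "gpus"}


class VirtualServer:
    """Hypothetical in-memory model of a VS, for illustration only."""

    def __init__(self, vcpus: int = 2, memory_mb: int = 2048,
                 disk_gb: int = 100, gpus: int = 0) -> None:
        self.vcpus = vcpus
        self.memory_mb = memory_mb
        self.disk_gb = disk_gb
        self.gpus = gpus
        self.status = "running"

    def _advance(self) -> None:
        self.status = TRANSITIONS[self.status]

    def stop(self) -> None:
        if self.status != "running":
            raise RuntimeError(f"cannot stop from status {self.status!r}")
        self._advance()  # running -> stopping
        self._advance()  # stopping -> stopped

    def start(self) -> None:
        if self.status != "stopped":
            raise RuntimeError(f"cannot start from status {self.status!r}")
        self._advance()  # stopped -> starting
        self._advance()  # starting -> running

    def resize(self, **fields: int) -> None:
        # Resizing is not in-place: the VS must be stopped first.
        if self.status != "stopped":
            raise RuntimeError("stop the VS before resizing")
        unknown = set(fields) - RESIZABLE_FIELDS
        if unknown:
            raise ValueError(f"unknown size fields: {sorted(unknown)}")
        for name, value in fields.items():
            setattr(self, name, value)


vs = VirtualServer()
vs.stop()
vs.resize(vcpus=4, memory_mb=8192)
vs.start()
```

Calling resize on a running VS raises an error in this model, which mirrors the rule that the workload is offline for the whole stop-to-start interval.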

What to watch while you scale

Open the Metrics tab on every VS in the pool and look for:

Headroom. CPU steadily above 80% means the VS is under-provisioned. CPU steadily below 20% means you can shrink it.
Memory pressure. memory_usage_mb near the cap means you're one bad week from OOM-kills.
Uneven distribution. One VS at 90%, four at 10% means your dispatcher is broken, not your sizing.
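The three checks above can be expressed as simple functions over metric samples. This is a sketch under assumptions: the function names, the 90% memory threshold, and the 50-point spread threshold are choices made for the example. Only the 80% and 20% CPU figures and the memory_usage_mb name come from this guide.

```python
from typing import Sequence


def headroom(cpu_samples: Sequence[float],
             high: float = 80.0, low: float = 20.0) -> str:
    """Classify a VS from its recent CPU percentages.

    "Steadily" is read here as "every sample in the window".
    """
    if all(s > high for s in cpu_samples):
        return "under-provisioned"
    if all(s < low for s in cpu_samples):
        return "can shrink"
    return "ok"


def memory_pressure(memory_usage_mb: float, memory_mb: int,
                    threshold: float = 0.9) -> bool:
    """True when usage is near the cap. The 0.9 ratio is an assumed cutoff."""
    return memory_usage_mb / memory_mb >= threshold


def is_uneven(pool_cpu: Sequence[float], spread: float = 50.0) -> bool:
    """True when the busiest and idlest VS differ by more than `spread` points."""
    return max(pool_cpu) - min(pool_cpu) > spread


print(headroom([85, 91, 88]))                 # under-provisioned
print(headroom([12, 8, 15]))                  # can shrink
print(memory_pressure(1950, 2048))            # True
print(is_uneven([90, 10, 10, 10, 10]))        # True
```

The last call reproduces the "one VS at 90%, four at 10%" case: the sizes may be fine, but the work is not being spread.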

What just happened

You now know the four size knobs, the two scaling axes, and the default-to-horizontal rule. The next chapter shows how to turn the same picture into a smaller bill.

Troubleshooting

VS won't start after resize. The target Compute Node may not have enough free capacity. Check Compute Nodes for available_capacity.
Pool feels uneven. Check your dispatcher — Zyra distributes containers across devices but doesn't load-balance jobs inside containers.
GPU not detected. Set gpus >= 1 and ensure the target Compute Node has a matching gpu_type.
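The first and third checks amount to comparing a requested size against what a Compute Node has free. The sketch below assumes available_capacity can be read as a dictionary with the same field names as the size knobs. That shape is an assumption, as are the function name fit_problems and the sample values, including the placeholder GPU type.

```python
from typing import Dict, List


def fit_problems(request: Dict, node_free: Dict) -> List[str]:
    """List the reasons a requested size would not fit on a node.

    `node_free` is an assumed dictionary view of available_capacity.
    """
    problems = []
    for field in ("vcpus", "memory_mb", "disk_gb", "gpus"):
        wanted = request.get(field, 0)
        free = node_free.get(field, 0)
        if wanted > free:
            problems.append(f"{field}: need {wanted}, node has {free} free")
    # A GPU request with a specific type also needs a matching type on the node.
    if request.get("gpus", 0) >= 1:
        wanted_type = request.get("gpu_type")
        if wanted_type is not None and wanted_type != node_free.get("gpu_type"):
            problems.append(f"gpu_type: need {wanted_type!r}, "
                            f"node has {node_free.get('gpu_type')!r}")
    return problems


# Illustrative values only.
request = {"vcpus": 8, "memory_mb": 16384, "disk_gb": 100,
           "gpus": 1, "gpu_type": "example-gpu"}
node_free = {"vcpus": 4, "memory_mb": 32768, "disk_gb": 500,
             "gpus": 0, "gpu_type": None}

for line in fit_problems(request, node_free):
    print(line)
```

An empty list means the size fits on that node under these assumptions; anything else names the field to reduce or the node to avoid.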

Last reviewed: 2026-05-21