Scale your VS fleet
Chapter 3 · about 10 minutes to read
You have a working Virtual Server and you know how to monitor it (Chapter 1). The next question is operational: when do I make this VS bigger, and when do I add a second one?
Prerequisites: at least one Virtual Server running, and a few hours of metrics history to look at.
The four sizing knobs
Every Virtual Server has four configuration fields you control at deploy time (from backend/app/models/virtual_server.py):
- vCPUs: vcpus, integer, default 2. How many logical cores the container can use.
- Memory: memory_mb, integer, default 2048. RAM cap in megabytes.
- Disk: disk_gb, integer, default 100. Persistent disk in gigabytes.
- GPUs: gpus, integer, default 0, with an optional gpu_type. Set to 1 or more to request GPU access on the target Compute Node.
There is no fixed "small / medium / large" enum; you compose the size from these four numbers. [VERIFY: confirm UI exposes free-form sliders vs preset T-shirt sizes for org admins at launch]
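To make "compose the size from four numbers" concrete, here is a minimal Python sketch that mirrors the sizing fields and their defaults. The VSSize class and its validation rules are hypothetical illustrations, not the actual model in backend/app/models/virtual_server.py. In particular, rejecting a gpu_type when gpus is 0 is an assumption, and "example-gpu" is a placeholder value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VSSize:
    """Hypothetical mirror of the VS sizing fields (not the real backend model)."""

    vcpus: int = 2                  # logical cores the container can use
    memory_mb: int = 2048           # RAM cap in megabytes
    disk_gb: int = 100              # persistent disk in gigabytes
    gpus: int = 0                   # 0 means no GPU access requested
    gpu_type: Optional[str] = None  # optional; only meaningful when gpus >= 1

    def __post_init__(self) -> None:
        # Assumed sanity checks; the real backend may validate differently.
        if self.vcpus < 1 or self.memory_mb < 1 or self.disk_gb < 1:
            raise ValueError("vcpus, memory_mb and disk_gb must be positive")
        if self.gpus < 0:
            raise ValueError("gpus cannot be negative")
        if self.gpu_type is not None and self.gpus == 0:
            raise ValueError("gpu_type is set but gpus is 0")


default = VSSize()  # 2 vCPUs, 2048 MB, 100 GB, no GPU
# A larger single-process workload; "example-gpu" is a placeholder gpu_type.
model_server = VSSize(vcpus=8, memory_mb=32768, gpus=1, gpu_type="example-gpu")
```

Using a frozen dataclass means a size is a value you replace rather than mutate, which matches how resizing works here (stop, set new numbers, start).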
Scaling up vs scaling out
Two axes, and they answer different questions.
Scale up (vertical): make one VS bigger. Increase vcpus or memory_mb. Use this when one workload owns the VS and is CPU- or memory-bound; the workload is a single process that doesn't split well (a database, a model server, an interactive session); or you want one observable thing, not five. Trade-off: a VS can't be larger than its host device.
Scale out (horizontal): deploy more VSs, each modest, and distribute jobs across them. Use this when the workload is naturally parallel (batch jobs, queue workers, per-tenant containers); you want fault isolation; or you want to spread cost and risk across multiple Compute Nodes. Trade-off: you need a way to route work across the fleet.
Rule of thumb
Start small, and scale out before you scale up. Most Zyra workloads are better served by a horizontal pool than by one large VS, because Compute Nodes are individual devices:
- You can recruit five medium devices faster than one giant one.
- A horizontal pool tolerates a node going offline; a single large VS doesn't.
- Cost scales roughly linearly with size, so five small VSs cost about the same as one large VS, but with much better availability.
Scale up only when the workload genuinely can't be split, or the inter-process chatter inside one VS is cheaper than network hops between VSs.
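The availability argument behind the rule of thumb can be checked with a little arithmetic. This sketch assumes each Compute Node is up independently with the same probability p (a simplification, since real device failures can be correlated) and compares one large VS against a pool of five smaller ones. The 0.95 uptime figure is purely illustrative, not a measured Zyra number.

```python
from math import comb


def prob_at_least_k_up(n: int, k: int, p: float) -> float:
    """P(at least k of n independent nodes are up), each up with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.95  # illustrative per-node uptime, not a real Zyra figure

single_large = prob_at_least_k_up(1, 1, p)  # one large VS: all or nothing
pool_any = prob_at_least_k_up(5, 1, p)      # pool of 5: some capacity left
pool_most = prob_at_least_k_up(5, 4, p)     # pool of 5: at least 80% capacity

print(f"single VS up:      {single_large:.4f}")  # 0.9500
print(f"pool, >=1 of 5 up: {pool_any:.4f}")      # 1.0000 (rounded)
print(f"pool, >=4 of 5 up: {pool_most:.4f}")     # 0.9774
```

Under these assumptions the single VS is completely down 5% of the time, while the pool keeps at least four of five VSs running about 97.7% of the time and almost never loses all capacity.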
Distributing workload across a pool
Zyra doesn't ship an opinionated job dispatcher in MVP1 — pick the pattern that fits:
- Queue-pulling workers. Each VS pulls jobs from Redis, RabbitMQ, or your own API. Add more VSs to absorb queue depth.
- Round-robin load balancer. Put an HTTP load balancer in front of N VSs serving the same image.
- Per-tenant isolation. One VS per customer or per project.
The pattern doesn't matter to Zyra — every VS is just a persistent container.
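The queue-pulling pattern can be sketched with the standard library alone. Here queue.Queue stands in for Redis or RabbitMQ and each thread stands in for one VS; the run_pool function and the squaring "job" are hypothetical placeholders. Adding workers drains the same queue faster, which is the "add more VSs to absorb queue depth" idea.

```python
import queue
import threading
from collections import Counter
from typing import List, Tuple


def run_pool(jobs: List[int], n_workers: int) -> List[Tuple[str, int]]:
    """Simulate N VSs pulling jobs from one shared queue (hypothetical helper)."""
    q: "queue.Queue[int]" = queue.Queue()  # stand-in for Redis/RabbitMQ
    for job in jobs:
        q.put(job)

    done: List[Tuple[str, int]] = []
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            try:
                # Pull the next job; this demo worker exits when the queue is empty.
                job = q.get_nowait()
            except queue.Empty:
                return
            result = job * job  # placeholder for real work
            with lock:
                done.append((name, result))

    threads = [threading.Thread(target=worker, args=(f"vs-{i}",)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


results = run_pool(list(range(20)), n_workers=4)
print(Counter(name for name, _ in results))  # how many jobs each simulated VS took
```

A real worker on a VS would block waiting for new jobs (for example with Redis BLPOP) instead of exiting when the queue is momentarily empty. The per-VS counts vary from run to run, which is expected: whichever worker is free pulls next.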
Resizing a running VS
In MVP1, resizing is not in-place. The flow is: stop the VS (stopping → stopped), edit the size fields, start it again (starting → running). Plan a maintenance window for stateful workloads. [VERIFY: confirm whether in-place resize is targeted for MVP2 or post-MVP1]
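The stop, edit, start sequence can be modeled as a small state machine. The state names come from the flow above; the VirtualServer class, its methods, and the rule that a resize is rejected unless the VS is stopped are hypothetical, meant to show the ordering rather than Zyra's actual API.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"
    STARTING = "starting"


# Allowed transitions for the MVP1 resize flow (stop, edit, start).
TRANSITIONS = {
    State.RUNNING: {State.STOPPING},
    State.STOPPING: {State.STOPPED},
    State.STOPPED: {State.STARTING},
    State.STARTING: {State.RUNNING},
}


class VirtualServer:
    """Hypothetical in-memory model; not the Zyra client or backend."""

    def __init__(self, vcpus: int = 2, memory_mb: int = 2048) -> None:
        self.state = State.RUNNING
        self.vcpus = vcpus
        self.memory_mb = memory_mb

    def _move(self, new: State) -> None:
        if new not in TRANSITIONS[self.state]:
            raise RuntimeError(f"cannot go from {self.state.value} to {new.value}")
        self.state = new

    def stop(self) -> None:
        self._move(State.STOPPING)
        self._move(State.STOPPED)  # real stops are asynchronous; collapsed here

    def start(self) -> None:
        self._move(State.STARTING)
        self._move(State.RUNNING)

    def resize(self, vcpus: int, memory_mb: int) -> None:
        # No in-place resize in MVP1: size fields may only change while stopped.
        if self.state is not State.STOPPED:
            raise RuntimeError("stop the VS before resizing")
        self.vcpus, self.memory_mb = vcpus, memory_mb


vs = VirtualServer()
vs.stop()
vs.resize(vcpus=4, memory_mb=8192)
vs.start()
print(vs.state.value, vs.vcpus, vs.memory_mb)  # running 4 8192
```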
What to watch while you scale
Open the Metrics tab on every VS in the pool and look for:
- Headroom. CPU steadily above 80% means under-provisioned. Below 20% means you can shrink.
- Memory pressure. memory_usage_mb near the cap means you're one bad week from OOM-kills.
- Uneven distribution. One VS at 90%, four at 10% means your dispatcher is broken, not your sizing.
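The three checks can be expressed as a small function over recent per-VS samples. The 80% and 20% CPU thresholds come from the list above; using the median to approximate "steadily", treating 90% of memory_mb as "near the cap", the 50-point spread for "uneven", and the input format are all assumptions for illustration.

```python
from statistics import median
from typing import Dict, List


def review_vs(cpu_pct: List[float], mem_used_mb: List[float], memory_mb: int) -> List[str]:
    """Flag one VS from its recent samples (some thresholds are assumed)."""
    flags = []
    cpu = median(cpu_pct)  # median as a stand-in for "steadily"
    if cpu > 80:
        flags.append("under-provisioned: CPU steadily above 80%")
    elif cpu < 20:
        flags.append("over-provisioned: CPU below 20%, consider shrinking")
    if max(mem_used_mb) >= 0.9 * memory_mb:  # assumed meaning of "near the cap"
        flags.append("memory pressure: close to the memory_mb cap")
    return flags


def uneven_pool(cpu_by_vs: Dict[str, float], spread: float = 50.0) -> bool:
    """True if the busiest and idlest VS differ by more than `spread` points (assumed)."""
    return max(cpu_by_vs.values()) - min(cpu_by_vs.values()) > spread


print(review_vs([85, 90, 88], [1900, 1950, 1980], memory_mb=2048))
# ['under-provisioned: CPU steadily above 80%', 'memory pressure: close to the memory_mb cap']
print(uneven_pool({"vs-1": 90, "vs-2": 10, "vs-3": 10, "vs-4": 10, "vs-5": 10}))  # True
```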
What just happened
You know the four size knobs, the two scaling axes, and the default-to-horizontal rule. The next chapter shows how to turn the same metrics into a smaller bill.
Troubleshooting
- VS won't start after resize. The target Compute Node may not have enough free capacity. Check Compute Nodes for available_capacity.
- Pool feels uneven. Check your dispatcher: Zyra distributes containers across devices but doesn't load-balance jobs inside containers.
- GPU not detected. Set gpus >= 1 and make sure the target Compute Node has a matching gpu_type.
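Before restarting a resized VS, it can help to check whether the target node can actually host the new size. This sketch assumes available_capacity exposes free vcpus, memory_mb, disk_gb, and gpus plus the node's gpu_type; that shape, the Capacity type, and the why_not_fit helper are hypothetical, so compare against the real fields shown under Compute Nodes.

```python
from typing import List, Optional, TypedDict


class Capacity(TypedDict):
    # Assumed shape of a node's available_capacity; not the documented schema.
    vcpus: int
    memory_mb: int
    disk_gb: int
    gpus: int
    gpu_type: Optional[str]


def why_not_fit(free: Capacity, vcpus: int, memory_mb: int, disk_gb: int,
                gpus: int = 0, gpu_type: Optional[str] = None) -> List[str]:
    """Return reasons a requested size won't fit; an empty list means it fits."""
    problems = []
    if vcpus > free["vcpus"]:
        problems.append(f"needs {vcpus} vCPUs, node has {free['vcpus']} free")
    if memory_mb > free["memory_mb"]:
        problems.append(f"needs {memory_mb} MB, node has {free['memory_mb']} MB free")
    if disk_gb > free["disk_gb"]:
        problems.append(f"needs {disk_gb} GB disk, node has {free['disk_gb']} GB free")
    if gpus > free["gpus"]:
        problems.append(f"needs {gpus} GPU(s), node has {free['gpus']} free")
    if gpus >= 1 and gpu_type is not None and gpu_type != free["gpu_type"]:
        problems.append(f"wants gpu_type {gpu_type!r}, node has {free['gpu_type']!r}")
    return problems


node: Capacity = {"vcpus": 4, "memory_mb": 8192, "disk_gb": 200, "gpus": 0, "gpu_type": None}
print(why_not_fit(node, vcpus=8, memory_mb=4096, disk_gb=100))
# ['needs 8 vCPUs, node has 4 free']
```

Returning a list of reasons rather than a single boolean makes it obvious whether the fix is a smaller size or a different node.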
Last reviewed: 2026-05-21