
Capacity planning across a real device fleet

Chapter 2 · about 12 minutes to read

Capacity planning on Zyra is unusual: you do not buy reserved instances or pre-warm pools. You enrol Compute Nodes (or share an org's pool), and the placement engine decides where each Virtual Server lands. Your job is to make sure there are enough candidate devices, with the right mix of sizes and locations, when demand hits.

Prerequisites: at least 30 days of running Virtual Servers so the historical metrics are meaningful.

What "capacity" means on Zyra

A Compute Node has a capability_score (0-1000, from backend/app/models/device/model.py). The placement engine (backend/app/services/placement/engine.py) scores each candidate device on hardware fit, current load, reliability, cost, and network. When you POST /api/v1/virtual-servers, the engine picks the winner from candidates that satisfy your hard constraints.

Capacity is therefore not a single number. It is a distribution: how many devices in my fleet, with what capability_score, with what current load, in what zone, can satisfy a request shaped like X?
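
One way to make that question concrete is to count matching devices per zone. The sketch below is illustrative only: every Device field other than capability_score, and the thresholds passed in, are assumptions rather than the platform's schema, and the real placement engine scores candidates rather than simply filtering them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical, simplified view of a Compute Node for illustration.
    capability_score: int  # 0-1000
    load: float            # fraction of capacity in use, 0.0-1.0
    zone: str
    has_gpu: bool


def candidates_by_zone(devices, min_score, max_load, needs_gpu):
    """Count devices per zone that could host a request of the given shape."""
    counts = Counter()
    for d in devices:
        if d.capability_score < min_score:
            continue  # not enough hardware for this request
        if d.load > max_load:
            continue  # already too busy
        if needs_gpu and not d.has_gpu:
            continue
        counts[d.zone] += 1
    return dict(counts)


fleet = [
    Device(850, 0.20, "eu-west", True),
    Device(820, 0.95, "eu-west", True),   # excluded: too loaded
    Device(450, 0.10, "eu-west", False),  # excluded: score too low, no GPU
    Device(900, 0.40, "us-east", True),
]
print(candidates_by_zone(fleet, min_score=700, max_load=0.8, needs_gpu=True))
```

Four devices online, but only two of them answer this particular request shape, one per zone.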

Step 1 — Pull your historical demand

Open Monitoring → Fleet metrics and export the last 90 days:

VS create rate per hour, per day-of-week.
Peak concurrent running VSs.
VS sizes (vcpus / memory_mb / gpu) distribution.
Mean and p95 run duration.

If the dashboard does not yet expose all of these, the same data is in the metrics API. [VERIFY: fleet-wide aggregation endpoints — partially in /api/v1/monitoring; full per-org fleet rollup is on the backlog]

Sketch the demand curve: x-axis = hour of week, y-axis = concurrent running VSs. Peaks tell you the headroom you need.
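
If you export raw run records rather than a ready-made chart, the curve can be approximated offline. This is a minimal sketch that assumes each run is available as a (start, end) pair of datetimes; the export format is not specified here, so treat the input shape as hypothetical. It counts a VS as active in every clock hour it touches, which slightly overstates true concurrency.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def demand_curve(runs):
    """Map hour-of-week (0 = Monday 00:00) to the peak number of VSs
    active in that hour across all weeks in the data."""
    per_hour = defaultdict(int)  # absolute clock hour -> VSs active in it
    for start, end in runs:
        hour = start.replace(minute=0, second=0, microsecond=0)
        while hour < end:
            per_hour[hour] += 1
            hour += timedelta(hours=1)

    curve = defaultdict(int)  # hour-of-week -> peak across weeks
    for hour, active in per_hour.items():
        slot = hour.weekday() * 24 + hour.hour
        curve[slot] = max(curve[slot], active)
    return dict(curve)


runs = [
    (datetime(2026, 5, 4, 9, 15), datetime(2026, 5, 4, 11, 0)),    # a Monday
    (datetime(2026, 5, 4, 10, 0), datetime(2026, 5, 4, 10, 30)),   # same Monday
    (datetime(2026, 5, 11, 10, 5), datetime(2026, 5, 11, 10, 50)), # next Monday
]
curve = demand_curve(runs)
peak_slot = max(curve, key=curve.get)
print(peak_slot, curve[peak_slot])
```

Taking the maximum across weeks, rather than the mean, keeps the curve aligned with the headroom question: you size for the worst Monday at 10:00, not the average one.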

Step 2 — Inventory your supply

Run GET /api/v1/devices?status=online for your org. Bucket the result by capability_score:

0-300 (small). Modest CPU, no GPU, < 8 GB RAM. Good for light services, sidecars, fan-out shards.
300-700 (mid). Solid 4-8 vcpu CPUs, 16-32 GB RAM. Workhorse band.
700-1000 (heavy). High core count, often with a GPU. Reserve for hard jobs.

A common mistake: "I have 80 devices online" does not mean you can run 80 demanding VSs. Only the heavy band can. Read the histogram, not the count.
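
The bucketing can be sketched in a few lines. The list below stands in for the decoded response of GET /api/v1/devices?status=online; the real payload shape may differ, and placing the boundary scores 300 and 700 in the higher band is an assumption, since the bands above share their edges.

```python
from collections import Counter


def band(score):
    """Map a capability_score (0-1000) to a band name.

    Assumption: boundary scores 300 and 700 fall into the higher band.
    """
    if score < 300:
        return "small"
    if score < 700:
        return "mid"
    return "heavy"


def histogram(devices):
    """Count online devices per band from a decoded device list."""
    return Counter(band(d["capability_score"]) for d in devices)


# Hypothetical records; only capability_score is taken from this guide.
devices = [
    {"id": "a", "capability_score": 120},
    {"id": "b", "capability_score": 300},
    {"id": "c", "capability_score": 640},
    {"id": "d", "capability_score": 910},
]
print(histogram(devices))
```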

Step 3 — Set headroom

Compare the demand peak to the supply on the same size class:

2× peak: one slot free for every slot used at peak. SLA-bound production; queues unacceptable.
1.3× peak: modest cushion. Steady production, brief queues tolerated.
1.0× peak: no cushion. Batch / non-SLA workloads happy to wait.

Over-provisioning costs you nothing if you are sourcing Compute Nodes from existing org hardware (the MVP1 model). It costs measurably if you run dedicated infrastructure. The right answer depends on which world your fleet lives in.
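
The comparison itself is simple arithmetic per size class. A minimal sketch, with illustrative numbers and hypothetical function names:

```python
import math


def slots_needed(peak_concurrent, headroom):
    """Slots to provision for one size class at a given headroom multiple."""
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(peak_concurrent * headroom, 9))


def shortfall(peak_concurrent, headroom, available_slots):
    """Slots missing for the target headroom; 0 if supply already covers it."""
    return max(0, slots_needed(peak_concurrent, headroom) - available_slots)


# Illustrative: 30 heavy VSs at peak, 60 heavy slots in the fleet.
for factor in (2.0, 1.3, 1.0):
    print(factor, slots_needed(30, factor), shortfall(30, factor, 60))
```

Run it once per size class: a fleet can sit comfortably at 2× for small VSs and below 1× for heavy ones at the same time.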

Step 4 — Geography and latency

The placement engine's network scorer prefers devices closer to the requesting user / region. For latency-sensitive work, this matters more than raw capability_score.

For each major user region, ensure ≥ 3 enrolled devices.
Use device pools or zone placement constraints (backend/app/models/placement_constraint.py) to pin workloads to a region when latency matters.
Device location may not be visible on the dashboard yet. [VERIFY: location_country / location_region columns exist in the model but are not yet exposed in the org-facing UI for MVP1]

Latency is not a thing you fix at scheduling time — it is a thing you fix at enrolment time.
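
A quick coverage check against the three-device floor might look like the sketch below. It assumes each device record carries a location_region value; that column exists in the model according to the note above, but whether and how it is exposed to you is unconfirmed, and the region names are made up.

```python
from collections import Counter

MIN_DEVICES_PER_REGION = 3  # the floor suggested in this step


def thin_regions(devices, user_regions, minimum=MIN_DEVICES_PER_REGION):
    """Return {region: enrolled_count} for user regions below the minimum."""
    enrolled = Counter(d["location_region"] for d in devices)
    return {r: enrolled[r] for r in user_regions if enrolled[r] < minimum}


# Hypothetical device records and region names.
devices = [
    {"id": "a", "location_region": "eu-west"},
    {"id": "b", "location_region": "eu-west"},
    {"id": "c", "location_region": "eu-west"},
    {"id": "d", "location_region": "us-east"},
]
print(thin_regions(devices, ["eu-west", "us-east", "ap-south"]))
```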

Step 5 — Read the capability_score distribution

A healthy fleet looks like a roughly bell-shaped histogram with a long heavy-band tail. Pathological shapes:

All devices in one band. Your fleet is a monoculture. Unless that band is the heavy one, a GPU job has nowhere to go.
A long left tail of 0-100 scores. Those devices are essentially unusable — remove them.
Heavy band empty. GPU work will fail placement. Enrol GPU-equipped devices or accept that you will queue heavy jobs.

The score is computed server-side from the agent's hardware report. Scores drift up or down with reliability (see device_reputation).
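
The three pathological shapes can be checked mechanically from a list of scores. This is a sketch, not a platform feature: the 10% cut-off for what counts as a "long" left tail is an assumption chosen for illustration, as is the handling of the band boundaries.

```python
def fleet_warnings(scores, junk_max=100, junk_share=0.10):
    """Flag pathological capability_score distributions.

    junk_share is a hypothetical threshold for a "long" 0-100 tail.
    """
    if not scores:
        return ["no devices enrolled"]

    bands = {"small": 0, "mid": 0, "heavy": 0}
    for s in scores:
        # Assumption: boundary scores 300 and 700 count toward the higher band.
        name = "small" if s < 300 else "mid" if s < 700 else "heavy"
        bands[name] += 1

    warnings = []
    populated = [name for name, count in bands.items() if count]
    if len(populated) == 1:
        warnings.append(f"monoculture: all devices in the {populated[0]} band")
    if bands["heavy"] == 0:
        warnings.append("heavy band empty: GPU work will fail placement")
    low = sum(1 for s in scores if s <= junk_max)
    if low / len(scores) > junk_share:
        warnings.append(f"long left tail: {low} devices score {junk_max} or below")
    return warnings


print(fleet_warnings([50, 60, 80, 200]))
print(fleet_warnings([150, 400, 500, 550, 800, 900]))
```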

Step 6 — Plan for surge

A 10× workload day is the failure mode capacity planning exists for. Playbook:

One week out. Enrol additional devices. They need time to prove reliability before the engine prefers them.
48 hours out. Pre-warm: spin up a couple of test VSs across the fleet to confirm placement and pull images.
Day of surge. Watch Monitoring → Placement failures and the auto-scaling engine's behaviour. If failures climb, you under-provisioned.
Post-surge. Decommission the additional devices or leave them enrolled if the surge is the new baseline.

A worked example — 10× workload day

Baseline: 200 small VSs and 30 heavy VSs concurrently, fleet of 120 devices (60/40/20 by band). Target: 10× = 2000 small + 300 heavy.

Small band: 60 devices, each hosting ~5 small VSs concurrently → 300 slots. Need 2000, so 1700 slots are missing → enrol 340 more small-band devices. Or shard the workload so it tolerates a queue.
Heavy band: 20 devices, ~3 heavy VSs each → 60 slots. Need 300, so 240 slots are missing → enrol 80 more heavy devices, or use a pipeline (Chapter 1) to time-share GPU minutes.

Numbers like these are why capacity planning gets done before the surge, not during.
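
The arithmetic in the worked example reduces to one function, shown here so you can rerun it with your own figures. The per-device densities (~5 small, ~3 heavy) are the rough estimates from the example, not measured limits, and the function name is hypothetical.

```python
def devices_to_enrol(target_vs, current_devices, vs_per_device):
    """Extra devices needed in one band to host target_vs concurrently."""
    existing_slots = current_devices * vs_per_device
    missing_slots = max(0, target_vs - existing_slots)
    # Integer ceiling division: a partly used device is still a whole device.
    return (missing_slots + vs_per_device - 1) // vs_per_device


small = devices_to_enrol(target_vs=2000, current_devices=60, vs_per_device=5)
heavy = devices_to_enrol(target_vs=300, current_devices=20, vs_per_device=3)
print(small, heavy)  # 340 80
```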

What just happened

You have a method for sizing the fleet against historical demand, a way to read the supply-side distribution, and a surge playbook. The next chapter turns the same data into a cost-reduction checklist.

Troubleshooting

Placement failures spiking. Either no candidate device matches the shape, or all candidates are over their load thresholds. Read the placement decision log; raise headroom for that size band.
VSs land on the same device repeatedly. The likely cause is that the placement engine's hardware score dominates the other scorers. Add anti-affinity constraints if this concentration is undesirable.
GPU jobs always queue. Heavy-band fleet is too thin. Enrol GPU-equipped devices or split GPU work via pipelines.

Last reviewed: 2026-05-21