Monitor your Virtual Servers

Chapter 1 · about 10 minutes to read

Stage 2 ended with one Virtual Server running and one job completed. Stage 3 starts with the question every operator asks next: "Is it actually healthy, and how do I know?" This chapter tours every metric, log surface, and status signal Zyra exposes for a running Virtual Server (VS).

Time: about 10 minutes to read, plus a few minutes clicking around your own dashboard. Prerequisites: at least one Virtual Server in the running state.

The three places monitoring data lives

Zyra stores monitoring data in three layered surfaces. Open the dashboard, click any VS row, and you'll see all three:

Live snapshot — the four columns on the VS detail header: cpu_usage_percent, memory_usage_mb, disk_usage_gb, uptime_seconds. Refreshed by the agent every 10 seconds and written straight onto the virtual_servers row.
Time-series history — the virtual_server_metrics table. The agent appends a row every minute with CPU, memory, network rx/tx bytes, and block read/write bytes. Retained for 90 days per the database retention policy.
Logs — the virtual_server_logs table. Stdout, stderr, and lifecycle events from the container. Retained 30 days for info, 90 days for errors.
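To get a feel for how much history those cadences produce, here is a back-of-the-envelope sketch. It assumes the agent never misses a sample; the constant and function names are illustrative and not part of any Zyra API.

```python
# Cadences and retention as stated in this chapter.
SNAPSHOT_INTERVAL_S = 10       # live snapshot refresh on the virtual_servers row
HISTORY_INTERVAL_S = 60        # one virtual_server_metrics row per minute
HISTORY_RETENTION_DAYS = 90    # time-series retention

def history_rows_per_vs(days: int = HISTORY_RETENTION_DAYS) -> int:
    """Rows one VS accumulates in virtual_server_metrics over `days` (no gaps assumed)."""
    return days * 24 * 60 * 60 // HISTORY_INTERVAL_S

def snapshots_per_history_row() -> int:
    """Live snapshot refreshes that happen between two history rows."""
    return HISTORY_INTERVAL_S // SNAPSHOT_INTERVAL_S

print(history_rows_per_vs())        # 129600
print(snapshots_per_history_row())  # 6
```

In other words, a single VS can hold roughly 130,000 history rows at full retention, and each history row sits alongside about six live snapshot refreshes.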

The "Overview" tab — what to look at first

Open Virtual Servers, click your VS, land on Overview. Five panels:

Status pill. One of creating / starting / running / stopping / stopped / restarting / terminating / terminated / error per VirtualServerStatus. Green = running. Yellow = a transition state. Red = error.
Uptime. Wall-clock seconds since started_at, formatted as "3d 4h 12m".
CPU graph. Last 60 minutes of cpu_usage_percent from virtual_server_metrics.
Memory graph. Last 60 minutes of memory_usage_mb plotted against the configured memory_mb cap from the VS spec.
Network throughput. Last 60 minutes of bytes-per-second derived from network_rx_bytes / network_tx_bytes deltas.

[SCREENSHOT: VS detail page Overview tab showing the five panels above]
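Two of these panels show derived values rather than raw columns: the Uptime string and the Network throughput rate. The sketch below shows one way such values could be computed. It assumes network_rx_bytes and network_tx_bytes are cumulative counters and that samples arrive as (unix_seconds, bytes) pairs; the function names and the handling of counter resets are illustrative, not Zyra's actual implementation.

```python
def format_uptime(seconds: int) -> str:
    """Render a duration in seconds as days, hours, and minutes."""
    minutes = max(0, int(seconds)) // 60
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours}h {mins}m"

def throughput_bps(samples):
    """Turn cumulative byte counters into bytes-per-second rates.

    samples: list of (unix_seconds, cumulative_bytes), oldest first.
    Returns a list of (unix_seconds, bytes_per_second), one per adjacent pair.
    """
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        delta = b1 - b0
        if elapsed <= 0 or delta < 0:
            # Skip out-of-order samples and counter resets (assumed to
            # happen when the container restarts).
            continue
        rates.append((t1, delta / elapsed))
    return rates

print(format_uptime(3 * 86400 + 4 * 3600 + 12 * 60))  # 3d 4h 12m
print(throughput_bps([(0, 0), (60, 6000), (120, 18000)]))
```

With one-minute samples, the second call yields 100 bytes per second for the first interval and 200 for the second.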

The "Metrics" tab — historical trends

Click Metrics. Same data, longer windows: 1h / 6h / 24h / 7d / 30d. Useful for spotting:

Steady-state drift. CPU was 20% last week, 60% this week, same workload — something is leaking.
Periodic spikes. Memory climbs every hour and resets — a misbehaving cron or scheduled job inside the container.
Network surprises. Outbound bytes climbing for no reason can indicate data exfiltration or a runaway client.
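The steady-state drift check can also be done outside the UI if you export the CPU series. The following is a minimal sketch, assuming you already have two lists of cpu_usage_percent samples for comparable windows (for example, last week and this week). The 20-point threshold and the function name are hypothetical choices for illustration.

```python
from statistics import mean

def steady_state_drift(previous, current, min_increase_points=20.0):
    """Compare mean CPU across two windows of cpu_usage_percent samples.

    Returns (delta_in_percentage_points, flagged). `flagged` is True when
    the current window's mean exceeds the previous one by at least
    `min_increase_points`. Returns (None, False) if either window is empty.
    """
    if not previous or not current:
        return None, False
    delta = mean(current) - mean(previous)
    return delta, delta >= min_increase_points

last_week = [19.0, 20.0, 21.0]
this_week = [59.0, 60.0, 61.0]
delta, flagged = steady_state_drift(last_week, this_week)
print(delta, flagged)  # 40.0 True
```

Comparing window means is deliberately crude: it catches a sustained shift like the 20% to 60% example above but will not identify periodic spikes, which are easier to see on the chart itself.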

Each metric chart has a "compare to threshold" toggle that overlays any alert rule you've configured for that metric (see Chapter 2: SLA alerts).

The "Logs" tab — what your container is saying

Click Logs. Streams the virtual_server_logs table for this VS, newest first. Filters:

Level: info / warning / error.
Source: container_stdout / container_stderr / lifecycle / agent.
Time window: last hour, last day, last week, or a custom range.

Click any line to expand its full payload (the details JSONB column). Lifecycle events — pulling_image, started, restarted, oom_killed — surface here first, often before they reach the status pill.

[SCREENSHOT: Logs tab with a filter dropdown and an expanded error line]
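The same three filters are easy to reproduce on exported log rows. This sketch assumes each row is a dict with timestamp, level, source, and details keys. Only the details column, the level values, and the source values come from this chapter; the other key names and the shape of the details payload are assumptions.

```python
from datetime import datetime, timedelta, timezone

LEVELS = ("info", "warning", "error")
SOURCES = ("container_stdout", "container_stderr", "lifecycle", "agent")

def filter_logs(rows, level=None, source=None, since=None, until=None):
    """Apply the Logs tab filters to a list of row dicts, newest first."""
    if level is not None and level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    if source is not None and source not in SOURCES:
        raise ValueError(f"unknown source: {source}")
    selected = [
        row for row in rows
        if (level is None or row["level"] == level)
        and (source is None or row["source"] == source)
        and (since is None or row["timestamp"] >= since)
        and (until is None or row["timestamp"] <= until)
    ]
    return sorted(selected, key=lambda row: row["timestamp"], reverse=True)

# Hypothetical sample rows; the {"event": ...} payload shape is assumed.
now = datetime(2026, 5, 21, 12, 0, tzinfo=timezone.utc)
rows = [
    {"timestamp": now - timedelta(minutes=50), "level": "info",
     "source": "lifecycle", "details": {"event": "started"}},
    {"timestamp": now - timedelta(minutes=5), "level": "error",
     "source": "lifecycle", "details": {"event": "oom_killed"}},
    {"timestamp": now - timedelta(days=2), "level": "error",
     "source": "container_stderr", "details": {}},
]

# Lifecycle events from the last hour, newest first.
recent_lifecycle = filter_logs(rows, source="lifecycle", since=now - timedelta(hours=1))
print([row["details"]["event"] for row in recent_lifecycle])  # ['oom_killed', 'started']
```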

The fleet-wide view

Per-VS detail pages are great for one server. When you have a dozen, open Monitoring in the sidebar (the org-wide alerting page wired to /api/v1/monitoring/alerts). It shows:

A heatmap of every VS in your org by CPU and memory load.
A list of triggered alerts you haven't acknowledged.
A list of currently breaching SLAs (see Chapter 2).

Where Zyra itself monitors your fleet

Beyond what you see in the UI, Zyra runs Stage 11 continuous monitoring in the background:

An external health probe hits every public URL hourly.
The Observability Engineer publishes a daily health report at roughly 09:00 UTC summarising your VSs alongside platform health.
Brute-force login anomaly detection writes tenant-scoped security alerts. CPU / memory anomaly detection remains evidence-tracked roadmap work, not a customer-facing guarantee today.

You do not have to configure brute-force login detection; it runs on the auth flow by default. Broader resource-anomaly UI exposure is tracked separately and should not be treated as launched.

What just happened

You now know the four surfaces where Zyra exposes VS health: live snapshot on the row, time-series in the Metrics tab, log stream in the Logs tab, and fleet-wide rollup in the Monitoring page. The next chapter turns these signals into alerts that wake someone up.

Troubleshooting

Charts say "no data". The agent hasn't reported metrics yet. Wait 90 seconds after the VS reaches running status; if the charts are still empty, check Logs → Source: agent.
Uptime resets to 0. The container restarted. Look at Logs → Source: lifecycle for an oom_killed or exit_code != 0 event.
Status flips to error. Hover the pill for the error_message field. Common values: image-pull failure, port already in use, capability cap exceeded.
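The "Uptime resets to 0" symptom can be spotted programmatically: if a later uptime_seconds sample is lower than the one before it, the container restarted somewhere in between. This is a minimal sketch, assuming you have collected (unix_seconds, uptime_seconds) pairs yourself; the function name is illustrative.

```python
def detect_restarts(samples):
    """Return the sample times at which uptime went backwards.

    samples: list of (unix_seconds, uptime_seconds), oldest first.
    A drop in uptime between adjacent samples means a restart happened
    in that interval; the later sample's time is reported.
    """
    restarts = []
    for (_, prev_uptime), (sample_time, uptime) in zip(samples, samples[1:]):
        if uptime < prev_uptime:
            restarts.append(sample_time)
    return restarts

samples = [(0, 100), (10, 110), (20, 5), (30, 15)]
print(detect_restarts(samples))  # [20]
```

Each reported time gives you a window to search in Logs with Source: lifecycle for the matching oom_killed or non-zero exit event.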

Last reviewed: 2026-05-21