
Read your recovery posture

A card-by-card guide to the logged-in dashboard — what each tile means, what good vs bad looks like, and where each action leads.

This guide reads the dashboard the way an on-call operator would. It walks through every card on the logged-in surface (the Recovery posture hero, the activity strip, and the Recent jobs log) and tells you what a healthy value looks like, what a value that needs attention looks like, and where each button takes you.

You arrive here once you have at least one database connected. Before that, the dashboard shows an onboarding checklist instead; see Getting started.

The page at a glance

The active dashboard stacks three regions, top to bottom:

  1. Recovery posture — the hero. Your headline RPO, the last signed manifest, a restore-time estimate, the most recent drill outcome, and the two recovery actions.
  2. Activity strip — three compact cards: open incidents, next scheduled events, retention compliance.
  3. Recent jobs log — the unified timeline of snapshots, verifications, restore drills, and restores.

Everything on the page is derived from data walwarden already holds. Apart from the restore estimate, which is labelled as an estimate, these numbers are not forecasts; they are statements about what has actually happened and what is signed.

Recovery posture (the hero)

The hero answers one question: if the source database failed right now, how much would I lose and how fast could I get back?

RPO tile

The largest number on the page. RPO (Recovery Point Objective) is your loss window — the age of your most recent recoverable backup, rendered as elapsed time:

Loss window | Renders as
Under an hour | 12m 00s (sub-minute resolution kept for incident reading)
An hour to a day | 8h 36m
A day or more | 1d 4h
No successful backup yet | a dash placeholder

A loss window of 8h 36m means your last completed backup finished eight hours and thirty-six minutes ago, so a failure now would lose at most the writes since then. For the full meaning of RPO here — and why it is a backup-recency figure, not a continuous-protection guarantee — see Recoverability and RPO.
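The rendering rules in the table above can be sketched as a small formatter. This is an illustrative sketch, not walwarden's implementation: the function name `format_loss_window`, the exact cut-offs (3600 and 86400 seconds), and the placeholder glyph are all assumptions.

```python
from typing import Optional

NO_BACKUP = "\u2014"  # assumed placeholder glyph for "no successful backup yet"


def format_loss_window(seconds: Optional[int]) -> str:
    """Render a loss window (age of the last recoverable backup) as elapsed time."""
    if seconds is None:
        return NO_BACKUP
    if seconds < 3600:  # under an hour: keep seconds for incident reading
        minutes, secs = divmod(seconds, 60)
        return f"{minutes}m {secs:02d}s"
    if seconds < 86400:  # an hour to a day: hours and minutes
        hours, rem = divmod(seconds, 3600)
        return f"{hours}h {rem // 60}m"
    days, rem = divmod(seconds, 86400)  # a day or more: days and hours
    return f"{days}d {rem // 3600}h"


print(format_loss_window(12 * 60))              # 12m 00s
print(format_loss_window(8 * 3600 + 36 * 60))   # 8h 36m
print(format_loss_window(28 * 3600))            # 1d 4h
```

Coarser units at larger ages keep the tile readable, while the seconds are retained where they matter most, in the first hour.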

Green vs amber. The tile colours itself against the interval walwarden derives from your backup schedule (your RPO target — derived from the cron you set, never a fixed number):

Colour | Meaning | Example
Green | The loss window is inside your schedule's expected interval. This is the healthy state. | A database on a daily schedule reading 8h 36m
Amber | The loss window has drifted past your interval: a backup is overdue or recently failed. Check the jobs log and your schedule. | An hourly database reading 3h 10m
(neutral) | No successful backup has landed yet. Run pre-flight and take a first backup. | A newly connected database

There is no red on the RPO number itself — a genuine failure surfaces as a red Open incidents count and a Failed row in the jobs log (below), which is where you act.
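The colour rule for the RPO tile can be sketched as a comparison between the loss window and the interval implied by the schedule. This is a minimal sketch under stated assumptions: `rpo_colour` is a hypothetical name, the expected interval is taken as already derived from the cron expression, and treating a loss window exactly equal to the interval as green (with no grace margin) is a guess.

```python
from typing import Optional


def rpo_colour(loss_window_s: Optional[int], expected_interval_s: int) -> str:
    """Classify the RPO tile. There is deliberately no 'red' outcome here."""
    if loss_window_s is None:
        return "neutral"  # no successful backup has landed yet
    if loss_window_s <= expected_interval_s:  # assumed boundary: equal counts as inside
        return "green"
    return "amber"  # drifted past the schedule's interval


DAILY, HOURLY = 24 * 3600, 3600
print(rpo_colour(8 * 3600 + 36 * 60, DAILY))   # green
print(rpo_colour(3 * 3600 + 10 * 60, HOURLY))  # amber
print(rpo_colour(None, HOURLY))                # neutral
```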

Manifest hash tile

The short hash (ec664a…450c) of the last Ed25519-signed manifest, with a caption like signed 4 minutes ago by walwarden-worker. This is the proof that the backup artifact exists, is signed, and is verifiable offline. A hash here means the audit chain recorded a signed artifact; a dash means no signed manifest yet. You can verify any manifest offline; see Produce an evidence bundle.

Restore estimate tile

≈ 4m 12s is a derived estimate of how long a restore would take, using recent backup durations as a proxy. It is an estimate, never a guarantee. When no estimate exists, the tile shows a dash with the caption drill to learn: the way to produce a real number is to run a drill.
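One way to turn recent backup durations into a proxy figure is to take their median. The guide does not say which statistic walwarden uses, so the median, the function name `restore_estimate`, and the output format are assumptions made purely for illustration.

```python
from statistics import median
from typing import Optional, Sequence


def restore_estimate(recent_backup_durations_s: Sequence[float]) -> Optional[str]:
    """Return a rough restore-time label, or None when there is nothing to base it on."""
    if not recent_backup_durations_s:
        return None  # the tile would show its "drill to learn" state
    estimate_s = round(median(recent_backup_durations_s))  # assumed statistic
    minutes, secs = divmod(estimate_s, 60)
    return f"\u2248 {minutes}m {secs:02d}s"


print(restore_estimate([240, 252, 260]))  # ≈ 4m 12s
print(restore_estimate([]))               # None
```

A median is less sensitive than a mean to one unusually slow backup, but either way the result stays a proxy; only a drill measures an actual restore.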

Drill status line

Across the top of the hero: Restore drill status: Passed 7m ago — no diff, with a chip (Passed, Failed, Pending, or no drill yet). A backup you have never restored from is unproven; the drill line is where the most recent proof — or its absence — lives. Restore drill status: — means no drill has run.

Recovery actions

Two buttons sit in the Recovery actions panel, with the restore target named above them (Target: your-database):

Button | What it does | Where it leads
Restore now (Recover this database) | The accent action: restores your most recent backup to a target you control. Use this mid-incident. | Run a restore drill walks the same one-liner flow; pick a mode in Restore modes.
Run restore drill (Test recoverability: safe, no impact) | Restores to a target you control to confirm recoverability. Safe to run any time; it does not touch your source database. | Run a restore drill

If no backup is available to restore from, both buttons disable and the panel reads No backup to restore yet.

Activity strip

Three cards beneath the hero. Each shows a count and a one-word status chip.

Open incidents

The count of failed snapshots plus failed restore drills in the last 24 hours.

  • 0 with all systems verified (green, chip all clear) is the healthy state.
  • A non-zero count is red (chip needs attention) with a breakdown such as 2 failed snapshots · 1 failed drill. Click into the named database row in the jobs log to see the failure detail.
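The count and its breakdown can be sketched as a filter over recent jobs. The `Job` record, its field names, and the type and status strings below are hypothetical, chosen for illustration; only the rule itself (failed snapshots plus failed restore drills in the last 24 hours) comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple


@dataclass
class Job:
    type: str            # assumed values: "snapshot", "verification", "restore_drill", "restore"
    status: str          # assumed values: "completed", "running", "pending", "failed"
    finished_at: datetime


def open_incidents(jobs: Iterable[Job], now: datetime) -> Tuple[int, int]:
    """Count failed snapshots and failed restore drills from the last 24 hours."""
    cutoff = now - timedelta(hours=24)
    recent_failed = [j for j in jobs if j.status == "failed" and j.finished_at >= cutoff]
    snapshots = sum(1 for j in recent_failed if j.type == "snapshot")
    drills = sum(1 for j in recent_failed if j.type == "restore_drill")
    return snapshots, drills


def breakdown(snapshots: int, drills: int) -> str:
    """Build the caption shown next to the count."""
    if snapshots + drills == 0:
        return "all systems verified"
    parts = []
    if snapshots:
        parts.append(f"{snapshots} failed snapshot{'s' if snapshots != 1 else ''}")
    if drills:
        parts.append(f"{drills} failed drill{'s' if drills != 1 else ''}")
    return " · ".join(parts)


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
jobs = [
    Job("snapshot", "failed", now - timedelta(hours=1)),
    Job("snapshot", "failed", now - timedelta(hours=5)),
    Job("restore_drill", "failed", now - timedelta(hours=2)),
    Job("snapshot", "failed", now - timedelta(hours=30)),    # outside the window
    Job("snapshot", "completed", now - timedelta(hours=3)),  # not a failure
]
print(breakdown(*open_incidents(jobs, now)))  # 2 failed snapshots · 1 failed drill
```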

Next scheduled events

Counts down to what runs next: Next backup in 56m and Next restore drill in 3h 14m. If nothing is scheduled, the line shows a dash with a Schedule one affordance. Configure cadence in Scheduled backups.

Retention compliance

30/30 days met — how many of your target retention days currently have a recoverable backup:

Chip | Meaning
on policy (green) | Every target day is covered.
partial (amber) | Fewer days met than targeted, but no violation in the last 24 hours.
behind retention (amber) | A retention violation occurred in the last 24 hours.
no policy set (neutral) | No retention target configured.

Retention follows from your backup schedule.
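The chip logic in the table above can be sketched as a short decision function. The name `retention_chip` and its parameters are hypothetical, and the precedence (a violation in the last 24 hours wins over a full day count) is an assumption the table does not settle.

```python
from typing import Optional


def retention_chip(days_met: int, target_days: Optional[int], violation_last_24h: bool) -> str:
    """Pick the retention chip for a 'days_met/target_days days met' card."""
    if target_days is None:
        return "no policy set"
    if violation_last_24h:  # assumed to take precedence over the day count
        return "behind retention"
    if days_met >= target_days:
        return "on policy"
    return "partial"


print(retention_chip(30, 30, False))   # on policy
print(retention_chip(27, 30, False))   # partial
print(retention_chip(27, 30, True))    # behind retention
print(retention_chip(0, None, False))  # no policy set
```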

Recent jobs log

The unified timeline. Every snapshot, verification, restore drill, and restore appears as one row, newest first. Columns: Status, Time, Type, Database, Destination, Duration, Bytes, Manifest Hash.

Status chips

Status | Tone
Completed | Green
Running / Pending | Amber (in flight)
Failed | Red; click the database name to open the failure detail

What the blanks and dashes mean

This is the most common point of confusion, because two different absences look similar:

  • A blank cell (Destination, Duration, or Bytes on a non-snapshot row) means the metric is structurally not-applicable. A verification, restore drill, or restore does not write bytes to S3, so it has no destination, duration-of-write, or byte count. The cell is intentionally empty; hover it for the reason. A blank here is not a gap in the proof.
  • A dash means expected but absent. On a snapshot row, a dash under Duration, Bytes, or Manifest Hash means that value was expected and is genuinely missing, which is worth a look. A dash in the Manifest Hash column means no signed manifest was recorded for that row.

In short: blank = not-applicable, dash = expected-but-missing. Verification rows are also rendered in a muted tone because they are routine paired checks folded beneath the snapshot they verify.
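The distinction between the two kinds of absence can be sketched as a cell renderer. The function name `render_cell`, the row-type strings, and the placeholder glyph are assumptions for illustration; the column names come from the jobs log itself.

```python
from typing import Optional

# Columns that are structurally not-applicable on non-snapshot rows.
NOT_APPLICABLE_ON_NON_SNAPSHOT = {"Destination", "Duration", "Bytes"}
MISSING = "\u2014"  # assumed glyph for "expected but missing"


def render_cell(row_type: str, column: str, value: Optional[str]) -> str:
    """Return the text for one jobs-log cell."""
    if value is not None:
        return value
    if row_type != "snapshot" and column in NOT_APPLICABLE_ON_NON_SNAPSHOT:
        return ""  # blank: the metric does not apply to this row type
    return MISSING  # the value was expected and is absent


print(repr(render_cell("verification", "Bytes", None)))           # ''
print(repr(render_cell("snapshot", "Bytes", None)))               # '—'
print(repr(render_cell("restore_drill", "Manifest Hash", None)))  # '—'
```

Keeping the two cases as distinct return values is what lets a reader tell "nothing to report" apart from "something is missing" at a glance.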

Reading the log during an incident

Start at the top. A green Completed snapshot with a manifest hash is a clean run. A red Failed row is your incident — open the database to see why. A run of Verification rows beneath a snapshot is the audit chain doing its job. The Time column carries a local-timezone timestamp, so you never do UTC math while the world is on fire.

Getting alerted instead of watching

You should not have to keep this page open. Wire notification routes so failed backups, failed restore drills, audit-chain anomalies, and a stopped worker reach Slack, Discord, email, or a signed webhook — then the dashboard becomes the place you confirm, not the place you watch.

The honest boundary

These cards report on scheduled logical backups and operator-initiated restores. They are not a continuous-protection or unattended-restore claim. For exactly what the product does and does not do today, see What is not shipped and the honest capability claims reference.