feat: add noisebell observability

Jet 2026-05-27 20:09:44 -07:00
parent b57927a395
commit e6c1b82679
24 changed files with 2289 additions and 137 deletions

@@ -34,3 +34,15 @@ Useful commands:
- `scripts/deploy-pios-pi.sh pi@100.66.45.36` redeploys the Raspberry Pi OS machine
The full Home Assistant relay workflow is documented in `pi/README.md`.
## Observability
The DigitalOcean host runs Prometheus, Loki, Grafana, Alloy, node_exporter, and blackbox_exporter via `hosts/noisebell-do/observability.nix`. Grafana provisions the `Noisebell DO + Pi` dashboard from code, with Prometheus panels for both hosts, detailed DO-to-Pi poll health, and Loki journal panels for both hosts.
- Grafana: `http://noisebell-do:3030/` over Tailscale
- Prometheus: `http://noisebell-do:9090/` over Tailscale
- Loki: `http://noisebell-do:3100/` over Tailscale
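A quick reachability check for the three endpoints above can be sketched as follows. The host and ports come from the list; the health paths (`/api/health`, `/-/healthy`, `/ready`) are the stock endpoints of Grafana, Prometheus, and Loki and are assumed not to be remapped in this deployment. The helper names `health_url` and `is_up` are hypothetical.

```python
from urllib.request import urlopen

HOST = "noisebell-do"  # Tailscale hostname from the list above

# Ports are from the README; paths are each project's standard health
# endpoint (an assumption about this deployment, not taken from the repo).
HEALTH_ENDPOINTS = {
    "grafana": (3030, "/api/health"),
    "prometheus": (9090, "/-/healthy"),
    "loki": (3100, "/ready"),
}


def health_url(service: str, host: str = HOST) -> str:
    """Build the health-check URL for one of the three services."""
    port, path = HEALTH_ENDPOINTS[service]
    return f"http://{host}:{port}{path}"


def is_up(service: str, timeout: float = 3.0) -> bool:
    """Return True if the service answers its health endpoint with HTTP 200."""
    try:
        with urlopen(health_url(service), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return False
```

Run from a machine on the tailnet, `is_up("loki")` would be expected to return `True` once Loki has finished starting. Off the tailnet, the hostname will not resolve and the function returns `False`.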
The Pi deploy script enables persistent journald, installs `prometheus-node-exporter`, and installs `noisebell-loki-journal.service` to ship Pi journal logs to Loki on the DO host.
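How `noisebell-loki-journal.service` ships entries is not shown here. As a rough illustration of what reaches Loki, the sketch below builds one request for Loki's HTTP push API (`/loki/api/v1/push`). The function name and the label set are hypothetical, and the real service may batch entries or use a different client entirely.

```python
import json
import time
from urllib.request import Request

# Loki address from the Observability list; the push path is Loki's HTTP API.
LOKI_PUSH_URL = "http://noisebell-do:3100/loki/api/v1/push"


def build_push_request(line, labels, ts_ns=None):
    """Build (but do not send) a Loki push request for one journal line.

    `labels` is a dict of stream labels; the label names used by the real
    service are not documented here, so any you pass are an assumption.
    """
    if ts_ns is None:
        ts_ns = time.time_ns()
    payload = {
        "streams": [
            {
                "stream": labels,
                # Loki expects [<unix epoch in ns, as a string>, <log line>].
                "values": [[str(ts_ns), line]],
            }
        ]
    }
    return Request(
        LOKI_PUSH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the entry. Keeping construction separate from sending makes the payload easy to inspect without a running Loki.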
Prometheus is the source of truth for regular time-based data: scrape health, host CPU/memory/disk/uptime, DO-to-Pi poll counts and last results, GPIO state, Pi hardware readings, webhook counters, and retry counters. Loki/journald is reserved for sparse event logs that should be readable in chronological order: service start/stop, door state changes, cache state changes, Pi offline/online transitions, auth or rate-limit rejections, webhook retries/failures, stale events, and GPIO read errors. Routine successful polls, unchanged poll results, metrics scrapes, and badge/image/status reads are intentionally not logged at `INFO`.
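The split described above can be illustrated with a minimal sketch: every poll updates a counter, but a log line is written only when the result changes. `PollRecorder` and its log messages are hypothetical, and the `Counter` is a stand-in for real Prometheus counters. This is not the service's actual code.

```python
import logging
from collections import Counter

log = logging.getLogger("noisebell.poll")


class PollRecorder:
    """Count every poll result; log only offline/online transitions."""

    def __init__(self):
        self.counts = Counter()  # stand-in for Prometheus counters
        self.last_ok = None      # None until the first poll is recorded

    def record(self, ok: bool) -> bool:
        """Record one poll. Returns True if a transition was logged."""
        # Metrics side: every poll is counted, changed or not.
        self.counts["ok" if ok else "error"] += 1

        # Log side: only a change from the previous result is an event.
        changed = self.last_ok is not None and ok != self.last_ok
        if changed and ok:
            log.info("Pi back online")
        elif changed:
            log.warning("Pi went offline")
        else:
            # Routine or unchanged results stay below INFO.
            log.debug("poll result unchanged (ok=%s)", ok)

        self.last_ok = ok
        return changed
```

With this shape, a long run of successful polls shows up as a steadily increasing counter in Prometheus while the journal stays quiet. Only the transitions appear in Loki, in chronological order.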