feat: add noisebell observability

Jet 2026-05-27 20:09:44 -07:00
parent b57927a395
commit e6c1b82679
24 changed files with 2289 additions and 137 deletions


@@ -12,11 +12,16 @@ If the Pi stops responding to polls (configurable threshold, default 3 misses),
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| `GET` | `/status` | — | Current door status (`status`, `since`, `last_checked`) |
| `GET` | `/badge.svg` | — | Live README badge with Noisebridge logo |
| `GET` | `/metrics` | — | Prometheus metrics, scraped locally by the DO Prometheus |
| `POST` | `/webhook` | Bearer | Inbound webhook from the Pi |
| `GET` | `/health` | — | Health check |
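
As a rough illustration of the one authenticated route in the table, the sketch below builds (but does not send) a `POST /webhook` request carrying a Bearer token. The base URL, the token value, and the JSON payload fields are all hypothetical; the README does not specify the webhook body, so treat the payload as a placeholder only.

```python
import json
import urllib.request


def build_webhook_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a POST /webhook request with a Bearer token (nothing is sent)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/webhook",
        data=body,
        method="POST",
        headers={
            # The table lists "Bearer" auth for this route.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values: the real host, secret, and payload shape are not given here.
req = build_webhook_request(
    "http://localhost:3000", "example-secret", {"status": "open"}
)
```

Passing `req` to `urllib.request.urlopen` would actually send it; the unauthenticated `GET` routes need no header at all.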

`since` is the Pi-reported time when the current state began. `last_checked` is when the cache most recently attempted a poll.
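
To make the difference between the two timestamps concrete, here is a minimal sketch of a client interpreting a `/status` response. It assumes the body is JSON and that `since` and `last_checked` are Unix timestamps in seconds; neither detail is stated above, and `describe_status` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone


def describe_status(body: str, now: datetime) -> str:
    """Summarize a /status body (assumed: JSON with Unix-second timestamps)."""
    data = json.loads(body)
    since = datetime.fromtimestamp(data["since"], tz=timezone.utc)
    last_checked = datetime.fromtimestamp(data["last_checked"], tz=timezone.utc)
    # How long the door has been in its current state, per the Pi.
    state_age = int((now - since).total_seconds())
    # How fresh the cache's view is: time since its last poll attempt.
    poll_age = int((now - last_checked).total_seconds())
    return f"{data['status']} for {state_age}s, last poll attempt {poll_age}s ago"


sample = '{"status": "open", "since": 1000, "last_checked": 1090}'
summary = describe_status(sample, datetime.fromtimestamp(1100, tz=timezone.utc))
```

A large gap between `now` and `last_checked` says the cache itself is stale, while a large gap between `now` and `since` only says the door state has not changed for a while.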

The public Caddy vhost returns `404` for `/metrics`; Prometheus scrapes the cache directly on localhost. Metrics include the configured Pi target, poll interval, offline threshold, last poll result, last HTTP status, last poll duration, last poll attempt/success/failure timestamps, and failure counters split into HTTP, timeout, connect, request-other, and parse failures.
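
The five-way failure split can be pictured as a classification step that runs before a counter is incremented. The sketch below is a Python stand-in built on `urllib` exception types, not the cache's actual code; the label strings and the `classify_failure` name are hypothetical.

```python
import json
import socket
import urllib.error
from collections import Counter


def classify_failure(exc: BaseException) -> str:
    """Map a poll exception to one of five hypothetical failure labels."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return "http"
    # urllib wraps socket-level errors in URLError; look at the wrapped reason.
    cause = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(cause, (TimeoutError, socket.timeout)):
        return "timeout"
    if isinstance(cause, ConnectionError):
        return "connect"
    if isinstance(cause, json.JSONDecodeError):
        return "parse"
    # Anything else that went wrong while making the request.
    return "request_other"


# One counter per failure kind, analogous to the split counters described above.
failures = Counter()
for error in (
    urllib.error.URLError(TimeoutError("timed out")),
    urllib.error.URLError(ConnectionRefusedError("refused")),
    json.JSONDecodeError("bad body", "", 0),
    urllib.error.URLError("unknown url type"),
):
    failures[classify_failure(error)] += 1
```

Splitting the counters this way lets a dashboard distinguish "the Pi is unreachable" (connect, timeout) from "the Pi answered badly" (HTTP, parse) without reading logs.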

To debug regular timer-driven polling, use Prometheus and Grafana rather than scanning logs. The cache logs only sparse events: state changes applied from the Pi, offline/online transitions, the first poll failure in a failure streak and any failure that differs from the previous one, stale events, auth/rate-limit rejections, and outbound webhook deliveries, retries, and final failures. Successful unchanged polls, badge/image/status reads, and metrics scrapes are intentionally quiet at `INFO`.
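
The "first or changed failure in a streak" rule can be sketched as a small gate that remembers the previous failure kind. This is an illustrative Python model under that reading of the rule, not the cache's implementation; `PollLogGate` and the failure labels are hypothetical names.

```python
from typing import Optional


class PollLogGate:
    """Decide whether a poll result deserves a log line."""

    def __init__(self) -> None:
        # Kind of the most recent failure, or None when the last poll succeeded.
        self._last_failure: Optional[str] = None

    def should_log(self, failure: Optional[str]) -> bool:
        """Pass None for a successful poll, or a failure label such as "timeout"."""
        if failure is None:
            # Success ends the streak; successful unchanged polls stay quiet.
            self._last_failure = None
            return False
        # Log only the first failure of a streak or a change in failure kind.
        is_new = failure != self._last_failure
        self._last_failure = failure
        return is_new


gate = PollLogGate()
sequence = [None, "timeout", "timeout", "connect", None, "connect"]
decisions = [gate.should_log(result) for result in sequence]
```

Repeated identical failures still increment their metrics counter on every poll; only the log line is suppressed, which keeps logs readable during a long outage.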

## Badge

`/badge.svg` serves a classic shields.io-style SVG badge with the Noisebridge logo and the current cache status (`open`, `closed`, or `offline`).