feat: add noisebell observability

Jet 2026-05-27 20:09:44 -07:00
parent b57927a395
commit e6c1b82679
24 changed files with 2289 additions and 137 deletions

@@ -124,8 +124,11 @@ That script:
6. writes `/etc/noisebell/noisebell.env`
7. writes `/etc/noisebell/noisebell-relay.env`
8. installs `noisebell.service` and `noisebell-relay.service`
-9. enables and starts both services
-10. runs `tailscale up` with the decrypted auth key
+9. enables persistent journald with a 30 day retention target
+10. installs and enables `prometheus-node-exporter`
+11. installs `noisebell-loki-journal.service` to ship Pi logs to Loki on `noisebell-do`
+12. enables and starts the Noisebell services
+13. runs `tailscale up` with the decrypted auth key
## Files written on the Pi
@@ -143,6 +146,9 @@ The deploy script creates:
- `/etc/noisebell/noisebell-relay.env`
- `/etc/systemd/system/noisebell.service`
- `/etc/systemd/system/noisebell-relay.service`
+- `/etc/systemd/system/noisebell-loki-journal.service`
+- `/usr/local/bin/noisebell-loki-journal`
+- `/etc/systemd/journald.conf.d/noisebell-persistent.conf`
All secret files are root-only.
@@ -275,10 +281,18 @@ Important: Home Assistant webhook IDs are exact. If the automation shows a leadi
## API
-All endpoints require `Authorization: Bearer <token>`.
+`GET /` requires `Authorization: Bearer <token>`.
**`GET /`**
```json
{"status": "open", "timestamp": 1710000000}
```
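As an illustration of the authenticated `GET /` call, here is a minimal Python sketch using only the standard library. The base URL, port, and token are hypothetical placeholders, and treating `timestamp` as Unix seconds is an assumption drawn from the sample response rather than something the README states.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical values: substitute the Pi's real address, port, and token.
BASE_URL = "http://noisebell.example:8080"
TOKEN = "example-token"


def build_status_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a GET / request that carries the bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def parse_status(body: str) -> tuple[str, datetime]:
    """Decode the JSON body into (status, UTC time).

    Assumes `timestamp` is Unix seconds, as the sample response suggests.
    """
    payload = json.loads(body)
    when = datetime.fromtimestamp(payload["timestamp"], tz=timezone.utc)
    return payload["status"], when


def fetch_status(base_url: str, token: str, timeout: float = 5.0) -> tuple[str, datetime]:
    """Perform the request; raises urllib.error.HTTPError on a rejected token."""
    request = build_status_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_status(response.read().decode("utf-8"))
```

Calling `fetch_status(BASE_URL, TOKEN)` against a reachable Pi would return the door status and the decoded time; the two helper functions can be exercised without any network access.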
+**`GET /metrics`**
+Prometheus metrics for local door state, raw GPIO level, debounced state-change counters, webhook delivery counters, last webhook result/status/duration, boot identity, uptime, temperature, throttling flags, Wi-Fi signal, and Tailscale state. This endpoint is unauthenticated and intended for Tailscale-only scraping by the DO Prometheus.
+`noisebell-relay` also exposes unauthenticated Prometheus metrics at `GET /metrics` on port `8090`, including inbound webhook count, Home Assistant forwarding counters, and last forward result/status/duration.
+Routine sampled values belong in Prometheus, not logs: GPIO level, Wi-Fi signal, temperature, uptime, Tailscale state, scrape health, and webhook counters are graphed from `/metrics`. Journald/Loki logs are intended to stay event-oriented: startup/shutdown, initial state sync, debounced door state changes, successful state deliveries, delivery retries/failures, unauthorized requests, relay forwards, and GPIO read error/recovery events.
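To make the `/metrics` output concrete, the sketch below parses a small sample of the Prometheus text exposition format in Python. The metric names in `SAMPLE` are hypothetical and chosen only for illustration; the diff does not list the names noisebell actually exports. The parser is deliberately minimal and skips comment lines, timestamps, and label-escape decoding.

```python
import re

# Hypothetical sample: these metric names are illustrative, not noisebell's real ones.
SAMPLE = """\
# HELP noisebell_door_open Debounced door state (1 = open).
# TYPE noisebell_door_open gauge
noisebell_door_open 1
# TYPE noisebell_webhook_deliveries_total counter
noisebell_webhook_deliveries_total{result="success"} 42
noisebell_webhook_deliveries_total{result="failure"} 3
"""

# metric_name, optional {labels}, then the sample value.
_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair: name="value" (escaped characters are kept as written).
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> dict:
    """Map (metric name, sorted label pairs) to a float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = _LINE.match(line)
        if match is None:
            continue  # ignore anything this minimal parser cannot read
        name, raw_labels, value = match.groups()
        labels = tuple(sorted(_LABEL.findall(raw_labels or "")))
        samples[(name, labels)] = float(value)
    return samples


metrics = parse_metrics(SAMPLE)
door_open = metrics[("noisebell_door_open", ())] == 1.0
```

A quick check like this can confirm that a scrape target is serving well-formed samples before a Prometheus job is pointed at it; for production use, an established client library is the safer choice over a hand-written parser.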