# PoM (Peace of Mind) -- Audit History

Full chronological audit log. See [audit_review.md](./audit_review.md) for current state.

## Changes Since Last Audit

### Tenth audit (2026-03-28, Run 12 cross-project)

- **Test count:** 359 (222 unit + 8 cli + 129 integration). 0 clippy warnings. 0 failures.
- **Grade:** A (maintained). v0.3.2.
- **CORS monitoring:** New check type added for monitoring CORS headers on targets.
- **New dependency advisories (action items):**
  - aws-lc-sys 0.38.0 (RUSTSEC-2026-0044 + -0048, severity 7.4 HIGH): upgrade to 0.39.0 via `cargo update -p aws-lc-sys`
  - rustls-webpki 0.103.9 (RUSTSEC-2026-0049): upgrade to 0.103.10 via `cargo update -p rustls-webpki`
  - paste unmaintained (RUSTSEC-2024-0436): upstream via rmcp, warning only
- **Mandatory surprise:** None. Previous surprises (rate limiter relaxed ordering, write!().unwrap() infallibility) still valid.
- **No new code findings.** All previous items remain resolved.

### DNS/Route stale data fix (2026-03-25)

- **Test count:** 352 (unchanged). 0 clippy warnings.
- **Config:** Switched all 4 Cloudflare-proxied DNS records from `expected = ["IP"]` to `expected = []` (resolution-only). DNS checks were always failing because Cloudflare returns rotating proxy IPs, not the origin IP.
- **API filtering:** `route_status` and `dns_status` in `/api/status/{target}` now filtered to only entries matching current config. Stale routes (e.g. `/docs/about`, `/signup`) and stale DNS records no longer appear in API responses.
- **DB pruning:** Added `prune_stale_routes()` and `prune_stale_dns()` to `db.rs`. Called once at task startup in `routes.rs` and `dns.rs` to clean up historical data when config changes. Pruned 890 stale route check rows on first deploy.
- **Integration tests:** Updated `api_status_includes_route_status` and `api_status_includes_dns_status` to use configs with matching route/DNS entries.
- **Deployed to hetzner:** v0.3.2 binary + updated config.
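
The API filtering and startup pruning described in this fix can be illustrated with a minimal in-memory sketch. It is not the project's code: `RouteStatus`, `filter_to_configured`, and `prune_stale` are hypothetical names, and the real `prune_stale_routes()` operates on the database in `db.rs` rather than on a `Vec`. The sketch only shows the idea of treating the current config as an allow-list.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one stored route-check result.
#[derive(Debug, Clone, PartialEq)]
pub struct RouteStatus {
    pub path: String,
    pub ok: bool,
}

/// Return only the entries whose path is still listed in the current config.
/// This mirrors the read-side filter applied to API responses.
pub fn filter_to_configured(stored: Vec<RouteStatus>, configured: &[&str]) -> Vec<RouteStatus> {
    let wanted: HashSet<&str> = configured.iter().copied().collect();
    stored
        .into_iter()
        .filter(|entry| wanted.contains(entry.path.as_str()))
        .collect()
}

/// Drop stale entries in place and report how many were removed.
/// This mirrors a one-time prune at task startup.
pub fn prune_stale(stored: &mut Vec<RouteStatus>, configured: &[&str]) -> usize {
    let wanted: HashSet<&str> = configured.iter().copied().collect();
    let before = stored.len();
    stored.retain(|entry| wanted.contains(entry.path.as_str()));
    before - stored.len()
}
```

Filtering on read keeps responses correct even before a prune has run, while pruning at startup keeps historical rows from accumulating after a config change.
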
### Eighth audit (2026-03-18, Run 9 cross-project)

- **Test count:** 344 (unchanged). 0 clippy warnings.
- **Grade:** A (maintained). v0.3.1 (deployed 2026-03-18).
- **Dashboard UI shipped.** Per-test tracking, regression detection, duration drift.
- **cli/ directory module split** completed (1,035-line cli.rs -> 8 files).
- **Observations (pre-existing, not regressions):**
  - Mutex `.unwrap()` in rate limiter (api.rs:41): if thread panics while holding lock, subsequent calls panic. Impact: LOW (rate limiter only, not core logic). Design choice: acceptable for monitoring tool.
  - `serde_json::to_value(d).unwrap_or_default()` in API details field: silently becomes null on serialization failure. Impact: LOW, safe fallback.
- **No new findings requiring action.** Grade maintained at A.
- **Mandatory surprise:** Rate limiter uses `fetch_add` with Relaxed ordering: can allow up to max_per_window+1 requests due to check-then-increment race. Known trade-off of lock-free rate limiting, documented.

### Fifth audit (2026-03-16, Run 6 cross-project)

- **Test count:** 238 -> 344 (220 unit + 124 integration, +106 tests)
- **Grade:** A (maintained). No new findings above LOW.
- **Source LOC:** 10,113 (up from ~3.5K)
- **Clippy:** 2 warnings (collapsible_if in cli.rs, LOW)
- **Production unwraps:** 76 total: 64 infallible write! on String, 12 safe-by-construction. Effectively zero risky unwraps.
- **Mandatory surprise:** write!().unwrap() pattern provably infallible. Actually fine.
- **Previous items verified:** All previous remediated items confirmed intact.
- **Note:** cli.rs at 1,036 lines, approaching the 500-line branching guideline but mostly flat match arms.
- **Infrastructure check:** Blocked by Tailscale SSH re-authentication. Deferred.

### Fourth audit remediation (2026-03-14)

- **Grade:** A- -> A. All remaining findings resolved.
- **Test count:** 229 -> 238 (+9 integration tests)
- **Graceful shutdown:** Replaced `handle.abort()` with CancellationToken + `tokio::select!` in all task loops. API server uses `with_graceful_shutdown`. 5s grace period on SIGINT/SIGTERM.
- **Task panic detection:** 60s watchdog checks `JoinHandle::is_finished()` on all background tasks.
- **Rate limiting:** Fixed-window 60 req/min middleware on authenticated API routes. Custom `RateLimiter` struct.
- **Self-monitoring:** `GET /api/health` endpoint (public, no auth) returns `{"status":"operational","version":"..."}`.
- **Integration tests:** 5 check_health tests (mock axum servers: operational, degraded, unreachable, expectations pass/fail), 1 check_tls test (self-signed cert via rcgen), 2 /api/health tests, 1 rate limiter test.
- **Deploy config cleanup:** Removed redundant htpy `expected_routes` (duplicated health check URL).
- **Dependency:** Added `tokio-util` for CancellationToken.
- **Cold spots:** 0 remaining (was 3). All previous architectural and testing gaps closed.

### Third audit (2026-03-13, pre-launch skeptical lens)

- **Grade:** A -> A-. Postmark API token in plaintext deployment configs is a real issue.
- **Test count:** 56 -> 187 (+131 tests)
- **New findings:** Plaintext API token, no API auth, no peer mesh auth, no integration tests for core functions, no self-monitoring.
- **38 unwraps in non-test code:** all verified safe (write to String or guarded by prior checks).

**Post-audit remediation (2026-03-13):**

- All 3 critical/medium findings resolved: Postmark token to env var, API bearer auth (5 tests), peer mesh auth
- 2 low findings resolved: SSH filter validation, peer UUID mismatch rejection
- Test count: 187 -> 195 (+8 tests)
- Documentation upgraded to A: All struct fields documented (HealthSnapshot, HealthStatus, HealthDetails, TestRun, TestStaleness, PeerStatus, OnMissing, all config types, all API response types). All 8 error variants documented. 11 config defaults with rationale comments. prune_old_records return tuple documented. description.md rewritten, architecture.md created (191 lines), README created (62 lines).

### Observability Upgrade (2026-03-13)

- **Observability:** A- -> A
- Added 57 `#[instrument(skip_all)]` annotations across 9 files: db.rs (28), alerts.rs (9), tools/mod.rs (8), tools/health.rs (5), tools/tests.rs (3), checks/http.rs (1), checks/tls.rs (1), checks/ssh.rs (1), peer.rs (1)
- Added Multithreaded forum as monitoring target: `pom-astra.toml` (localhost:3400), `pom-hetzner.toml` (Tailscale IP)
- Added test runner targets for GO, BB, AF, SK to `pom-astra.toml`
- All 208 tests pass. `cargo check` passes clean.

### Adversarial Test Audit (2026-03-13)

**Goal:** Write tests that try to break the system. Find edge cases, race conditions, boundary conditions, and logic errors.

**Results:**

- **Test count:** 195 -> 208 (+13 tests)
- **CRITICAL fix:** Alert cooldown key mismatch: `record_alert` used `target` but lookup used `alert_key` (`"health:{target}"`), so cooldowns never matched and alerts fired every check. Fixed by using `alert_key` consistently.
- **HIGH fix:** TLS expiry check inconsistent at day boundary: time-of-day comparison could cause flapping. Changed to `date_naive()` comparison for stable day-level logic.
- **HIGH fix:** UUID mismatch left stale peer state: now resets state, clears failures, persists via `update_peer_identity()` to prevent showing stale data after peer identity change.
- **HIGH fix:** `prune_old_records` no guard for days <= 0: could delete all records. Added early return for `days <= 0` (no-op).
- **HIGH fix:** SSH timeout ignored config value: hardcoded `ConnectTimeout=10` in SSH args. Changed to use `config.timeout_secs`.
- **Added `rcgen` dev dependency** for TLS cert generation in tests.

### Second audit (2026-03-11)

| Change | Detail |
|--------|--------|
| Tests | +39 tests (17 -> 56). 28 unit + 28 integration. Tests/KLOC: 5.8 -> 18.4. |
| Lock contention | Addressed in both peer.rs (heartbeat handlers) and api.rs (status/mesh handlers). Data collected under lock, DB writes after release. |
| DB indexes | 4 indexes added: health_checks(target, id DESC), health_checks(target, checked_at), test_runs(target, id DESC), peer_heartbeats(peer_name, id DESC). |
| Clippy | 4 warnings -> 0. Used Rust 2024 let chains instead of nested if-let. |
| Type safety | PeerConfig.on_missing changed from String to OnMissing enum with serde deserialization. |
| Module docs | Added //! docs to db.rs, config.rs, peer.rs, types.rs, lib.rs. |
| Error handling | /api/peer/status fetch failures now logged at debug level instead of silenced. |
| Prune | prune_old_records now returns 3-tuple including peer heartbeat count. |
| Code extraction | HealthStatus::icon() method eliminates 3 repeated match blocks. |
| HTTP checks | Response classification extracted into pure functions for testability. |

## Metrics Over Time

| Audit Date | LOC | Rust Files | Tests | Tests/KLOC | Clippy Warnings | Cold Spots | Overall |
|------------|-----|-----------|-------|-----------|----------------|------------|---------|
| 2026-03-10 | 2,934 | 15 | 17 | 5.8 | 4 | 8 | B+ |
| 2026-03-11 | 3,039 | 14 | 56 | 18.4 | 0 | 3 | A |
| 2026-03-13 | ~3K | ~14 | 208 | ~69 | 0 | 3 | A- |
| 2026-03-14 | ~3.5K | ~16 | 238 | ~68 | 0 | 0 | A |
| 2026-03-16 | 10.1K | 23 | 344 | ~34 | 2 | 0 | A |
| 2026-03-18 | 10.1K | 23 | 344 | ~34 | 0 | 0 | A |
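
The Tests/KLOC column appears to be the test count divided by source lines in thousands, rounded to one decimal place. The sketch below reproduces that arithmetic so new rows can be computed consistently; `tests_per_kloc` is a hypothetical helper, not part of the PoM codebase, and the rounding rule is an assumption inferred from the exact-LOC rows.

```rust
/// Tests per thousand lines of code, rounded to one decimal place.
/// Returns `None` when `loc` is zero, since the ratio is undefined.
pub fn tests_per_kloc(tests: u32, loc: u32) -> Option<f64> {
    if loc == 0 {
        return None;
    }
    let raw = f64::from(tests) / (f64::from(loc) / 1000.0);
    // Round to one decimal place, the precision used in the table.
    Some((raw * 10.0).round() / 10.0)
}
```

For the first two rows, 17 tests over 2,934 LOC gives 5.8 and 56 tests over 3,039 LOC gives 18.4, matching the table. Rows with approximate LOC (such as ~3K) can only be reproduced approximately.
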