max / makenotwork

chore: move todos + audit_review to private layer (gitignored)
Author: Max Johnson <me@maxj.phd> · 2026-06-05 00:20 UTC
Commit: 20d1f70a152429b997531ecb5b5cf8cfbdad7220
Parent: 6a7f0dc
5 files changed, +4 insertions, -1146 deletions
M .gitignore +4
@@ -46,3 +46,7 @@ mutants.out*
46 46
47 47 # Claude Code instructions (project-local; not for the public repo)
48 48 CLAUDE.md
49 +
50 + # Private working files — live in _private/, synced via Syncthing
51 + todo.md
52 + audit_review.md
D sando/todo.md -242
@@ -1,242 +0,0 @@
1 - # Sando TODO
2 -
3 - Open work only. Completed items move to `todo_done.md` (sibling file) when one exists. Design notes go in `plans/<name>.md`, not folded into checkboxes.
4 -
5 - Format rule: every actionable line is a `- [ ]` checkbox. Headings group phases and themes; do not put status updates in them.
6 -
7 - ## Resume here (next session)
8 -
9 - User-blocking before anything else:
10 -
11 - - [ ] Apply updated Tailscale ACL (`_private/infra/tailscale-acl-policy.json`) at https://login.tailscale.com/admin/acls — adds the `tag:server → tag:server as user max` SSH rule needed for offsite backup sync. Once live, Claude can finish: verify `makenotwork@alpha-west-1 → max@astra` ssh works, scp `MNW/server/deploy/sync-backup-offsite.sh` to `/opt/makenotwork/sync-backup-offsite.sh`, chmod +x, then trigger `sudo -u makenotwork /opt/makenotwork/backup-db.sh` and confirm a file lands in `max@astra:/opt/backups/mnw/`. Closes the "offsite broken" adjacent fire.
12 -
13 - Claude-only follow-ups (no user input needed; pick the next slice):
14 -
15 - - error-pages bake-into-binary via `include_dir!` (separate MNW PR) — closes Phase 3 §2 long-term
16 - - `cargo_test` gate red on MNW (Phase 0 follow-up) — diagnose, likely needs DB/env setup hook per test or `--test-threads=1`
17 - - Sandod build/test output streaming (Phase 0 follow-up) — pipe stdout to per-run log files instead of `Output` buffer; surface in WS `/events`
18 - - Phase 6 monitoring + alerting — Prometheus counters + alert rules
19 - - Phase 4 prep — first Sando-only deploy to testnot (needs Track B — see below)
20 - - Sando test suite — see "Testing" section below; sandod and TUI have zero unit/integration tests today
21 -
22 - Session 5 — 0.9.7 launched 2026-06-03 via Sando through host → A → B (hotfix=true, skip-burn-in). Soak cleanup closed (launchplan_final §1). Remaining:
23 -
24 - - [x] **Soak cleanup eligible 2026-06-10 — shortened and shipped 2026-06-03.** Gate verified clean since the 06-03 02:53 migration boot. Removed `/opt/git` (99M, stale duplicate of `/var/lib/mnw/git`), `/opt/makenotwork` (177M, post-yara-relocation), `/opt/backups` (277M, root pg_backup output). 553M reclaimed. yara-rules relocated from `/opt/makenotwork/yara-rules` → real `/opt/mnw/yara-rules` (733 rules compiled fine from new path).
25 - - [x] **Backups rebuilt under `/var/lib/mnw/backups/<db>/`** (makenotwork + multithreaded, per-DB subdirs), per-user crons (03:00 + 03:05), offsite to astra `/opt/backups/mnw/<db>/` via Tailscale SSH `tag:prod → max@tag:testing` rule. `backup-puller` rrsync re-rooted at `/var/lib/mnw/backups`; sando `backup.source` updated to `ssh://backup-puller@alpha-west-1:2200/makenotwork/latest.sql.gz`; `/backup/fetch` verified 38MB matched prod size.
26 - - [x] **Pre-existing meta.git ownership drift fixed inline** — `mnw-cli:git` → `git:git` (tightened `safe.directory` was rejecting it). Surfaced by post-rm ls-remote regression test.
27 - - [ ] **Remove live drop-in** `/etc/systemd/system/mnw-cli.service.d/fhs-git-path.conf` on prod. The unit file in `mnw-cli/deploy/mnw-cli.service` is patched to include `ReadWritePaths=/var/lib/mnw`, so the drop-in becomes redundant next time `./mnw-cli/deploy/deploy.sh --config` runs. Until then both apply (harmless dupe).
28 -
29 - Decision-gated (needs user input first):
30 -
31 - - Track B testnot live-app: postgres role+db (Claude), `.env` secrets (which Stripe/SMTP/S3 creds to use for staging — needs user), Caddyfile + Cloudflare Origin CA cert for testnot.work (user issues cert in CF dashboard; Claude installs)
32 - - Restart-warning hook for prod tier (Phase 5) — needs `CLI_SERVICE_TOKEN` accessible to sandod
33 -
34 -
35 -
36 - ## Testing
37 -
38 - Sando has zero automated tests today — daemon + TUI have been validated by running real scenarios end-to-end. Worth a pass before relying on it for prod cutover.
39 -
40 - ### TUI hands-on (Phase 5 acceptance — run interactively)
41 -
42 - - [ ] Launches against `SANDO_DAEMON=http://100.103.89.95:7766` without crashing; header shows daemon URL.
43 - - [ ] WS status: `ws ok` appears in the header within ~1s of launch (sandod is reachable).
44 - - [ ] WS reconnects: `sudo systemctl restart sandod` on fw13; header flips `ws ok → ws ... → ws ok` within ~5s. Events resume.
45 - - [ ] `↑/↓` and `j/k` move the row highlight through all 4 tiers; selection persists across the 2s state refresh.
46 - - [ ] `b` triggers backup fetch: status bar shows `[ok] backup/fetch: ...`, events log gets a `backup_fetched` line a moment later.
47 - - [ ] `c` on tier `a` (which has `current_version=0.8.12`) records a manual_confirm; event appears.
48 - - [ ] `c` on tier `mm` (no current_version) returns an HTTP error; status bar shows `[err]`.
49 - - [ ] `p` on tier `a` (assuming gates pass) issues a real deploy; sequence of `deploy_start → deploy_ok → promote_complete` events appears.
50 - - [ ] `R` on tier `a` rolls back to `previous_version`; `rollback` event appears. Reverse with `p` again.
51 - - [ ] `q`, `Esc`, `Ctrl-C` all quit cleanly; terminal restores correctly (no leftover raw mode).
52 - - [ ] Events ring buffer trims to 200: trigger ≥200 events (loop /backup/fetch), confirm the oldest scroll out, no panic.
53 - - [ ] Action while disconnected: kill sandod, hit `b`. Status shows error, TUI stays responsive.
54 -
55 - ### Sandod unit + integration tests (Claude-only)
56 -
57 - 55 tests passing as of 2026-05-31 (14 TUI + 41 daemon). Remaining gaps:
58 -
59 - - [x] `gates::reset_scratch` — verifies dropping every non-system schema (planted `foo` + `tower_sessions`, ran reset, asserted only `public` remains). Gated by `SANDO_TEST_PG_URL` env var so it skips on hosts without postgres. Run on fw13 with `SANDO_TEST_PG_URL=postgres:///sando_scratch?host=/var/run/postgresql cargo test`.
60 - - [x] `deploy::deploy_local` — copies multiple binaries (`PRIMARY`/`ADMIN`), swaps symlink atomically across two consecutive deploys, gc_local_releases keeps last N by mtime + handles missing dir + noop under threshold. `sh_quote` round-trip.
61 - - [x] `deploy::deploy_remote` failure path — against unroutable `192.0.2.1`, verifies clean ssh-attributed error (no panic / hang); ConnectTimeout bounds the test wallclock to ~10s. Plus `deploy_node` with `ssh_target="local"` short-circuits to symlink swap.
62 - - [x] `backup::fetch` URL parsing — extracted `parse_source` → `BackupSource` enum. 10 tests: file://, rsync://, ssh:// with/without port, multi-segment ssh path, non-numeric `:foo` colon treated as part of host (not port), and all malformed-input rejections (empty, scheme-only, ftp, no path on ssh, empty user@host).
63 - - [x] `events::emit` no-subscribers no-op; `emit_reaches_a_subscriber`; envelope serializes with flat `kind` field (locks the WS/TUI contract); `lagged_subscriber_observes_recv_error_lagged` exercises broadcast capacity.
64 - - [ ] `events_ws` handler end-to-end — drive WS through a slow client, assert `{"kind":"lagged",...}` frame arrives. Possible (bind axum to ephemeral port + tungstenite client) but the bus-level lag detection is already locked in by `lagged_subscriber_observes_recv_error_lagged`. Diminishing returns vs effort. Deferred.
65 - - [ ] `build` mutex behavior — requires real cargo or a slow stub. Treated as a manual checklist item under "TUI hands-on" instead. (Already validated by hand 2026-05-31.)
66 - - [x] `routes::confirm` — rejects when tier has no `current_version` (409 Conflict — surfaced that GateBlocked maps to 409 not 400, locked in), accepts + inserts a passing gate_runs row when set, 404 on unknown tier.
67 - - [x] `routes::promote` — refuses promote-to-first-tier (409), errors when neither body nor predecessor has a version, 404 when explicit version's `versions` row is missing.
68 - - [x] `unsatisfied_gates` — 6 tests: empty, failed-kind flagging, latest-row-wins (red→green flap clears), hotfix skips burn_in only, ignores other tiers/versions, **null `passed` treated as failing** (locks the in-flight-race safety property).
69 - - [x] `run_migrator` errors on missing migrations dir.
70 - - [x] sqlx migrations exercised via existing `sync` tests.
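
The `unsatisfied_gates` properties listed above (latest-row-wins, hotfix skips `burn_in` only, null `passed` treated as failing) can be sketched in memory roughly as follows. This is not the daemon's code: the `GateRun` struct, the `seq` ordering field, and the function signature are assumptions made for illustration, and the real implementation presumably queries the `gate_runs` table instead.

```rust
use std::collections::HashMap;

/// One recorded gate run (hypothetical shape). `passed` is `None` while the
/// run is still in flight.
#[derive(Debug, Clone)]
struct GateRun {
    tier: String,
    version: String,
    kind: String,
    passed: Option<bool>,
    /// Assumed monotonic insertion order; a higher value is a later run.
    seq: u64,
}

/// Return the required gate kinds that are not satisfied for `(tier, version)`,
/// in the order they appear in `required`.
fn unsatisfied_gates(
    runs: &[GateRun],
    tier: &str,
    version: &str,
    required: &[&str],
    hotfix: bool,
) -> Vec<String> {
    // Keep only the latest run per gate kind for this tier and version.
    let mut latest: HashMap<&str, &GateRun> = HashMap::new();
    for run in runs.iter().filter(|r| r.tier == tier && r.version == version) {
        let entry = latest.entry(run.kind.as_str()).or_insert(run);
        if run.seq >= entry.seq {
            *entry = run;
        }
    }

    required
        .iter()
        // A hotfix skips the burn-in gate and nothing else.
        .filter(|kind| !(hotfix && **kind == "burn_in"))
        // A missing row, a failed row, and an in-flight (None) row all block.
        .filter(|kind| latest.get(**kind).and_then(|r| r.passed) != Some(true))
        .map(|kind| kind.to_string())
        .collect()
}
```

Treating `None` as failing means a promote that races an in-flight gate is refused, not waved through, which is the safety property the test list calls out.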
71 -
72 - ### End-to-end harness
73 -
74 - - [ ] Single-binary smoke: spin up sandod against tmpdir config + a tmp postgres; push a fixture commit; assert the full pipeline (build → gates → MM tier_state advance) completes in under 30s. Run on CI for every sando PR.
75 - - [ ] Pre-cutover dry run: stand up a throwaway tier-B node, point production-shape config at it, run `cargo_test → migration_dry_run → boot_smoke → promote` end to end. Use existing testnot for this once Track B is done.
76 -
77 - ### TUI unit tests
78 -
79 - - [x] `format_event` — golden tests for build_ok, gate_done (pass+fail), backup_fetched, deploy_failed, unknown kind, malformed JSON.
80 - - [x] `ws_url_from`: `http://` → `ws://`, `https://` → `wss://`, only replaces scheme once, unknown scheme passes through.
81 - - [x] `Action::Display` impl produces `backup/fetch`, `promote/<tier>`, etc.
82 - - [x] `Shared::push_event` ring-buffer cap at 200; oldest entries drop in FIFO order.
83 - - [x] `truncate` short-string passthrough vs long-string ellipsis.
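
A minimal sketch of the `ws_url_from` behaviour described in the list above, assuming the function takes the daemon base URL as a string slice and returns an owned string. The real signature in `tui/src/main.rs` may differ.

```rust
/// Derive the WebSocket URL for a daemon base URL (assumed signature).
/// Only a leading scheme is rewritten, so a later "http://" inside the URL
/// is left alone, and unknown schemes pass through unchanged.
fn ws_url_from(base: &str) -> String {
    if let Some(rest) = base.strip_prefix("https://") {
        format!("wss://{rest}")
    } else if let Some(rest) = base.strip_prefix("http://") {
        format!("ws://{rest}")
    } else {
        base.to_string()
    }
}
```

Using `strip_prefix` instead of a global string replace is what gives the "only replaces scheme once" property without extra bookkeeping.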
84 -
85 - ---
86 -
87 - Roadmap target: replace `server/deploy/deploy.sh` and astra-hosted `server/deploy/run-ci.sh` with Sando running on **fw13**, gating Hetzner prod through testnot.work.
88 -
89 - **Host decision:** Sando runs on fw13 (x86_64 Ubuntu-derived, systemd). Architecturally closest to Hetzner prod, no cross-compile, no init-system split. MakeMachine and EveryCycle are now a separate project — not Sando's concern.
90 -
91 - Phases are ordered for execution. Phase 0 must finish before Phase 1 is meaningful. Phases 5+ are post-cutover hardening.
92 -
93 - ## Key Paths
94 -
95 - Read these to orient before working on Sando:
96 -
97 - - `README.md` — quickstart, API surface, v0 limitations
98 - - `sando.toml` — current topology (host → A → B; C declared, not provisioned)
99 - - `daemon/src/main.rs` — startup sequence (config → topology → migrate → sync → bare-repo bootstrap → serve)
100 - - `daemon/src/routes.rs` — `/state`, `/promote`, `/rollback`, `/rebuild`, `/backup/fetch`, `/events`
101 - - `daemon/src/gates.rs` — gate runners; the load-bearing logic
102 - - `daemon/src/build.rs` — host-tier build pipeline
103 - - `daemon/src/deploy.rs` — `deploy_local`; remote SSH stub
104 - - `daemon/migrations/001_init.sql` — schema (tiers/nodes as rows)
105 - - `server/deploy/deploy.sh` — current cross-compile + push-to-Hetzner script (what we are replacing)
106 - - `server/deploy/run-ci.sh` — current astra CI script (what we are replacing)
107 - - `_meta/docs/operations.md` — burn-in rule and hotfix policy that gates encode
108 -
109 - ---
110 -
111 - ## Phase 0 — fw13 bootstrap
112 -
113 - - [x] Provision `sando` system user on fw13; lock down home dir; generate SSH keypair at `/srv/sando/.ssh/id_ed25519` for outbound deploys.
114 - - [x] Install scratch Postgres locally on fw13; create `sando_scratch` role + DB used by `migration_dry_run`. (Owner of own DB; non-superuser.)
115 - - [x] Write systemd unit for `sandod` (long-run service, restart on failure, env from `/etc/sando/sando.env`). Installed at `/etc/systemd/system/sandod.service`.
116 - - [x] Write the production `sando.toml`; bare repo path under `/srv/sando/mnw.git`. Installed at `/etc/sando/sando.toml`; daemon config at `/etc/sando/sando-daemon.toml`.
117 - - [x] Install `sandod` binary at `/usr/local/bin/sandod`; enable + start the service. Live on `100.103.89.95:7766`; bare repo auto-bootstrapped at `/srv/sando/mnw.git`.
118 - - [x] Verify MNW server builds reproducibly on fw13. `makenotwork` 0.8.12 built in 132s; sqlx online mode against `sando_scratch` postgres (sandod prep-resets all non-system schemas + applies all 133 MNW migrations before invoking cargo).
119 - - [ ] Register sando pubkey with Hetzner prod (`deploy@alpha-west-1`) and testnot.work once that node exists. Pubkey: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEK+vhpr1V8VnsEemN9x6tAA2S05kmv/mQ3eVgSXSkJ8 sando@fw13`. (Moved to Phase 1 — not blocking Phase 0 exit.)
120 -
121 - ### Phase 0 follow-ups (not blocking, but visible)
122 -
123 - - [ ] `cargo_test` gate fails on MNW today — beyond the sqlx-online fix (already in), tests likely need a separate prepared DB (or per-test isolation). Investigate when wiring up Phase 1 gates.
124 - - [ ] Sandod observability: add `WS /events` (Phase 5) and consider streaming build/test stdout to a per-run log file rather than buffering in `Output`.
125 - - [ ] sqlx-cli (`v0.9.0`) at `/srv/sando/.cargo/bin/sqlx` is installed for the sando user but unused — sandod uses `sqlx::migrate::Migrator` programmatically (v0.8.6). Decide later whether to drop sqlx-cli or use it for diagnostics.
126 - - [ ] fw13 WoL: `ethtool` shows no wake-on capability on the USB ethernet — WoL likely won't work; rely on manual wake or BIOS settings. Record in `_meta/` if a solution surfaces.
127 -
128 - ## Phase 1 — Remote deploy
129 -
130 - The MVP only deploys to `ssh_target=local`. Production needs real SSH/rsync.
131 -
132 - - [x] Implement `deploy::deploy_node` remote path: rsync staged binary to `<ssh_target>:<release_root>/releases/<version>/<bin_name>`, then `ssh <ssh_target>` does `mv -Tf` symlink swap + `sudo systemctl reload-or-restart <service>`. First real promote landed 2026-05-31: fw13 → testnot, version 0.8.12.
133 - - [x] Add `node.service_name` to `sando.toml` (default `makenotwork.service`).
134 - - [x] Bootstrap script for adding a fresh node: `MNW/sando/deploy/bootstrap-node.sh`. (See Phase 3 — node-bootstrap script for full details.)
135 - - [x] Garbage-collect old releases on the remote: keep last N=5 per node, sorted by mtime. Runs at end of each successful deploy (local + remote variants). Tied via `RELEASES_TO_KEEP` const.
136 - - [x] Handle `rsync` failure mid-deploy: leave the previous `current` symlink intact; mark `deploys.outcome = 'failed'`; do not advance `tier_state`. (Verified the routes.rs path; rsync runs before symlink swap so failure naturally leaves `current` untouched.)
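
The keep-last-N-by-mtime rule above can be sketched as a pure selection step, separate from any filesystem or SSH work. The function name and the `(name, mtime)` input shape are assumptions for illustration; only `RELEASES_TO_KEEP` and the value 5 come from the notes.

```rust
use std::time::SystemTime;

const RELEASES_TO_KEEP: usize = 5;

/// Given `(release_name, mtime)` pairs, return the names to delete so that
/// only the `keep` newest remain (hypothetical helper).
fn releases_to_delete(mut releases: Vec<(String, SystemTime)>, keep: usize) -> Vec<String> {
    // Newest first; ties are broken by name so the result is deterministic.
    releases.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    releases.into_iter().skip(keep).map(|(name, _)| name).collect()
}
```

Fewer than `keep` releases, or an empty directory listing, yields an empty delete list, which matches the "noop under threshold" and "missing dir" cases mentioned in the test notes. A real implementation would probably also want to refuse to delete whatever `current` points at, which this sketch does not model.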
137 -
138 - ### Phase 1 — Track B: testnot live-app setup (NOT blocking Phase 2)
139 -
140 - Sando's deploy machinery is done, but testnot's MNW runtime needs the rest before its `makenotwork.service` can stay up:
141 -
142 - - [ ] Provision `makenotwork` postgres role + db on testnot (postgres-18 already installed).
143 - - [ ] `/opt/mnw/.env` with staging Stripe keys, SMTP, S3, DATABASE_URL, all other MNW env. Decide which subset of integrations get test/sandbox credentials vs are stubbed.
144 - - [ ] Caddyfile for testnot.work — strip prod's blocks down to just the main reverse_proxy (and forums/cdn if needed). Cloudflare Origin CA cert for testnot.work issued + placed at `/etc/caddy/`. AOP CA already universal.
145 - - [ ] `error-pages/` for testnot (copy or symlink from a release dir).
146 - - [ ] Wire post-deploy smoke check (`curl https://testnot.work/health` after the symlink swap, before declaring deploy ok). Sando-side, gate-like; spec in Phase 2 boot_smoke wording.
147 -
148 - ## Phase 2 — Backup pipeline + migration dry-run
149 -
150 - `migration_dry_run` is the load-bearing gate. It needs a real backup source, not a fixture.
151 -
152 - - [x] ~~Confirm astra's offsite replica writes a deterministic latest-link path.~~ Pivoted: pull direct from prod (`backup-puller@alpha-west-1:2200`, rrsync-locked to `/opt/makenotwork/backups/`). Astra offsite is separately broken — see carryover below.
153 - - [x] Wire the production `sando.toml` `backup.source` — `ssh://backup-puller@alpha-west-1:2200/latest.sql.gz` with `latest.sql.gz` as a hard link on prod.
154 - - [x] Schedule a daily `POST /backup/fetch` (systemd timer on fw13). `sandod-backup-fetch.{service,timer}` in `MNW/sando/deploy/`. Runs daily at 04:00 UTC (one hour after prod's 03:00 UTC backup-db.sh). Service uses `EnvironmentFile=/etc/sando/sando.env` for `$SANDO_DAEMON`. Verified 2026-05-31: one-shot test pulled 36MB backup, recorded in `backups` table.
155 - - [x] First end-to-end `migration_dry_run` against a real prod backup. Passed 2026-05-31 for sha 4541ebc in 1.2s: restored 36MB dump + applied all 133 migrations cleanly. Sha eee96a7 correctly failed `migration_dry_run` because it lacked migrations 123-132 that prod has applied — exactly the prod-vs-repo drift the gate is designed to catch.
156 - - [x] Document the failure modes: `plans/migration-dryrun-failures.md`. Covers all 7 fail modes (no backup, scratch_url unset, scratch reset, restore, drift, checksum mismatch, content broken against prod data) with operator playbook.
157 - - [x] Decide retention on `backups` table. 30 days; pruned at end of `backup::fetch`. `DELETE FROM backups WHERE fetched_at < datetime('now', '-30 days')`.
158 -
159 - ### Phase 2 carryovers / adjacent fires
160 -
161 - - [ ] **Offsite backup sync from prod → astra still broken.** Diagnosed 2026-05-31: `sync-backup-offsite.sh` was never deployed to prod (`deploy.sh` gap when it was added). `makenotwork@prod` had no SSH key. Generated key + installed pubkey on `max@astra:~/.ssh/authorized_keys`, created `/opt/backups/mnw` on astra. **Blocked** on Tailscale ACL: astra runs only Tailscale SSH (no regular sshd on a bypass port), and the ACL denies `tag:tagged-devices` (alpha-west-1) → astra as user `max`. Needs ACL update in the Tailscale admin console, then deploy `sync-backup-offsite.sh` to `/opt/makenotwork/` and test. Makenotwork@prod pubkey: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILzyQQ7pmBIZat8fABlpG/opwh4w5GhLIfkX2qxKxuT0 makenotwork@alpha-west-1`.
162 - - [x] **Prod backup `latest.sql.gz` hard link.** `backup-db.sh` now maintains `latest.sql.gz` atomically (`ln -f $LATEST.new && mv -Tf .new latest.sql.gz`). Deployed 2026-05-31; manual run verified (nlinks=2).
163 -
164 - ## Phase 3 — Parity with current `deploy.sh`
165 -
166 - Decisions captured in `plans/config-artifacts.md`. Summary: Caddyfile / systemd unit / backup script / security configs all move to **one-time node-bootstrap**, not per-deploy. error-pages bake into binary (MNW PR) with sibling fallback. mnw-admin ships alongside server via `bin_names: Vec<String>`. Restart warning is Phase 5, prod-tier-only. Prod migrations: server self-applies on startup (`main.rs:73`), sando does not.
167 -
168 - - [x] **Caddyfile** — decided: bootstrap-only. Not per-deploy. (`plans/config-artifacts.md` §1.)
169 - - [x] **systemd unit** — decided: bootstrap-only. (§4.)
170 - - [x] **Backup script** — decided: bootstrap-only. (§6.)
171 - - [x] **Error pages** — short-term done: ship as release-dir sibling. `build_and_run_mm` `cp -a` from `worktree/server/deploy/error-pages/` into the staged release dir; deploy_node's rsync of the whole dir picks it up. Verified on testnot 2026-05-31. Long-term `include_dir!` bake-in still a separate MNW PR.
172 - - [x] **mnw-admin binary** — `cfg.bin_names: Vec<String>` (default `["server"]`, MNW uses `["makenotwork","mnw-admin"]`). `deploy_local` copies each from worktree's `target/release/<bin>`; `deploy_node` rsyncs the whole staged dir. `Config::primary_bin()` returns first entry for systemd reference. `versions.artifact_path` stores the primary; release dir is derived as `.parent()`. Verified on testnot 2026-05-31.
173 - - [x] **Security configs** — decided: bootstrap-only. (§5.)
174 - - [ ] **Restart warning** — Phase 5, prod-tier only via `tier.restart_warning_seconds` in `sando.toml`; needs `CLI_SERVICE_TOKEN` in `/etc/sando/sando.env`. (§7.)
175 - - [x] **Cross-compile from macOS** — decided: retire after one sprint of testnot parity verification. fw13 builds natively. (§8.)
176 - - [x] **Prod migrations** — decided: server self-applies on startup. Sando does NOT run them. `migration_dry_run` gate is the prod safety net. (§9.)
177 - - [x] **Node-bootstrap script** — `MNW/sando/deploy/bootstrap-node.sh`. Idempotent. Takes `SANDO_PUBKEY` (required), `BIN_NAME`, `SERVICE_NAME`, `SERVICE_USER`, `DEPLOY_ROOT` env. Installs base packages (rsync/ufw/fail2ban), optionally postgres/tailscale/caddy, creates deploy user + dirs + sudoers entry + systemd unit, sets up UFW. Deliberately does NOT touch Caddyfile content, certs, postgres role/db, or secrets — those are operator-decisions per-node. testnot was done by hand and matches roughly what the script produces. Test by re-running on the next node added (tier B Hetzner prod move or tier C).
178 -
179 - ## Phase 4 — Cutover
180 -
181 - Run Sando in parallel with `deploy.sh` until trust is built, then retire the old path.
182 -
183 - - [ ] First successful Sando-only deploy to **testnot.work** (tier A). Old `deploy.sh` still primary for prod.
184 - - [ ] One sprint (two months) of Sando-shadow runs: every `deploy.sh` deploy is also driven through Sando in dry-run mode (gates run, deploys go to a parallel `releases/` dir on prod but don't swap `current`). Compare outcomes.
185 - - [ ] First Sando-only deploy to **Hetzner prod** (tier B). `deploy.sh` retained but unused.
186 - - [ ] Move `server/deploy/deploy.sh` to `server/deploy/archive/deploy.sh.legacy` with a header explaining the cutover; do not delete (reference for the next year).
187 - - [ ] Decommission astra CI runner (`server/deploy/run-ci.sh`). Sando's `cargo_test` gate replaces it; if any astra-specific checks are still needed (e.g., `cargo audit`), add them as additional gate kinds in `daemon/src/gates.rs`.
188 - - [ ] Update `CLAUDE.md` and `_meta/docs/operations.md` to point at Sando, not `deploy.sh`.
189 -
190 - ## Phase 5 — Operator UX
191 -
192 - The TUI polls. The MVP requires you to hand-insert a row for `manual_confirm`. Both are fine for one operator but rough.
193 -
194 - - [x] Build mutex: single-slot `AppState.active_build: Mutex<Option<AbortHandle>>`; newer `/rebuild` aborts any in-flight build. Cargo commands set `.kill_on_drop(true)` so abort propagates SIGKILL to cargo + rustc children. (Landed 2026-05-31 after observing two concurrent builds racing the scratch DB.)
195 - - [x] Implement `WS /events`: tail of gate starts/finishes, deploy events, build logs. Event enum in `daemon/src/events.rs`; `broadcast::channel(256)` in `AppState`; emit sites in build.rs, gates.rs, routes.rs (rebuild, promote, rollback, confirm, backup_fetch). Verified 2026-05-31: live JSON envelopes stream to a python `websockets` client.
196 - - [x] TUI: actions pane. `↑↓`/`jk` select tier; `p` promote (no body — defaults version); `R` rollback; `b` backup fetch; `c` manual_confirm. Action results land in the events log. Daemon URL via `$SANDO_DAEMON`. Built in `tui/src/main.rs` 2026-05-31.
197 - - [x] `POST /confirm/{tier}` endpoint — inserts `gate_runs` row with `passed=1, gate_kind='manual_confirm'` for the tier's `current_version`. Replaces hand-SQL workaround. Verified 2026-05-31 against tier `a`.
198 - - [x] TUI live log pane that follows the most recent build / gate run; backed by `WS /events`. 200-event ring buffer, human-formatted per kind. WS auto-reconnects every 3s. Header shows ws connection state.
199 - - [x] `POST /promote` body — `version` now optional; defaults to predecessor tier's `current_version`. (Unblocks the "promote what just baked" flow.)
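
The 200-event ring buffer behind the live log pane could look roughly like this. `EventLog` and `MAX_EVENTS` are illustrative names (the notes only mention `Shared::push_event` and the cap of 200), and events are stored as plain strings here for brevity.

```rust
use std::collections::VecDeque;

/// Cap taken from the notes; the constant name is an assumption.
const MAX_EVENTS: usize = 200;

/// Hypothetical stand-in for the TUI's shared event state.
struct EventLog {
    events: VecDeque<String>,
}

impl EventLog {
    fn new() -> Self {
        Self { events: VecDeque::with_capacity(MAX_EVENTS) }
    }

    /// Append a line, dropping the oldest entries once the cap is exceeded.
    fn push_event(&mut self, line: String) {
        self.events.push_back(line);
        while self.events.len() > MAX_EVENTS {
            self.events.pop_front();
        }
    }
}
```

A `VecDeque` gives cheap pops from the front, so trimming stays constant-time per event however long the TUI has been running.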
200 -
201 - ## Phase 6 — Monitoring + alerting
202 -
203 - - [ ] Wire fw13 `/metrics` endpoint into the existing MNW Prometheus scrape config; record where the scrape config lives in `_meta/` or wherever monitoring already runs.
204 - - [ ] Add counters: `sando_builds_total{outcome}`, `sando_gates_total{tier,kind,outcome}`, `sando_deploys_total{tier,outcome}`, `sando_burn_in_remaining_hours{tier}`.
205 - - [ ] Alert: build failed. Page on first failure (not flap-protected — builds are infrequent).
206 - - [ ] Alert: migration_dry_run failed. Page immediately. This is the 2026-05-22-class signal.
207 - - [ ] Alert: a tier has had `current_version` unchanged for > N days while host is green. (Operator forgot to promote.)
208 -
209 - ## Phase 7 — Multi-node B+C
210 -
211 - Today B is the only prod node. Adding C is the second prod node + CF Load Balancing.
212 -
213 - - [ ] Provision tier C node (Hetzner or alternate provider — capture rationale).
214 - - [ ] Update `sando.toml`: set `c.provisioned = true`, add `[[tier.node]]`.
215 - - [ ] Set up Cloudflare Load Balancing with B + C as origin pool, health-checked.
216 - - [ ] Verify sequential canary in Sando: deploy to B, wait for CF health-check to mark healthy (probably 30-60s probe interval), then deploy to C. Add a `node.health_url` field and a gate-style wait between nodes.
217 - - [ ] Document in README that `canary = "parallel"` exists but should never be used for B+C unless you understand the failure modes.
218 -
219 - ## Phase 8 — Postgres-on-D
220 -
221 - Move Postgres off the prod app node so B+C become truly interchangeable.
222 -
223 - - [ ] Provision Postgres-only machine D (modest spec; reliability over performance).
224 - - [ ] Migrate the prod DB from Hetzner app node to D. Capture procedure in `plans/postgres-d-migration.md`.
225 - - [ ] Update `server` `DATABASE_URL` everywhere (env files on B+C, scratch URL on fw13 stays local).
226 - - [ ] Replica/HA story stays deferred; D is SPOF for now (per `_meta/preclear/.../decisions.md`).
227 -
228 - ## Phase 9 — Hardening
229 -
230 - Pick up after cutover is stable.
231 -
232 - - [ ] Tailnet ACL audit: confirm only the laptop can reach `sandod:7766`. Document the ACL.
233 - - [ ] Decide if v0.2 needs token auth on `sandod` endpoints (revisit assumption from `decisions.md` once there's a real second operator).
234 - - [ ] Sando self-deploy: Sando builds and deploys *itself* through its own pipeline. Bootstraps the bootstrap. Closes the chicken-and-egg loop and is satisfying.
235 - - [ ] Backup-of-Sando-state: nightly SQLite snapshot to astra. The state DB tracks 6 months of deploys; losing it on a fw13 disk failure would be annoying.
236 -
237 - ## Notes / non-checkbox
238 -
239 - - WS `/events` and the operator-UX work in Phase 5 can run in parallel with Phase 1-3 once Phase 0 is done. They are sequenced after for review clarity, not because they block anything.
240 - - "Hotfix override" and `reset_burn_in` flag are already implemented end-to-end (see `decisions.md`); not on this list because there's nothing left to do until prod uses them.
241 - - C tier exists in the schema as a `provisioned=false` row from day one — adding C in Phase 7 is a TOML edit, not a migration.
242 - - MakeMachine + EveryCycle are a separate project. The hardware BOM moved to `~/Code/everycycle/docs/hardware/mm-v1-bom.md` on 2026-06-01.
@@ -1,1192 +0,0 @@
1 - # Ultra Fuzz Report — MNW Server (Run #9 — launch eve)
2 -
3 - **Run date:** 2026-05-31 (evening)
4 - **Run number:** 9 (launchplan_final.md §1.5 referred to it as "Run #5" — stale; this is the 9th)
5 - **Trigger:** launchplan §1.5 pre-launch pass
6 -
7 - ## Run #9 headline
8 -
9 - Run #8 closed with "BAR MET — ALL FIVE AXES A-". Run #9 went deeper and surfaced 1 CRITICAL + 4 SERIOUS + several MED/HIGH items the prior 8 runs missed. All four launch-critical items fixed in-session; remaining items deferred with rationale below.
10 -
11 - | Axis | Run #8 | Run #9 | Direction |
12 - |------|--------|--------|-----------|
13 - | Payments | A- | A- | flat — 2 new SERIOUS surfaced; 1 fixed (webhook unmark on dual-failure 503), 1 deferred (subscription out-of-order webhook) |
14 - | Storage | A- | A- | flat — 1 new HIGH (migration 129 dead-letter table unused) + 2 MEDs (is_s3_key_live unindexed full-scan, LIKE-suffix false-positive); deferred |
15 - | UX Wiring | A- → B- → A- | A- | dipped on grade-cap for signup TOCTOU CRITICAL, restored after fix |
16 - | Security | A- | A- | flat — 2 new SERIOUS, both fixed (JWT-bump non-atomic, 2FA email IP spoofable) |
17 - | Performance | A- | A- | flat — 2 new HIGH (per-request reqwest::Client::new in 5 hot paths, unbounded spawn in expired-account cleanup); deferred to post-launch |
18 -
19 - **Net Run #9 (post-fix):** 0 CRITICAL · 1 SERIOUS open (Payments subscription ordering — documented deferral) · 3 HIGH open (deferred) · 7 MED open (deferred). **Launchplan §1.5 A- bar holds.**
20 -
21 - ## Run #9 — CRITICAL fixed in-session
22 -
23 - ### UX-CRITICAL — Signup TOCTOU: race → 500 + form loss → FIXED 2026-05-31
24 -
25 - `src/routes/pages/public/join_wizard.rs:99-139`. The wizard ran separate `get_user_by_username` / `get_user_by_email` checks before `create_user`. A concurrent signup with the same username or email slipping between SELECT and INSERT raised a 23505 unique violation that bubbled to `AppError::Database` → 500 "Something went wrong" — and the user's entire typed-in form was lost. On a public alpha-launch surge this is the highest-traffic public endpoint; the wrong page to be returning 500s on.
26 -
27 - **Fix landed:** `create_user` call site now matches `AppError::Database(sqlx::Error::Database(_))` with code 23505, inspects the constraint name (`users_username_key` / `users_email_key`), and routes through `return_error(..)` with a friendly message — same flow as the explicit pre-check branches. Same shape as the existing 23505 handling in `db/license_keys.rs`, `db/builds.rs`, `routes/api/guest_checkout.rs`.
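
A simplified sketch of the classification step this fix describes, with the database error reduced to its SQLSTATE code and constraint name so the example needs no `sqlx` dependency. The `SignupError` enum and `classify_insert_error` function are hypothetical; the constraint names are the ones quoted above.

```rust
/// PostgreSQL SQLSTATE for `unique_violation`.
const UNIQUE_VIOLATION: &str = "23505";

/// Outcome of inspecting a failed `create_user` insert (hypothetical type).
#[derive(Debug, PartialEq)]
enum SignupError {
    UsernameTaken,
    EmailTaken,
    /// Anything else still surfaces as a generic server error.
    Internal,
}

/// Map a failed insert to a user-facing outcome based on SQLSTATE and the
/// violated constraint's name.
fn classify_insert_error(sqlstate: Option<&str>, constraint: Option<&str>) -> SignupError {
    match (sqlstate, constraint) {
        (Some(UNIQUE_VIOLATION), Some("users_username_key")) => SignupError::UsernameTaken,
        (Some(UNIQUE_VIOLATION), Some("users_email_key")) => SignupError::EmailTaken,
        _ => SignupError::Internal,
    }
}
```

Matching on the constraint name as well as the code keeps an unrelated unique index from being reported to the user as "username taken".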
28 -
29 - **Known follow-up (not blocking):** the form-reload still loses typed values on the error swap; `return_error` renders `LoginErrorTemplate` (message-only). Preserving field values would require threading them through the template — file a separate Phase 4 polish item.
30 -
31 - ## Run #9 — SERIOUS fixed in-session
32 -
33 - ### Sec-SERIOUS — `delete_all_sessions_for_user` non-atomic JWT bump → FIXED 2026-05-31
34 -
35 - `src/db/sessions.rs:247-263`. The function ran `DELETE FROM user_sessions` then a separate `UPDATE users SET jwt_invalidated_at = NOW()` on independent connections. If the UPDATE dropped (pool timeout, conn drop, postgres restart), session cookies were dead but every outstanding SyncKit JWT survived until natural expiry — exactly the leak this function exists to prevent. The in-code comment ("a session row deleted without a JWT bump is harmless, the converse would leak access") inverted reality.
36 -
37 - **Fix landed:** both writes wrapped in `pool.begin()` / `tx.commit()`. Comment updated.
38 -
39 - ### Sec-SERIOUS — 2FA login-notification email uses spoofable IP → FIXED 2026-05-31
40 -
41 - `src/routes/pages/public/two_factor.rs:308-312`. The 2FA-completion path read `x-forwarded-for` raw (first-comma-split) for the new-login email's IP field. Every other login surface (`routes/auth.rs:242`, `auth.rs:486`, `auth.rs:528`) routes through `crate::helpers::extract_client_ip` which prioritizes `CF-Connecting-IP`. An attacker who already captured a password could pre-set `X-Forwarded-For: 1.2.3.4` on the verify-2fa POST so the "new login from <city>" email lied about origin — the exact email users are told to trust for compromise detection.
42 -
43 - **Fix landed:** swapped to `crate::helpers::extract_client_ip(&headers)`. One-line change, parity restored.
44 -
45 - ### Pay-SERIOUS — Webhook dual-failure dropped events silently → FIXED 2026-05-31
46 -
47 - `src/routes/stripe/webhook/mod.rs:73-89`. Dedup row was marked processed before handler dispatch (correct for at-least-once). On `(handler_err, insert_failed_event_err)` dual failure, code returned 503 to trigger Stripe redelivery — but Stripe's redelivery would short-circuit at the dedup check (line 50) and 200 the event without ever processing it. The code's own comment acknowledged the bug; the right tool (`unmark_event_processed`, defined 30 lines away in `db/webhook_events.rs:40`) was never called.
48 -
49 - **Fix landed:** call `db::webhook_events::unmark_event_processed(&state.db, &event_id)` before returning 503, with logged-error best-effort if even that fails (same scenario where 503 was already wrong).
50 -
51 - ## Run #9 — DEFERRED with rationale (above A- bar)
52 -
53 - ### Pay-SERIOUS — Subscription webhook out-of-order events resurrect `active`
54 -
55 - `src/routes/stripe/webhook/subscriptions.rs:90, 116, 140`. Handlers blindly overwrite `subscriptions.status` and `period_end` from the webhook payload. Stripe does NOT guarantee delivery order. If an `active` event that predates a `past_due` event is delivered after it, the stale `active` overwrites the legitimate `past_due` and restores access for a user who hasn't paid.
56 -
57 - **Deferral rationale:** worst case is restored access for a few minutes until the next webhook arrives. Fix requires re-extracting Stripe's top-level `created` from `UntypedEvent` (currently dropped) and adding `WHERE last_event_at IS NULL OR last_event_at <= $created` guards on every status/period write across Fan+, creator-tier, and synckit code paths — non-trivial cross-cutting change. Post-launch fix in Phase 4; tracked in todo.md.
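
The ordering guard named in the deferral rationale can be illustrated in memory. This is a minimal sketch, not the planned fix: `Subscription` and `apply_event` are hypothetical names, and only `last_event_at` and `created` come from the text above. The real change would live in the SQL `WHERE` clause on each status/period write.

```rust
/// Hypothetical in-memory stand-in for a `subscriptions` row.
#[derive(Debug, Clone, PartialEq)]
pub struct Subscription {
    pub status: String,
    /// Stripe `created` timestamp (Unix seconds) of the last applied event.
    pub last_event_at: Option<i64>,
}

/// Applies a webhook status change only if the event is not older than the
/// last applied one. Mirrors the guard
/// `WHERE last_event_at IS NULL OR last_event_at <= $created`.
/// Returns true if the row was updated, false if the event was stale.
pub fn apply_event(sub: &mut Subscription, status: &str, created: i64) -> bool {
    let is_fresh = match sub.last_event_at {
        None => true,
        Some(last) => last <= created,
    };
    if is_fresh {
        sub.status = status.to_string();
        sub.last_event_at = Some(created);
    }
    is_fresh
}
```

With this shape, a stale `active` event that arrives after a newer `past_due` is ignored, while a genuinely newer `active` still applies.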
58 -
59 - ### Sto-HIGH — Migration 129 dead-letter table never written
60 -
61 - `migrations/129_pending_s3_deletions_dead_letter.sql` creates `pending_s3_deletions_dead_letter` and documents it as "operator-visible parking lot... require manual triage." `src/scheduler/cleanup.rs:453-457` on `attempts >= 10` only logs `tracing::error!` then removes the row — never inserts into the dead-letter table. Permanently-failing keys have zero operator visibility.
62 -
63 - **Deferral rationale:** operational, not runtime. No user impact; only operators lose triage signal. One-INSERT fix; bundle into Phase 4.
64 -
65 - ### Perf-HIGH — Per-request `reqwest::Client::new()` in 5 hot paths
66 -
67 - `routes/pages/dashboard/main.rs:118`, `routes/pages/public/landing.rs:284`, `routes/api/internal/cli_features.rs:440`, `routes/api/domains.rs:319`, `auth.rs:559`. Each call builds a fresh TCP pool, TLS context, DNS resolver — no keep-alive across requests. `MtClient` in `AppState` already keeps a pooled client; the dashboard bypasses it.
68 -
69 - **Deferral rationale:** real, but it only matters at scale. Private-alpha launch traffic is well below the level where this becomes a tail-latency contributor. 30-min refactor; bundle into Phase 4 once launch traffic settles.
70 -
71 - ### Perf-HIGH — Unbounded `tokio::spawn` in expired-account cleanup
72 -
73 - `src/scheduler/cleanup.rs:215-220` (`spawn_expired_account_cleanups`). Daily tick spawns one task per expired account, no governor. `cleanup_sandbox_accounts` (same file, ~100 lines above) correctly caps at `CLEANUP_PARALLELISM=4` via `JoinSet`; the terminated/content-removal variants don't. A backlog of 200 expired accounts fans out into 200 concurrent S3 prefix listings racing for the 25-conn pool at midnight.
74 -
75 - **Deferral rationale:** runs once daily; current expired-account count is small (private alpha). Trivial fix (lift the existing JoinSet pattern); not launch-blocking. Bundle with Phase 4.
76 -
77 - ## Run #9 — MED/LOW deferred (read-only carry-forward, in todo.md)
78 -
79 - - Pay-MED: `pricing.rs::parse_dollars_to_cents` misinterprets European decimal comma (`1,23` → 12300¢). User-controlled input; fixable in a single regex.
80 - - Pay-MED: SyncKit app-sub checkout silently defaults `storage_limit_bytes` to 0 if metadata missing.
81 - - Pay-MED: Guest checkout email falls back to `"unknown@guest"` sentinel; collisions possible.
82 - - Sto-MED: `is_s3_key_live` runs 7 EXISTS subqueries on unindexed `items.audio_s3_key` / `cover_s3_key` / `video_s3_key` / `versions.s3_key` etc — sequential scans per retry.
83 - - Sto-MED: `is_s3_key_live` LIKE-suffix pattern `'%' || s3_key` false-positives on neighboring keys (key `abc/file.png` matches `xabc/file.png`) — skips a legitimate delete → S3 object leaks.
84 - - UX-MED: "Log in" return_to query param in `purchase.html:145` is dead-wired — login handler always redirects `/dashboard`. Lost purchase intent.
85 - - UX-MED: Admin user filter buttons (`admin-users.html:35-44`) use `class="primary"` / `class="secondary"` instead of `btn-primary` / `btn-secondary` — renders unstyled.
86 - - UX-LOW: Pagination links in `git/issues.html:72,76` don't URL-encode `search`; `&page=99` in search query corrupts pagination.
87 - - UX-LOW: 5 sites do `.render().unwrap_or_default()` on Askama templates (blank UI on render failure, no log).
88 - - UX-LOW: `slugify` in `formatting.rs` produces `"post"` for any non-ASCII title; international creators get opaque URLs.
89 - - Sec-MINOR: `csrf.rs:176-185` `validate_token_consuming` doesn't consume — name promises stronger property than implementation.
90 - - Sec-MINOR: `routes/oauth.rs:101-111` `is_localhost_redirect` allows any port on localhost regardless of registered URI.
91 - - Sec-MINOR: `routes/pages/public/two_factor.rs::pending_2fa_started_at` reads `i64` via session.get; type mismatch silently → None → instantly-expired.
92 - - Sec-MINOR: `scanning/archive.rs:124` path-traversal check misses lone `..` segment (no trailing separator).
93 - - Perf-LOW: `scheduler/announcements.rs` linear walk through subscriber list in a single spawned task; no checkpointing.
94 - - Perf-LOW: `db/page_views.rs` `pending` HashMap has no max-cardinality cap (crawler hitting 100k unique target_ids before tick).
95 - - Perf-LOW: `build_runner.rs:441` local artifact tmpfile leaks if process crashes between SCP and `remove_file`.
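
The decimal-comma item in the list above (Pay-MED, `parse_dollars_to_cents`) can be illustrated with a strict parser that rejects `1,23` instead of reading it as 123 dollars. This is a sketch under assumptions: the function name `parse_dollars_to_cents_strict` is hypothetical, it uses hand-written checks rather than the single regex the note suggests, and it accepts commas only as well-formed thousands separators.

```rust
/// Parses a dollar string such as "1,234.50" into cents.
/// Returns None for anything ambiguous, including a European-style
/// decimal comma like "1,23".
pub fn parse_dollars_to_cents_strict(input: &str) -> Option<u64> {
    let s = input.trim();
    let (int_part, frac_part) = match s.split_once('.') {
        Some((i, f)) => (i, f),
        None => (s, ""),
    };

    // Fraction: zero, one, or two ASCII digits.
    if frac_part.len() > 2 || !frac_part.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }

    // Integer part: plain digits, or comma-separated groups where the first
    // group has 1-3 digits and every later group has exactly 3.
    let groups: Vec<&str> = int_part.split(',').collect();
    for (idx, group) in groups.iter().enumerate() {
        if group.is_empty() || !group.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        let len_ok = if groups.len() == 1 {
            true
        } else if idx == 0 {
            group.len() <= 3
        } else {
            group.len() == 3
        };
        if !len_ok {
            return None;
        }
    }

    let dollars: u64 = groups.concat().parse().ok()?;
    let cents: u64 = match frac_part.len() {
        0 => 0,
        1 => frac_part.parse::<u64>().ok()? * 10,
        _ => frac_part.parse::<u64>().ok()?,
    };
    dollars.checked_mul(100)?.checked_add(cents)
}
```

Rejecting the ambiguous form surfaces a validation error to the user rather than silently charging 100x the intended amount.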
96 -
97 - ## Run #9 — mandatory surprises
98 -
99 - - **Payments:** `routes/stripe/webhook/mod.rs:82-89` literally documents the bug it ships ("the dedup row was already marked processed... Stripe won't retry") and then chooses 503 anyway. The fix (`unmark_event_processed`) sat 30 lines away in the same crate, never called. Scar-tissue-comment-without-the-fix is a recognizable pattern across the codebase.
100 - - **Storage:** `routes/storage/mod.rs::commit_upload` sealed-helper pattern (Run #7 fix for the chronic disease) is the strongest piece of structural engineering in the repo — turned an enum into a witness type. But the *neighbor* file `migrations/129_pending_s3_deletions_dead_letter.sql` shows the opposite: migration written with detailed prose explaining the operator's parking lot, and the actual INSERT never wired up. Two adjacent fixes from the same audit-cycle, one structural and load-bearing, one ceremonial and silently broken.
101 - - **UX:** `csrf.rs` `PostureMethodRouter` + sealed `CsrfManuallyValidated` witness make registering a mutation route without an explicit posture declaration *uncompilable*. A+ engineering. The contrast with the signup wizard's TOCTOU-and-500-with-lost-form is jarring — defensive depth on CSRF, none on the front door.
102 - - **Security:** `routes/auth.rs:128-130` malformed-email branch skips the DUMMY_HASH timing equalizer that was added explicitly to prevent timing-side-channel user enumeration. ~2 orders of magnitude faster than every other failure path. The equalizer exists; this one path bypasses it.
103 - - **Performance:** `db/projects.rs::get_project_ids_for_user` is the only `fetch_all` in `projects.rs` without a `LIMIT`. Its neighbor `get_projects_by_user` caps at 500 with a documented safety comment. Cyber-squatter with 10k projects + account expiry → 10k S3 prefix-deletes in one spawned task. Asymmetric defense within the same module.
104 -
105 - ## Run #9 — stress-tested OK
106 -
107 - Verified attacks the code survived (high-confidence positives):
108 -
109 - - Stripe webhook signature replay (HMAC constant-time, multi-secret rotation, timestamp tolerance both directions)
110 - - Promo code concurrent over-use (single atomic UPDATE with max_uses + expires_at + starts_at)
111 - - Cart race past pre-check (23505 fallback aborts cleanly without charging)
112 - - License key prediction (6 wordlist × CSPRNG ≈ 66 bits)
113 - - Pre-signed URL Content-Length binding (S3 rejects mismatch at protocol level)
114 - - Storage cap atomicity (`try_replace_storage` single UPDATE)
115 - - Build claim race (partial unique index + 23505 backstop)
116 - - Idempotent re-confirms in all 4 upload confirm handlers (reaper-deletes-live-object closed)
117 - - Session row + JWT atomicity (post-fix verified above)
118 - - TOTP replay across skew window (matched-step tracked + strict `>` gate)
119 - - OAuth PKCE downgrade (S256 pinned at authorize + token-exchange)
120 - - CSRF body bypass via textarea-smuggled token (proper form parser)
121 - - Git diff/blame XSS (HTML-escaped in attacker-controlled spots)
122 - - Internal error leakage (tests assert no PG host, no S3 bucket, no sqlx variant leaks)
123 -
124 - ## Run #9 confidence per axis
125 -
126 - - Payments **HIGH** (~70% LoC read this pass; Phase 4 backlog visible)
127 - - Storage **HIGH** (full module read; cleanup.rs upper half only — MEDIUM there)
128 - - UX Wiring **HIGH** for CSRF/error/validation; **MEDIUM** for wizard step partials, embed routes, dashboard CSV import
129 - - Security **HIGH** for auth/CSRF/session; **MEDIUM** for scanning (YARA rule content unread), API key scoping
130 - - Performance **HIGH** for scan worker, scheduler, storage, build_runner; **MEDIUM** for SyncKit, postmark, import pipeline
131 -
132 - ## Run #9 bug counts
133 -
134 - | Severity | Payments | Storage | UX | Security | Perf | Total |
135 - |---|---|---|---|---|---|---|
136 - | CRITICAL | — | — | 1 (FIXED) | — | — | **1** |
137 - | SERIOUS | 2 (1 FIXED, 1 deferred) | — | — | 2 (FIXED) | — | **4** |
138 - | HIGH | — | 1 (deferred) | — | — | 2 (deferred) | **3** |
139 - | MED | 3 (deferred) | 2 (deferred) | 2 (deferred) | — | — | **7** |
140 - | LOW/NOTE | 2 | — | 3 | 4 | 3 | 12 |
141 -
142 - ## Run #9 delta vs Run #8
143 -
144 - - 1 CRITICAL surfaced + fixed (signup TOCTOU); class missed by prior 8 runs because no agent explicitly probed the public-signup race window
145 - - 4 SERIOUS surfaced; 3 fixed in-session, 1 deferred with rationale
146 - - Run #8 "BAR MET" claim was correct *for the surfaces it audited* but understated: this pass added explicit attack-vector probing for cross-conn atomicity, IP spoof parity across auth surfaces, and webhook dedup edge paths — none of which were in prior runs' scope
147 - - All previously closed Run #8 fixes verified intact (commit_upload seal, S1 tx atomicity, background.rs queue, cart MEDs)
148 -
149 - ---
150 -
151 - # Ultra Fuzz Report — MNW Server (Run #8 — historical)
152 -
153 - **Run date:** 2026-05-31
154 - **Run number:** 8
155 -
156 - ## Run #8 Headline
157 -
158 - | Axis | Run #5 | Run #6 | Run #7 | Run #8 | Direction |
159 - |------|--------|--------|--------|--------|-----------|
160 - | Payments | B | B+ | A- | **A-** | flat — H2 still deferred; 2 new MEDs surfaced (cart `min_price_cents` bypass, cart-all chain-break on all-free first seller) |
161 - | Storage | B- | A- | B+ | **A-** | ↑ H1 + S1 fixes verified closed; commit_upload seal intact across all 7 confirm handlers; genericization clean at every caller including synckit/blobs.rs |
162 - | UX Wiring | B | A- | A- | **A-** | flat — 1 new MED (item-wizard `pricing_model` silent fallback to "free" — same disease class fixed in project wizard at Run #6, not propagated) |
163 - | Security | A- | A- | A- | **A-** | flat — only diff in scope (username availability fail-closed) is a net improvement; MED backlog identical to Run #5/#6/#7 |
164 - | Performance | B- | A- | A- | **A-** | flat with 1 new SERIOUS — webhook `checkout_helpers.rs` unbounded `tokio::spawn` (send_purchase_emails / mailing_list / tip_email) competes with request handlers for the 25-slot pool under burst |
165 -
166 - **Net Run #8:** 0 CRITICAL · 1 SERIOUS new (Perf webhook spawn) — FIXED 2026-05-31 · 5 new MED — ALL FIXED 2026-05-31 · 1 SERIOUS previously-deferred (Payments H2 `claim_free_project` soft race) — FIXED 2026-05-31.
167 -
168 - **Post-Run #8 status (2026-05-31 end-of-day): 0 CRITICAL · 0 SERIOUS · 0 MED open from any prior run.** All five axes A-, all above-MED items closed, all Run #8 MEDs closed, prior-deferred SERIOUS closed. Launchplan §1.5 bar fully cleared.
169 -
170 - **2026-05-31 post-Run-#8 backlog sweep (7 waves):** 24 of 26 carried MED/LOW/NOTE items closed across Storage (5), Security (8), Performance (3), UX (2), Payments (2), Auth (4). Two deferred with rationale: `build_runner.rs` serial targets (LOW, builds run rarely, refactor touches denominator) and `scheduler/mod.rs` advisory-lock granularity (multi-replica concern, single-process today). New schema migration `133_items_duration_seconds_nonnegative.sql` pins the negative-duration invariant in the DB. New `commit_rescan` helper extends the chronic-disease commit_upload seal to admin paths. Tests: 1655 / 0.
171 -
172 - **Launchplan §1.5 bar:** **ALL 5 AXES AT A- — BAR MET.** The new Perf SERIOUS is axis-internal and the agent kept Perf at A- (machinery wins outweigh; same shape as previously-closed `record_view` per-request spawn — apply mpsc + drainer pattern). New Payments MEDs and UX MED are launch-quality items worth addressing or documenting before ship; none are A- blockers.
173 -
174 - ## Run #8 — new findings above MED
175 -
176 - ### P-SERIOUS — Webhook hot-path unbounded `tokio::spawn` (Performance) — FIXED 2026-05-31
177 - `src/routes/stripe/webhook/checkout_helpers.rs:58, 96, 124, 290` + `src/routes/stripe/webhook/checkout.rs:618`. `send_purchase_emails`, `subscribe_buyer_to_mailing_list`, `send_tip_email`, `send_guest_sale_notification`, guest-purchase-confirmation each `tokio::spawn` from the webhook handler. Multi-item cart fires N spawns per webhook; each task acquires 1-2 pool conns + a Postmark call. No JoinSet, no cap. Under burst, hundreds of detached tasks competed with request handlers for the 25-slot pool. Same shape as the Run #4 `record_view` per-request spawn (fixed via mpsc + drainer).
178 -
179 - **Fix landed:** new generic `src/background.rs` module — `BackgroundTx` + `spawn_pool()` with bounded mpsc (capacity 1024) + semaphore-bounded concurrent execution (8 workers, well below `DB_POOL_MAX_CONNECTIONS=25`). `state.bg.spawn(name, fut)` is non-blocking; queue overflow logs a warning and drops the task. The `spawn_email!` macro was refactored to use the bg queue (covers 17 callers across auth/admin/follows/library/two_factor/stripe webhook/login flows). The 5 manual webhook `tokio::spawn` sites were also migrated. Per-request email sends from postmark issue replies (×2), guest-claim email, and join-wizard signup (×2) were migrated in the same pass — same disease, same fix.
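
The non-blocking, drop-on-overflow submit described above can be sketched with the standard library alone. This is an illustration, not the contents of `src/background.rs`: it swaps the tokio mpsc for `std::sync::mpsc::sync_channel`, omits the semaphore-bounded workers, and `BoundedTx`, `Job`, and `bounded_queue` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// A unit of background work, boxed so different closures share one queue.
pub type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for the `BackgroundTx` handle described above.
#[derive(Clone)]
pub struct BoundedTx {
    tx: SyncSender<(&'static str, Job)>,
}

impl BoundedTx {
    /// Non-blocking submit. Returns false, after logging, when the queue is
    /// full or the receiving side has gone away; the task is dropped.
    pub fn spawn(&self, name: &'static str, job: Job) -> bool {
        match self.tx.try_send((name, job)) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                eprintln!("background queue full; dropping task {name}");
                false
            }
            Err(TrySendError::Disconnected(_)) => {
                eprintln!("background queue closed; dropping task {name}");
                false
            }
        }
    }
}

/// Creates a queue that holds at most `capacity` pending jobs
/// (assumed to be at least 1; 0 would make a rendezvous channel).
pub fn bounded_queue(capacity: usize) -> (BoundedTx, Receiver<(&'static str, Job)>) {
    let (tx, rx) = sync_channel(capacity);
    (BoundedTx { tx }, rx)
}
```

The design point is that request handlers never block on background work: under burst, excess tasks are shed with a log line instead of competing for pool connections.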
180 -
181 - **Out of scope for this fix** (different bug shapes; defer to Phase 4 polish or own remediation): import pipeline (long-running, needs own bound), MT community creation (single outbound HTTP, minor pool pressure), creator departure notification + status broadcast (broadcast-class — use `broadcast.rs` JoinSet pattern), idempotency-store post-response (trivial DB write), build_runner (already gated by claim flow), scheduler/monitor/scanning/page_views (background workers, not per-request).
182 -
183 - ### Payments MED — Cart `min_price_cents` bypass — FIXED 2026-05-31
184 - Both cart paths (`process_seller_checkout` and `create_cart_checkout`) now check `pc.min_price_cents` for non-platform Discount codes before applying the discount. Cart skips the ineligible item (others may still qualify) rather than rejecting the whole cart — matches the existing scope-skip pattern.
185 -
186 - ### Payments MED — Cart-all chain-break on all-free first seller — FIXED 2026-05-31
187 - `process_seller_checkout` signature changed `Result<String>` → `Result<Option<String>>`; all-free path now returns `Ok(None)` instead of `Err(BadRequest)`. New `drain_to_paid` helper loops through the queued sellers until a paid one is reached (returns URL) or queue exhausted (returns `Ok(None)` → library redirect). Both callers (`create_cart_checkout_all` and `checkout_success`) updated to use it.
188 -
189 - ### UX MED — Item wizard `pricing_model` silent fallback — FIXED 2026-05-31
190 - `save_pricing` now rejects missing pricing_model with `AppError::validation("Select a pricing model")` and rejects unknown values with `format!("Unknown pricing model: {other}")`. Same shape as the project wizard Run #6 fix.
191 -
192 - ### UX MED — Inline-JS template duplication — FIXED 2026-05-31
193 - Added delegated `data-copy-link` click handler to `static/mnw.js` with proper `.catch()` (falls back to `window.prompt` in non-secure contexts — better than the silent-no-op the inline snippets shipped with). 8 templates migrated from `onclick="navigator.clipboard.writeText(...).then(...)"` to `<a href="..." data-copy-link>Copy link</a>` (audio_player, blog_post, collection, item, project, text_reader, user, video_player). `href` is the real URL so middle-click / no-JS / share menus still work. Cache-bust query bumped to `v=0531`.
194 -
195 - ### Perf MED — Cart free-claim N+1 — FIXED 2026-05-31
196 - Extended `CartItem` with `enable_license_keys` + `default_max_activations` (both cart queries pull them through). Three free-claim loops (single-seller paid path, discount-zeroed promo path, chain-flow path) drop the per-item `get_item_by_id` and replace per-item `remove_from_cart` DELETE with a single bulk `remove_from_cart_bulk(..., ANY($2))` at the end of each loop. Per-item tx for `claim_free_item` stays (the per-item claim-vs-already-purchased return value is load-bearing for sales-count increment). Roundtrips per free item dropped from ~5-7 to ~3-4; per-loop DELETEs from N to 1.
197 -
198 - ## Run #8 — verified standing (storage fixes from session)
199 -
200 - - **H1** (`uploads.rs::confirm_upload` L295-337) — three-arm match correct. Zero-rows arm rolls back (replace path = `try_replace_storage` swap-back with `i64::MAX` cap; fresh-upload path = `decrement_storage_used`), then `enqueue_s3_orphan(new_key)`, returns BadRequest "Item was modified concurrently." Returns BEFORE `commit_upload` and BEFORE `remove_pending_upload` — pending_uploads row left as reaper second-line defense.
201 - - **S1** (`media.rs::media_confirm` L241-293) — single `state.db.begin()` wraps storage credit + pending_uploads clear + media_files INSERT. S3 IO entirely outside tx. tx drop → Postgres ROLLBACK → all three writes reverted atomically. 23505 detection via typed `AppError::Database(sqlx::Error::Database(...))` pattern works post-rollback. S3 cleanup fires on every tx-failure branch.
202 - - **Genericization** — `pending_uploads::remove_pending_upload` and `media_files::create` now `impl PgExecutor<'e>`. All 12 callers (including `synckit/blobs.rs:157`) still compile and execute correctly.
203 - - **Pool pressure delta from S1 tx** — neutral-to-better. Prior code grabbed 3 separate conns serially; new code grabs 1 conn for ~3× the duration. Users-row write lock held ~ms. Per-user serialization for sub-second uploads acceptable.
204 -
205 - ## Run #8 — mandatory surprises
206 -
207 - - **Payments:** `compute_splits` more careful than its comment promises — remainder-distribution loop constrained by `expected_total = amount * raw_total_pct.min(100) / 100`, so under-100% splits keep the owner's share AND distribute floor-rounding remainders up to bound. Proptest-style invariant tests fully fence it.
208 - - **Storage:** `try_increment_storage_on` inside the tx holds a row-level lock on `users` for the duration of the tx. Not a bug (sub-ms hold; cap can't be over-shot via WHERE re-evaluation under READ COMMITTED). But every media confirm now serializes per-user against every other storage write.
209 - - **UX:** Copy-link button is a chimera. Nine templates copy the same inline `onclick` that calls `navigator.clipboard.writeText`, mutates `this.textContent` to `"Copied!"` — silently broken in any tab loaded over plain HTTP, in iframes, or with restrictive CSP. No `.catch()` → no fallback, no error.
210 - - **Security:** `routes/auth.rs:128-130` malformed-email branch skips DUMMY_HASH timing equalizer. ~2 orders of magnitude faster than every other failure path — distinguishes "you submitted an invalid-email-shaped string" from "valid email, unknown account." Real timing oracle a few lines above the equalizer that was deliberately added to prevent exactly this.
211 - - **Performance:** `metrics::idempotency_middleware` does a DB SELECT on EVERY POST/PUT with an `Idempotency-Key` header BEFORE the handler runs. No bloom filter, no negative cache. ~1 extra ms per POST already doing 2-5 DB queries — free 20%+ on POST p50 available by adding an in-memory `seen` set.
212 -
213 - ## Run #8 bug counts
214 -
215 - | Severity | Payments | Storage | UX | Security | Perf | Total |
216 - |---|---|---|---|---|---|---|
217 - | CRITICAL | — | — | — | — | — | **0** |
218 - | SERIOUS | 1 (deferred) | — | — | — | 1 (new) | **2** |
219 - | MED | 2 (new) | 7 | 5 | 8 | 5 | 27 |
220 - | LOW/NOTE | 5 | 3 | 4 | 3 | 2 | 17 |
221 -
222 - ## Run #8 confidence per axis
223 -
224 - - Payments **HIGH** (~70% LoC read)
225 - - Storage **HIGH** (full)
226 - - UX **HIGH**
227 - - Security **HIGH** (scoped); MEDIUM for storage-route auth side-effects
228 - - Performance **HIGH**
229 -
230 - ## Run #8 delta vs Run #7
231 -
232 - - **Storage B+ → A-.** H1 + S1 fixes verified closed. Genericization clean.
233 - - **Payments A- flat.** 2 new MEDs (cart `min_price_cents` bypass, cart-all chain-break) surfaced via expanded coverage; H2 deferred unchanged.
234 - - **UX A- flat.** 1 new MED (item-wizard `pricing_model` silent fallback) — same disease class as project wizard fix from Run #6, not propagated.
235 - - **Security A- flat.** Net improvement (username fail-closed). MED backlog identical.
236 - - **Performance A- flat.** 1 new SERIOUS (webhook unbounded spawn) — same shape as Run #4 `record_view` fix. Cart free-flow N+1 (MED) — Run #5 fix covered paid only.
237 -
238 - ---
239 -
240 - # Ultra Fuzz Report — MNW Server (Run #7 — historical)
241 -
242 - **Run date:** 2026-05-31
243 - **Run number:** 7 (+ S1 + Storage code-fuzz fixes confirmed in Run #8)
244 -
245 - ## Headline
246 -
247 - | Axis | Run #5 | Run #6 | Run #7 | Direction |
248 - |------|--------|--------|--------|-----------|
249 - | Payments | B | B+ | **A-** | ↑↑ Phase 2 + Run #6 + Run #7 fixes all landed; S1 cart 23505 swallow fixed post-Run #7; H2 claim_free_project soft race deferred |
250 - | Storage | B- | A- | **B+ → A- pending Run #8** | ↑/↓ commit_upload structural fix is excellent; Run #6 idempotency fix introduced HIGH-1 (pending_uploads leak in 4 sites) + HIGH-2 (missing rollback on update_*_url) — both fixed post-Run #7. Storage code-fuzz 2026-05-31 surfaced H1 (confirm_upload silent zero-rows + side-effects-already-fired) and reopened S1 media_confirm tx atomicity — both fixed in same session |
251 - | UX Wiring | B | A- | **A-** | ↑ field-aware deletion + parse_dollars_to_cents shared; pricing_model silent fallback HIGH found and fixed post-Run #7 |
252 - | Security | A- | A- | (unchanged) | flat — no security-touching changes in Runs #6/#7 |
253 - | Performance | B- | A- | (unchanged) | flat — no perf-touching changes in Runs #6/#7 |
254 -
255 - ## Post-Run #7 Storage code-fuzz (2026-05-31)
256 -
257 - Targeted code-fuzz scoped to the Storage axis to verify A- before triggering full Run #8. Two findings above MED, both fixed in-session:
258 -
259 - - **H1 (HIGH) — `routes/storage/uploads.rs::confirm_upload` silent `rows_affected = 0`.** Same shape as the just-closed HIGH-2 (`update_*_url`), one step further along the same handler family. UPDATE at L295 uses ownership-filter `WHERE id = $1 AND project_id IN (SELECT id FROM projects WHERE user_id = $4)`; `rows_affected()` was never checked. If the item was deleted between `get_item_owner` (L156) and the UPDATE, storage credit stayed incremented, `pending_uploads` got cleared a few lines down, and `commit_upload` enqueued a scan job against a ghost target — permanent S3 leak + over-charged counter. **Fix:** three-arm match on the UPDATE result; zero-rows case rolls back storage and routes the new S3 key through `enqueue_s3_orphan` so the reaper still cleans it, then returns BadRequest "Item was modified concurrently."
260 - - **S1 (SERIOUS, Run #5 plan #12 reopened) — `routes/storage/media.rs::media_confirm` three-write atomicity.** Run #5 called for wrapping `try_increment_storage` → `remove_pending_upload` → `media_files::create` in a transaction; Run #7's in-process compensation only covered in-process errors. Process interruption (panic, OOM kill, container restart) between any two writes still leaked. **Fix:** all three writes now in a single tx; tx drop rolls back storage + pending_uploads + media_files atomically. Only the S3 object needs explicit cleanup (single `delete_object` after rollback). Supporting DB-layer changes: `creator_tiers::try_increment_storage_on(&mut PgConnection)` tx-friendly variant; `pending_uploads::remove_pending_upload` and `media_files::create` signatures genericized to `impl PgExecutor<'e>` (backwards compatible).
261 -
262 - Remaining storage MED/LOW (below launchplan §1.5 A- bar; ride into Phase 4 polish or document deferral):
263 - - MED — `update_project_image_url` / `update_item_cover` ignore `rows_affected()` (same shape as H1; mitigated for current callers because the only follow-on side-effect is `bump_cache_generation`).
264 - - MED — `downloads.rs:120` `((duration as u64) * 2).max(3600)` with no DB `CHECK (duration_seconds >= 0)`. Negative duration → multi-decade presigned URL. Exploitability requires creator-controlled negative duration; ffprobe doesn't produce them. Cap in code + add CHECK migration.
265 - - MED — Admin rescan paths (`routes/admin/uploads.rs:347, 390`) call `db::scan_jobs::enqueue` directly, bypassing the `commit_upload` structural seal. Ordering is correct so no live bug; demote `db::scan_jobs::enqueue` to `pub(crate)` and expose `commit_rescan(target, ...)` to close the chronic-disease finding for real.
266 - - MED — `enqueue_s3_orphan` single-policy doc in `routes/storage/mod.rs:24-30` overstates the discipline; many `s3.delete_object(...).await.ok()` direct calls remain at pre-storage-credit rejection paths. Tighten the doc or migrate the post-storage-credit sites.
267 - - MED — `is_s3_key_live` doesn't enumerate project image URLs (project cover keys live in a distinct prefix so no current bug; surface is fragile if future code paths queue project image keys).
268 - - LOW — `scanning/worker.rs:251` inline `UPDATE media_files SET scan_status` instead of `db::scanning::update_media_file_scan_status` helper.
269 - - LOW — `routes/pages/dashboard/wizards/item/save.rs:95` `update_item_cover_image_url` updates only `cover_image_url` (not s3_key/size); client-side hidden-field abuse can desync.
270 - - LOW — `db/pending_uploads.rs::remove_pending_upload` deletes by s3_key alone (per-handler prefix validation makes cross-user collision unreachable, but the function signature is broader than it needs to be).
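
The "cap in code" half of the `downloads.rs:120` item above can be sketched as a clamped TTL computation. Assumptions are labeled: the function name `presign_ttl_secs` is hypothetical, the duration is assumed to be stored as `i64`, and the 7-day ceiling is an illustrative choice rather than a value from the source.

```rust
/// Lower bound, taken from the `.max(3600)` in the expression quoted above.
pub const MIN_TTL_SECS: u64 = 3_600;
/// Assumed upper bound of 7 days; not taken from the source.
pub const MAX_TTL_SECS: u64 = 7 * 24 * 3_600;

/// Computes a presigned-URL lifetime of twice the media duration,
/// bounded on both sides.
pub fn presign_ttl_secs(duration_seconds: i64) -> u64 {
    // Clamp negatives to 0 before the cast so they cannot wrap into a
    // huge unsigned value.
    let non_negative = duration_seconds.max(0) as u64;
    non_negative
        .saturating_mul(2)
        .clamp(MIN_TTL_SECS, MAX_TTL_SECS)
}
```

The DB `CHECK` constraint would still be worth adding; the clamp only guarantees that a bad row cannot produce an oversized URL lifetime.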
271 -
272 - **Chronic disease status (5th run):** The invariant-in-prose / sibling-not-swept pattern that recurred across Runs #2–#6 was **structurally addressed** in Run #7 via two helpers:
273 - - `routes/storage/mod.rs::commit_upload(target: CommitTarget, ...)` — sealed `enqueue_scan_for` to module-private; the helper is now the only handler-reachable path for scan enqueue + scan_status flip after a DB write. Bug shapes 1–3 from prior runs are now structurally impossible to introduce in a new sibling.
274 - - `crate::pricing::parse_dollars_to_cents` + `validate_dollars_f64` — canonical dollar-to-cents conversion; bypassing has historically introduced NaN→$0 and saturating-overflow silent bugs.
275 -
276 - **Net after Run #7 + S1 fix:** 0 CRITICAL · 0 HIGH/SERIOUS · 1 SERIOUS deferred (Payments H2 soft race on `claim_free_project`) · a handful of MED/LOW polish items.
277 -
278 - ---
279 -
280 - # Ultra Fuzz Report — MNW Server (Run #5 — historical)
281 -
282 - **Run date:** 2026-05-30
283 - **Run number:** 5
284 -
285 - ## Headline
286 -
287 - | Axis | Run #4 | Run #5 | Direction |
288 - |------|--------|--------|-----------|
289 - | Payments | A- | **B** | ↓ (Run #4 plan items closed; 4 new SERIOUS surfaces previously unaudited: NULL item_id refund, splits >100% overflow, tip project authorization, cart unlisted bypass) |
290 - | Storage | A- | **B-** | ↓ (Run #4 `images.rs` ordering bug closed; same disease reappeared in `uploads.rs` route gate ordering — file-type rejection runs AFTER scan enqueue) |
291 - | UX Wiring | C+ | **B** | ↑ (Run #4 CSRF patchwork + creator-tier token fixed and structurally enforced; new CRIT: field-aware validation API is dead code at template boundary) |
292 - | Security | B+ | **A-** | ↑ (Run #4 git-shell validation, lockout email flood, CSRF policy all verified; no new CRIT/HIGH; remaining gaps are operational/MED) |
293 - | Performance | B | **B-** | ↓ (Run #4 scan_jobs retention + pool permit + broadcast bounding verified; new HIGHs in previously unaudited cart checkout + page-view paths + scheduler integrity scan) |
294 -
295 - Net: 3 CRITICAL (vs Run #4: 4), 13 HIGH/SERIOUS (vs Run #4: 10), 11 MED, 9 MINOR/LOW. Two axes regressed because Run #5 reached previously-unaudited territory (Payments tip/cart/refund edges; Performance hot-path request loops) while Run #4 plan items themselves were correctly closed. The Storage regression is a *recurrence of the same shape* in a sibling handler — the chronic invariant-in-prose disease, fourth consecutive run.
296 -
297 - ## Critical / High Findings (fix before launch)
298 -
299 - 1. **[Storage — CRITICAL]** `routes/storage/uploads.rs:204-237` — `confirm_upload` calls `enqueue_scan_for(...)` and `update_item_scan_status(... Pending)` BEFORE the match arm rejects `Download`/`Insertion`/`MediaImage`/`MediaVideo` with `BadRequest`. A misrouted confirm carrying a valid `item_id` flips that item's scan status to Pending, blocks `stream_url` for every fan, and leaks a scan-job row for an S3 key that's then deleted.
300 - 2. **[UX — CRITICAL]** `error.rs:216-264` + `templates/error.html` — `AppError::validation_fields(summary, [(field, msg), ...])` is consumed only by unit tests. `ErrorTemplate` has no `fields:` member; no template renders per-field highlights. Every non-HTMX validation failure degrades to the global "Go Home / Go Back" page and wipes submitted form input. Handler authors are misled into thinking their carefully-tagged field errors reach the UI.
301 - 3. **[Perf — CRITICAL]** `build_runner.rs:175-180` — Partial-failure error message reports `("{}/{} succeeded", artifact_keys.len(), artifact_keys.len() + 1)`. Denominator is always `succeeded + 1`, regardless of how many targets actually ran. Three targets, one succeeded, two failed → reports "1/2" (should be 1/3). Failed-target count is never tracked.
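
The denominator bug in item 3 comes from never tracking failures. A minimal sketch of the corrected bookkeeping, assuming a hypothetical `BuildTally` type (only `artifact_keys` and the `"{}/{} succeeded"` format come from the finding):

```rust
/// Hypothetical tally of per-target build results.
#[derive(Debug, Default)]
pub struct BuildTally {
    /// Artifact keys for targets that built successfully.
    pub artifact_keys: Vec<String>,
    /// Names of targets that failed.
    pub failed_targets: Vec<String>,
}

impl BuildTally {
    /// Records one target's outcome: Ok carries the artifact key,
    /// Err carries an error message (ignored here beyond counting).
    pub fn record(&mut self, target: &str, result: Result<String, String>) {
        match result {
            Ok(key) => self.artifact_keys.push(key),
            Err(_) => self.failed_targets.push(target.to_string()),
        }
    }

    /// Reports successes over the number of targets that actually ran.
    pub fn summary(&self) -> String {
        let succeeded = self.artifact_keys.len();
        let total = succeeded + self.failed_targets.len();
        format!("{}/{} succeeded", succeeded, total)
    }
}
```

For the case in the finding (three targets, one success, two failures), `summary()` yields `1/3 succeeded` rather than `1/2`.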
302 -
303 - ### HIGH / SERIOUS
304 -
305 - 4. **[Payments — SERIOUS]** `db/transactions.rs:699-716` — `refund_transaction_by_payment_intent` returns `Vec<(TransactionId, ItemId)>` (non-Optional). Project-level transactions store `item_id IS NULL` (`routes/stripe/checkout/project.rs:135`). On `charge.refunded` for a project-level purchase, sqlx fails to decode NULL → `ItemId`; webhook handler 5xx's; Stripe retries forever.
306 - 5. **[Payments — SERIOUS]** `routes/stripe/webhook/checkout_helpers.rs:240-269` — `compute_splits` comment says "Defensive clamp: a misconfigured project_members row could sum past 100%" but the loop only adds remainder pennies and never subtracts. Two members at 60% each on a $10 sale are each credited $6, so $12 is credited against $10 of revenue. Clamp only affects `expected_total`, never the already-computed per-member amounts. Tests cover ≤100% only.
307 - 6. **[Payments — SERIOUS]** `routes/stripe/checkout/tips.rs:104-106` — `TipForm.project_id` is taken verbatim from the form. The webhook later calls `record_tip_splits(tip.id, tip.project_id, ...)` and credits THAT project's members. An attacker tipping creator A can pass project B's UUID; B's members get split obligations credited against A's tip. Stripe money flows correctly; on-platform `tip_splits` records and any downstream reporting are corrupted.
308 - 7. **[Payments — SERIOUS]** `db/cart.rs:94-123` + `routes/stripe/checkout/cart.rs` — `item.rs:47-49` enforces "Unlisted items can only be obtained through their bundle" via `if !item.listed`. `toggle_cart_preflight` and `get_cart_items` check `is_public` but NOT `listed`. An attacker who knows an unlisted item's UUID can POST to `/api/cart/{id}/toggle` and check out via the cart flow, fully bypassing the bundle-only gate.
309 - 8. **[Payments — SERIOUS]** `routes/stripe/webhook/subscriptions.rs:117-121, 67-69, 95-96` — `status_str.parse::<SubscriptionStatus>()` returns BadRequest for any status not in `enums.rs:183-198` (Stripe's `paused` is new). Webhook handler returns Err; Stripe keeps retrying until the status changes.
310 - 9. **[Payments — SERIOUS]** `payments/webhooks.rs:294-308` — `is_full_refund` returns true when `amount_refunded >= amount` and both are zero (Stripe sometimes emits these for $0 verification charges). Triggers `refund_transaction_by_payment_intent` with default `unknown` intent ID. Test at line 517-525 pins the behavior.
311 - 10. **[Storage — HIGH]** `routes/storage/versions.rs:159-174` — `version_confirm_upload` enqueues scan and flips `scan_status` to Pending BEFORE the `version.s3_key == req.s3_key` idempotency check at line 172. Duplicate retry of an already-confirmed upload knocks a Clean version back to Pending, breaking downloads.
312 - 11. **[Storage — HIGH]** `routes/storage/images.rs:179-208` — `project_image_confirm` replace branch is gated on `Ok(Some(old_size))` from `s3.object_size(&old_key)`. On `Err` (S3 hiccup) or `Ok(None)` (URL with no object behind it) it falls into the "no old image" branch, `try_increment_storage` without decrementing. Permanent storage over-count. Also: `update_project_image_url` runs AFTER `enqueue_deletions` of the old key, with no rollback path.
313 - 12. **[Storage — HIGH]** `routes/storage/media.rs:236-293` — `media_confirm` does three separate writes (`try_increment_storage`, `remove_pending_upload`, `media_files::create`) outside a transaction. Interruption between steps leaves S3 object orphaned with storage credit consumed and no DB row.
314 - 13. **[UX — HIGH]** `routes/pages/dashboard/wizards/item/save.rs:183-185, 214-227` — `let price_cents = (price_dollars * 100.0).round() as i32; if price_cents > 0 { validate_price_cents(price_cents)?; }`. Guard skips validation for 0 and negative values; value goes through `PriceCents::from_db` (no validation) into `update_item`. Submitting `price=-5` writes `-500` cents. Same pattern on PWYW: no `min <= suggested` check.
315 - 14. **[UX — HIGH]** `routes/pages/dashboard/wizards/item/save.rs:179-183` + `routes/api/items/bulk.rs:136-139` + `routes/pages/dashboard/wizards/project.rs:264-298` — `price_dollars: f64 = …parse()…unwrap_or(0.0)`. `"NaN".parse::<f64>()` succeeds; `NaN as i32 == 0` (silent Free). `1e20` saturates to `i32::MAX`. Bulk path catches via `PriceCents::new` cap; `save.rs` does not and persists the raw value.
316 - 15. **[UX — HIGH]** `routes/auth.rs:356-361` — `let is_taken = db::users::get_user_by_username(...).await.map(|u| u.is_some()).unwrap_or(false);`. Transient DB error during signup live-check returns "available", misleading the user; subsequent signup races whatever real state the DB is in.
317 - 16. **[Perf — HIGH]** `routes/stripe/checkout/cart.rs:68-248` — Per cart item: sequential `has_purchased_item`, optional `remove_from_cart`, per-free-item `begin tx → claim_free_item → increment_sales_count → commit`, `get_item_by_id`, second `remove_from_cart`. 20-item cart ≈ 80 sequential roundtrips, ~20 separate transactions, 20 distinct pool acquisitions in series.
318 - 17. **[Perf — HIGH]** `db/page_views.rs:18-32` — `record_view` spawned per public request, takes a pool connection to UPSERT. With `DB_POOL_MAX_CONNECTIONS = 25`, a viral item link spawns unbounded tasks, eats the pool, times out real request handlers at acquire. No batching, no per-(target,session) debounce.
319 - 18. **[Perf — HIGH]** `scheduler/integrity.rs:53-73` — `check_sales_count_drift`: `SELECT i.id, i.sales_count, COUNT(t.id) FROM items LEFT JOIN transactions ... GROUP BY i.id HAVING i.sales_count != COUNT(t.id) LIMIT 50`. `HAVING` post-aggregation; Postgres scans every row in `items` and joins every completed transaction in history before filtering. `LIMIT 50` doesn't cap the work. Weekly multi-minute query holding a pool connection.
320 -
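The price findings above (#13, #14) come down to parsing money through `f64`. A minimal sketch of a string-to-cents parser that never touches floating point, under stated assumptions: `parse_price_cents`, `PriceParseError`, and the `MAX_PRICE_CENTS` value are all hypothetical, and the real cap lives in `PriceCents::new`.

```rust
/// Hypothetical upper bound in cents; the real cap is whatever
/// `PriceCents::new` enforces.
const MAX_PRICE_CENTS: i32 = 10_000_000;

#[derive(Debug, PartialEq)]
enum PriceParseError {
    Empty,
    NotDecimal,
    TooManyFractionDigits,
    OutOfRange,
}

/// Parses a plain decimal string such as "12.34" into integer cents.
fn parse_price_cents(input: &str) -> Result<i32, PriceParseError> {
    let s = input.trim();
    if s.is_empty() {
        return Err(PriceParseError::Empty);
    }
    let (whole, frac) = match s.split_once('.') {
        Some((w, f)) => (w, f),
        None => (s, ""),
    };
    // ASCII digits only: this rejects "-5", "NaN", "inf", "1e20" and "+3".
    let all_digits = |p: &str| p.bytes().all(|b| b.is_ascii_digit());
    if whole.is_empty() || !all_digits(whole) || !all_digits(frac) {
        return Err(PriceParseError::NotDecimal);
    }
    if frac.len() > 2 {
        return Err(PriceParseError::TooManyFractionDigits);
    }
    // "5" -> 0 cents, "5.5" -> 50 cents, "5.05" -> 5 cents.
    let frac_cents: i64 = match frac.len() {
        0 => 0,
        1 => frac.parse::<i64>().map_err(|_| PriceParseError::NotDecimal)? * 10,
        _ => frac.parse::<i64>().map_err(|_| PriceParseError::NotDecimal)?,
    };
    // Checked arithmetic end to end; no cast happens before the range check.
    let cents = whole
        .parse::<i64>()
        .ok()
        .and_then(|d| d.checked_mul(100))
        .and_then(|c| c.checked_add(frac_cents))
        .filter(|c| *c <= i64::from(MAX_PRICE_CENTS))
        .ok_or(PriceParseError::OutOfRange)?;
    Ok(cents as i32)
}
```

Because the parser accepts digits and a single dot only, negative, NaN, and exponent inputs fail before any arithmetic runs, so the `as i32` cast at the end cannot saturate or silently produce zero.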
321 - ## Scorecard
322 -
323 - ### Axis Summary Grades
324 -
325 - | Axis | Overall | Cold Spots | Mandatory Surprise |
326 - |------|---------|------------|--------------------|
327 - | Payments | B | `routes/stripe/checkout/cart.rs` (B-), `routes/stripe/checkout/tips.rs` (B-), `db/transactions.rs` (B-), `routes/stripe/webhook/checkout_helpers.rs` (B-), `routes/stripe/webhook/subscriptions.rs` (B) | `compute_splits` carries a "Defensive clamp" comment that explicitly anticipates the >100% case and then fails to defend against it — only `expected_total` is clamped, the already-computed per-member splits go unchanged. Treat as evidence the defensive-comment culture is itself unreliable; comments and code drift independently. |
328 - | Storage | B- | `routes/storage/uploads.rs` (C+), `routes/storage/images.rs` (C+), `routes/storage/versions.rs` (C+), `routes/storage/media.rs` (B-), `db/mod.rs::check_sandbox_cap` (C+) | `stream_url` (`downloads.rs:119-122`) computes presigned expiry as `((duration as u64) * 2).max(3600)` where `duration: i32` and no DB CHECK ≥ 0 exists on `duration_seconds`. A negative value becomes near-`u64::MAX` expiry — a centuries-long presigned URL. The cast width and missing CHECK are independent latent bugs that compose into a multi-decade credential leak. |
329 - | UX Wiring | B | `routes/pages/dashboard/wizards/item/save.rs` (B-), `error.rs` (B-), `routes/pages/public/discover.rs` (B) | `update_item` takes ~13 positional `Option`s; call sites are unreadable and error-prone. The negative-price bug (HIGH #13) is born from this signature: anyone calling it has no compiler help distinguishing `Some(-500)` (bug) from `Some(500)` (intent). |
330 - | Security | A- | `helpers.rs` (B+), `scanning/clamav.rs` (B), `scanning/yara.rs` (B), `rate_limit.rs` (B+) | The "11 layer" scan pipeline test gives a false sense of coverage. ClamAV is `FailOpen` by explicit policy (`scanning/clamav.rs:19`), YARA silently skips rule files that fail to compile (`scanning/yara.rs:54-67`), and there is no startup assertion that any real AV layer is live. A misconfigured deploy can pass EICAR as Clean while the test suite is green. |
331 - | Performance | B- | `routes/stripe/checkout/cart.rs` (C), `scheduler/announcements.rs` (C+), `scheduler/integrity.rs` (C+), `scheduler/cleanup.rs` (B-), `build_runner.rs` (B-), `db/page_views.rs` (C+), `db/pending_s3_deletions.rs` (B) | The biggest scaling cliff is a 1-line `tokio::spawn` on the page-view path, not anything that "looks expensive". Hot-path response shipped its tail-latency problem to the same pool that serves it. |
332 -
333 - ## Bug Counts by Severity
334 -
335 - | Severity | Payments | Storage | UX | Security | Perf | Total |
336 - |---|---|---|---|---|---|---|
337 - | CRITICAL | — | 1 | 1 | — | 1 | **3** |
338 - | HIGH/SERIOUS | 5 | 3 | 3 | — | 3 | **14** |
339 - | MED | 2 | 3 | 2 | 4 | 2 | 13 |
340 - | MINOR/LOW | 2 | 2 | 2 | 3 | 1 | 10 |
341 -
342 - ## Cross-Cutting Concerns
343 -
344 - 1. **Side-effects-before-validation pattern.** Storage (uploads/versions/images route gates run after scan enqueue), Payments (tip `project_id` accepted before authorization, cart `listed` not checked before checkout), UX (price `from_db` after a guard that skips zero/negative). Four files, three axes, same shape: persist first, validate later.
345 - 2. **Invariant-in-prose, fourth consecutive run.** Run #2→#3 was MaybeUser; Run #3→#4 was scan_status ordering comments-vs-code; Run #4 partial fix landed (`images.rs`) but the same disease moved up a layer to `uploads.rs` (the route-level file-type gate now runs after scan enqueue). The Payments "defensive clamp" comment in `compute_splits` is the same shape on a different organ. **No type-level constructive impossibility has yet been applied to any of these.**
346 - 3. **Optional positional args as bug carriers.** `update_item`'s ~13 positional `Option`s let the wizard pass a negative price, wrapped as `Option<PriceCents>` via `PriceCents::from_db`, past the validator. Same pattern is implicated in the UX field-error finding — `ErrorTemplate`'s struct literal is missing a `fields:` field at every callsite and the compiler doesn't care.
347 - 4. **Hot-path pool pressure from fire-and-forget writes.** `record_view` per pageview, `tokio::spawn` per cart line, scheduler advisory-lock conn pinned across S3. The 25-connection pool is sized for a quiet box; three independent fan-out patterns can each saturate it.
348 - 5. **FailOpen with no liveness assertion.** ClamAV FailOpen + YARA optional + no startup gate = a green test suite can coexist with zero real AV coverage. Same shape as the Performance "spawned task accumulates without bound" pattern — both are silent degradations the operator never sees.
349 -
350 - ## Components Successfully Stress-Tested
351 -
352 - - All Run #4 Phase 1 closures verified standing (CSRF creator-tier token, `images.rs` scan_status ordering structural fix, git-shell validation, lockout `=` predicate, promo dedupe, scanner streaming + pool permit, broadcast bounded fan-out, scan_jobs retention).
353 - - Stripe HMAC: multi-secret `v1=` rotation now accepts on any match (Run #4 polish landed).
354 - - Promo `try_increment_use_count` race-free via atomic single-row UPDATE; release path uses detach for no-double-decrement; proptest-covered.
355 - - License keys: 66-bit entropy, DB UNIQUE, `FOR UPDATE` activation, full recount on revoke (display lag only — finding #M).
356 - - CSRF posture: `CsrfRouter<S>` newtype prevents a bare `Router::route(path, post(...))` from compiling in mutation-bearing files. Verified.
357 - - Argon2id parameters + `DUMMY_HASH` timing equalization on user-not-found (login, OAuth, SyncKit).
358 - - PKCE-S256 pinned at both authorize and token endpoints; OAuth code atomic single-use consume.
359 - - JWT future-iat rejection + `jwt_invalidated_at` second-equal `<=` semantics; password change bumps `jwt_invalidated_at` via `update_user_password`.
360 - - SSE shard-guard drop-before-remove; cross-process advisory locks for scheduler ticks.
361 - - ZIP bomb: decompressed-bytes counted (not claimed); ratio + depth caps; nested magic-byte detection.
362 - - `try_increment_storage` cap-predicate UPDATE; concurrent uploads cannot both squeeze past cap.
363 -
364 - ## Confidence Per Axis
365 -
366 - - Payments **HIGH** — read 22 of 23 listed files end-to-end with targeted attacks per surface; all four SERIOUS reproducible by line-tracing.
367 - - Storage **HIGH** — CRITICAL and all three HIGHs mechanically reproducible; mandatory surprise composes two latent bugs via line-by-line read.
368 - - UX Wiring **HIGH** — full read of `csrf.rs`, `error.rs`, `markdown.rs`, `formatting.rs`, `validation/mod.rs`; spot-checked 20+ templates for CSRF pattern; CRITICAL field-aware-validation finding cross-checked by grepping `validation_fields_ref` callers.
369 - - Security **MEDIUM** — auth/CSRF/OAuth/scanning surfaces walked thoroughly; admin/moderation/reports/ssh_keys API/totp routes only sampled. ClamAV FailOpen is **policy** not bug; flagged as architectural risk.
370 - - Performance **MEDIUM-HIGH** — spot-checked DB call patterns across 15+ files; exhaustive route-level N+1 sweep deferred; stripe/webhook code shows similar `for x in &xs` loops at `checkout.rs:149,167,198,452` that were not deep-audited.
371 -
372 - ## Metrics
373 -
374 - - Modules audited: ~80
375 - - Cold spots (≤ B): 18
376 - - Bugs: 3 CRITICAL, 14 HIGH/SERIOUS, 13 MED, 10 MINOR/LOW
377 - - Axes at A- or above: 1/5 (Security)
378 -
379 - ## Delta Since Run #4
380 -
381 - **FIXED (Run #4 items not surfaced this run):**
382 - - All 10 Run #4 Phase 1 items verified closed (CSRF creator-tier, `images.rs` ordering, git-shell validation, lockout email flood, cancel_pending CSRF, promo dedupe, scanner streaming + pool permit, scan_jobs retention, broadcast bounding).
383 - - All 7 Run #4 Phase 2 items verified closed (cart template price math, media reupload race, pending_uploads reaper bump, TOTP step-replay, delete_other_sessions cache eviction, `/login` CSRF, OAuth fetch_optional).
384 - - All 5 Run #4 Phase 3 items verified closed (claim_pending_build partial index, build status reaper race, `extract_s3_key_from_url` host pinning, TOTP `pending_2fa` tracking row, KNOWN_SYNC_APPS removed entirely).
385 - - All Phase 4 polish items verified closed.
386 -
387 - **NEW CRITICAL/HIGH in Run #5 (previously unaudited or regressed):**
388 - - Storage: `uploads.rs` route-level file-type gate runs after scan enqueue (CRIT).
389 - - UX: `validation_fields` plumbing is dead code at template boundary (CRIT).
390 - - Perf: `build_runner.rs` partial-failure denominator nonsense (CRIT).
391 - - Payments: NULL `item_id` decode bomb on project-level refunds (SERIOUS).
392 - - Payments: `compute_splits` over-credits when project_members sum >100% (SERIOUS).
393 - - Payments: tip `project_id` not validated vs recipient (SERIOUS).
394 - - Payments: cart bypasses item `listed` gate (SERIOUS).
395 - - Payments: unknown subscription status retry storm (SERIOUS).
396 - - Storage: `version_confirm_upload` scan enqueue before idempotency check (HIGH).
397 - - Storage: `project_image_confirm` mis-accounts on S3 probe failure + no rollback (HIGH).
398 - - Storage: `media_confirm` non-atomic three-write sequence (HIGH).
399 - - UX: negative/NaN price acceptance via `PriceCents::from_db` after permissive guard (HIGH).
400 - - UX: username availability check fails open on DB error (HIGH).
401 - - Perf: cart checkout 80 sequential roundtrips (HIGH).
402 - - Perf: `record_view` unbounded spawn per public request (HIGH).
403 - - Perf: `check_sales_count_drift` full-table aggregate (HIGH).
404 -
405 - **CHRONIC (across Run #3 → Run #4 → Run #5):**
406 - - **Invariant-in-prose / policy-not-in-types — FOURTH consecutive run.** Run #4 partially fixed the scan_status ordering inside `images.rs` (and the CSRF policy via `CsrfRouter` structurally), but the same disease *moved up a layer*: in `uploads.rs` the route-level file-type gate now runs *after* scan enqueue. The constructive-impossibility shape needed: extract a `commit_upload(file_type, ...)` higher-level operation that validates the file_type before doing any scan/credit side effects, then make `enqueue_scan_for` + `update_*_scan_status` `pub(crate)` so handlers cannot call them directly. The Payments `compute_splits` "Defensive clamp" comment + the UX `validation_fields_ref` orphan plumbing are the same disease in different organs.
407 -
408 - **REGRESSED:**
409 - - Payments (A- → B) — four new SERIOUS bugs surfaced in previously-unaudited tip/cart/refund/subscription-status corners. Not a regression in fixed code; a regression in audit coverage.
410 - - Storage (A- → B-) — invariant-in-prose recurrence (chronic above).
411 - - Performance (B → B-) — hot-path request loops audited for the first time.
412 -
413 - ---
414 -
415 - # Plan: Restore Every Axis to A- or Higher (Run #5)
416 -
417 - **Target grades:** Payments A · Storage A · UX A- · Security A- · Performance A-.
418 -
419 - User priority for the launch window: **resolve every CRITICAL/SERIOUS/HIGH before re-running**. Iterate until audits surface only small new errors.
420 -
421 - ## Phase 1 — CRITICAL (fix today)
422 -
423 - 1. **Storage CRIT — `uploads.rs` file-type gate ordering.** `routes/storage/uploads.rs:204-237`. Move the match arm that rejects `Download`/`Insertion`/`MediaImage`/`MediaVideo` BEFORE `enqueue_scan_for` and `update_item_scan_status`. Then make `enqueue_scan_for` + `update_*_scan_status` `pub(crate)` and expose a `commit_upload(file_type, item_id, s3_key)` higher-level op that performs validation → credit → row insert → status flip in the correct order. The same constructor must serve `versions.rs` and `images.rs`. This closes the chronic invariant-in-prose finding.
424 - 2. **UX CRIT — Field-aware validation reaches the UI.** `error.rs:216-264` + `templates/error.html` + `templates/partials/form_errors.html` (new). Either (a) add `fields: Vec<(String, String)>` to `ErrorTemplate` and a `{% for f in fields %}` block in `error.html` + per-input markup; or (b) delete `validation_fields*` API entirely and replace handler callsites with `validation(summary)`. Choose (a) for non-HTMX forms that need to preserve user input; choose (b) only if every existing callsite is HTMX-only and uses OOB swaps for inline errors. Audit all `validation_fields` callers and pick a path.
425 - 3. **Perf CRIT — `build_runner.rs` partial-failure denominator.** `build_runner.rs:175-180`. Track `failed_count` alongside `artifact_keys`; report `succeeded/(succeeded+failed)`. Add a test that runs 3 targets with 2 failures and asserts "1/3" in the error string.
426 -
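One way the Phase 1 #1 shape could look, as a sketch only: a module that keeps the scan/status helpers private and exposes a single `commit_upload` entry point, so validation cannot be skipped or reordered by a handler. Assumptions: the `Audio` variant, `ValidatedUpload`, `UploadError`, and the `Effects` log are hypothetical stand-ins; the plan calls for `pub(crate)` on the real helpers, whereas this single-file sketch uses module privacy to show the same idea; credit and row-insert steps are omitted.

```rust
mod upload {
    /// The four rejected types come from the plan; `Audio` is a hypothetical
    /// accepted type used only for illustration.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum FileType {
        Audio,
        Download,
        Insertion,
        MediaImage,
        MediaVideo,
    }

    #[derive(Debug, PartialEq)]
    pub enum UploadError {
        UnsupportedFileType(FileType),
    }

    /// Proof that validation ran. Fields are private, so code outside this
    /// module cannot construct one.
    pub struct ValidatedUpload {
        file_type: FileType,
        s3_key: String,
    }

    /// Stand-in for DB/queue writes: records side effects in order.
    #[derive(Debug, Default, PartialEq)]
    pub struct Effects {
        pub log: Vec<String>,
    }

    fn validate(file_type: FileType, s3_key: &str) -> Result<ValidatedUpload, UploadError> {
        match file_type {
            FileType::Download
            | FileType::Insertion
            | FileType::MediaImage
            | FileType::MediaVideo => Err(UploadError::UnsupportedFileType(file_type)),
            FileType::Audio => Ok(ValidatedUpload {
                file_type,
                s3_key: s3_key.to_owned(),
            }),
        }
    }

    // Private helpers: they require a `ValidatedUpload`, and handlers outside
    // this module cannot call them at all.
    fn enqueue_scan_for(upload: &ValidatedUpload, fx: &mut Effects) {
        fx.log.push(format!("enqueue_scan:{}", upload.s3_key));
    }

    fn update_scan_status(upload: &ValidatedUpload, fx: &mut Effects) {
        fx.log.push(format!("scan_status=pending:{:?}", upload.file_type));
    }

    /// The only public entry point: validate first, then side effects.
    pub fn commit_upload(
        file_type: FileType,
        s3_key: &str,
        fx: &mut Effects,
    ) -> Result<(), UploadError> {
        let upload = validate(file_type, s3_key)?;
        enqueue_scan_for(&upload, fx);
        update_scan_status(&upload, fx);
        Ok(())
    }
}
```

The point of the design is that the ordering is enforced by visibility and by the `ValidatedUpload` argument, not by a comment: a rejected file type returns before any effect is recorded.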
427 - ## Phase 2 — SERIOUS / HIGH (fix this weekend)
428 -
429 - 4. **Payments SERIOUS — NULL item_id refund decode.** `db/transactions.rs:699-716`. Change return to `Vec<(TransactionId, Option<ItemId>)>`; `refund_transaction_by_payment_intent` caller skips `decrement_sales_count`/`revoke_keys_by_transaction` when `item_id` is `None`. Add a fixture-based test against a project-level transaction.
430 - 5. **Payments SERIOUS — `compute_splits` over-credit.** `routes/stripe/webhook/checkout_helpers.rs:240-269`. Reject `total_split_pct > 100` at the project_members write site (DB CHECK or validation). Defensively, scale each split proportionally when sum > 100, OR clamp each split against remaining `expected_total` budget in the loop. Add a test at 60%+60%.
431 - 6. **Payments SERIOUS — Tip project authorization.** `routes/stripe/checkout/tips.rs:104-106`. After accepting `TipForm`, fetch the project and assert `project.user_id == recipient_id`; return 400 otherwise.
432 - 7. **Payments SERIOUS — Cart bypasses `listed` gate.** `db/cart.rs:94-123` and `get_cart_items`/`get_cart_items_for_seller`. Add `AND i.listed = true` to all three queries. Add a check in the per-seller checkout path. Add a regression test that toggles an unlisted item into the cart and asserts rejection.
433 - 8. **Payments SERIOUS — Unknown subscription status.** `routes/stripe/webhook/subscriptions.rs:117-121`. Replace `?` with a match: known statuses dispatch; unknown statuses `tracing::warn!` and return `StatusCode::OK` so Stripe stops retrying.
434 - 9. **Payments SERIOUS — `is_full_refund` zero-amount.** `payments/webhooks.rs:294-308`. Predicate becomes `amount > 0 && amount_refunded >= amount`. Update the test at line 517-525 to invert (zero-amount must NOT be treated as full refund).
435 - 10. **Storage HIGH — `versions.rs` enqueue-before-idempotency.** `routes/storage/versions.rs:159-174`. Move idempotency `version.s3_key == req.s3_key` check BEFORE `enqueue_scan_for`. Apply the Phase 1 `commit_upload` helper here.
436 - 11. **Storage HIGH — `project_image_confirm` probe-failure + no rollback.** `routes/storage/images.rs:179-208`. (a) On `Err` or `Ok(None)` from `s3.object_size`, fall back to the row's recorded size (add a `project_image_bytes` column if not present) rather than the "no old image" branch. (b) Move `enqueue_deletions` to AFTER `update_project_image_url` success, or wrap both in a tx with the enqueue inside.
437 - 12. **Storage HIGH — `media_confirm` non-atomic three-write.** `routes/storage/media.rs:236-293`. Wrap `try_increment_storage` → `remove_pending_upload` → `media_files::create` in a transaction. The storage credit refund must fire on any failure path.
438 - 13. **UX HIGH — Negative/NaN prices via `from_db`.** `routes/pages/dashboard/wizards/item/save.rs:183-185, 214-227`. Use `PriceCents::new(price_cents)?` unconditionally; drop the `> 0` guard. Add `min <= suggested` check on PWYW.
439 - 14. **UX HIGH — f64 price parsing accepts NaN.** Same file + `routes/api/items/bulk.rs:136-139` + `routes/pages/dashboard/wizards/project.rs:264-298`. Parse as decimal cents directly (or `Decimal::from_str_exact` from the `rust_decimal` crate already in `Cargo.lock`); reject NaN/Inf; reject negative/saturating values before cast.
440 - 15. **UX HIGH — Username live-check fails open.** `routes/auth.rs:356-361`. Propagate the DB error or treat it as "unavailable, try again" — never "available" by default.
441 - 16. **Perf HIGH — Cart checkout sequential roundtrips.** `routes/stripe/checkout/cart.rs:68-248`. Bulk-load `has_purchased_item` once with `WHERE item_id = ANY($1)`. Batch `get_item_by_id` lookups. Claim free items in a single transaction with batched inserts. Aim for ≤ 5 roundtrips for any cart size.
442 - 17. **Perf HIGH — `record_view` unbounded spawn.** `db/page_views.rs:18-32`. Replace per-request spawn with an `mpsc` channel; one background task drains every 250ms and flushes one bulk `INSERT … ON CONFLICT … DO UPDATE SET view_count = page_view_daily.view_count + EXCLUDED.view_count`.
443 - 18. **Perf HIGH — Sales drift full-table aggregate.** `scheduler/integrity.rs:53-73`. Maintain trigger-updated `transactions_completed_count` per item, or run the check off-pool against a snapshot. Short term: add `WHERE i.sales_count > 0 OR EXISTS (SELECT 1 FROM transactions WHERE item_id = i.id LIMIT 1)` to drop the LEFT JOIN's all-zero rows from the aggregate.
444 -
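For Phase 2 #5, a sketch of the "clamp each split against the remaining budget" option. This is not the existing `compute_splits`; `clamp_splits` and its signature are hypothetical, percentages are assumed to be whole numbers, and remainder-penny distribution is left out.

```rust
/// Splits `total_cents` among members by whole-number percentage, never
/// crediting more than `total_cents` in aggregate.
fn clamp_splits(total_cents: i64, percents: &[u32]) -> Vec<i64> {
    let total = total_cents.max(0);
    let mut remaining = total;
    percents
        .iter()
        .map(|&pct| {
            // What this member would get if the row were well-formed.
            let want = total.saturating_mul(i64::from(pct.min(100))) / 100;
            // Never hand out more than what is left of the budget.
            let credited = want.min(remaining);
            remaining -= credited;
            credited
        })
        .collect()
}
```

At 60%+60% on 1000 cents this yields 600 and 400, so the total credited equals the revenue. The result is order-dependent (later members absorb the shortfall), which is why the plan also lists proportional scaling as an alternative and a write-site check as the primary fix.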
445 - ## Phase 3 — MED (fix before re-run if cheap)
446 -
447 - - Storage: advisory-lock leak in `check_sandbox_cap` (`db/mod.rs:92-128`) → `pg_advisory_xact_lock` or RAII guard.
448 - - Storage: `is_s3_key_live` missing tables (`db/pending_s3_deletions.rs:67-82`) → audit all s3_key-bearing columns; consider normalized `s3_objects` table.
449 - - Storage: `delete_version` owner SELECT outside tx + post-commit S3 enqueue (`db/versions.rs:267-315`) → owner SELECT inside tx; enqueue inside tx.
450 - - Security: ClamAV `FailOpen` startup assertion (`scanning/clamav.rs:19` + `scanning/mod.rs:151-164`) → refuse boot if scan configured but no AV layer live; emit `tracing::error!` after N consecutive ClamAV errors.
451 - - Security: `helpers.rs:44-50` `DefaultHasher` for advisory lock keys → stable hasher (`sha2` first 8 bytes, or `xxh3` with constant seed).
452 - - Security: OAuth `state` size cap (`routes/oauth.rs:379-386`) → reject `form.state.len() > 1024`; cap `code_challenge` at 44 base64url chars.
453 - - Security: `extract_client_ip` non-Cloudflare fallback warning (`helpers.rs:33-40`) → emit one-shot `tracing::warn!` at startup if no `CF-Connecting-IP` seen after N requests.
454 - - UX: pagination offset overflow (`routes/pages/public/discover.rs:85-87`, `routes/admin/users.rs:37-39`) → clamp `page` to `total_pages.max(1)` before arithmetic.
455 - - UX: forms render without `_csrf` when handler forgets to populate `csrf_token` → make `csrf_token` non-optional in form-bearing templates (compile-time error) or render an inline "refresh and try again" notice.
456 - - UX: `validate_username` byte-length check (`routes/auth.rs:322`) → `chars().count()`, or reorder ASCII filter before length.
457 - - Perf: scheduler advisory-lock connection pinned across S3 (`scheduler/mod.rs:92-279`) → dedicated `PgPoolOptions::new().max_connections(1)` outside the main pool.
458 - - Perf: cleanup S3 deletes serialized inside scheduler tick (`scheduler/cleanup.rs:77-100`) → `for_each_concurrent(8, ...)`; better, move user-deletion off the scheduler tick.
459 -
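The pagination item above ("clamp `page` to `total_pages.max(1)` before arithmetic") can be sketched as follows. The function name, the `u64` types, and the 1-based page convention are assumptions; the real handlers may use different integer types.

```rust
/// Returns the row offset for a 1-based `page`, clamped to the last real page
/// so an oversized `page` cannot overflow the multiplication.
fn page_offset(page: u64, per_page: u64, total_items: u64) -> u64 {
    let per_page = per_page.max(1);
    // At least one page, even when there are no items.
    let total_pages = total_items.div_ceil(per_page).max(1);
    let page = page.clamp(1, total_pages);
    (page - 1).saturating_mul(per_page)
}
```

Clamping before the multiply means `?page=18446744073709551615` resolves to the last page's offset instead of wrapping or panicking.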
460 - ## Phase 4 — Polish (after re-run shows axes ≥ A-)
461 -
462 - - Payments: `has_active_subscription_to_item` period-end clause mirroring (`db/subscriptions.rs:464-470`).
463 - - Payments: `get_active_creator_tier` + `sync_user_creator_tier` period-end defense (`db/creator_tiers.rs:91-103, 181-194`).
464 - - Payments: `release_use_count` race messaging (`db/promo_codes.rs:184-200`).
465 - - Payments: License key `activation_count` recount on revoke (`db/license_keys.rs:343-382`).
466 - - Payments: Subscription minimum-charge check (`payments/checkout.rs:283-317`).
467 - - Payments: Webhook v1/v2 unmark-on-failure parity (`routes/stripe/webhook/mod.rs:48-86`).
468 - - Storage: `media_files.list_folders` scan filter (`db/media_files.rs:73-82`).
469 - - Storage: `pending_uploads.record_pending_upload` silent user-mismatch (`db/pending_uploads.rs:23-33`).
470 - - Storage: `append_log_bounded` non-atomic size cap (`build_runner.rs:516-534`).
471 - - Storage: `downloads.rs:119-122` presigned-URL expiry: cap `duration_seconds` at i64 + add DB CHECK ≥ 0.
472 - - Security: `validate_token_consuming` for OAuth POST (`routes/oauth.rs:206`).
473 - - Security: `parse_repo_path` rejects lone-dot entries (`git_ssh.rs:162`).
474 - - Security: ClamAV INSTREAM 16K cap → treat truncation as fail-closed (`scanning/clamav.rs:101-108`).
475 - - UX: validation error messages stop reflecting user input (`wizards/item/mod.rs:176-179`).
476 - - UX: CSRF body extraction stops using `from_utf8_lossy` (`csrf.rs:528-543`).
477 - - Perf: scan-pipeline 400 MiB worst-case capacity-plan note (`constants.rs:156-157`).
478 - - Perf: announcement fan-out persistence + resume (`scheduler/announcements.rs:59-89, 147-177`).
479 - - Perf: build log per-line DB roundtrip (`build_runner.rs:516-534`) → in-process running total.
480 -
481 - ## Phase 5 — Chronic (must land in Run #6 or this audit cycle has failed)
482 -
483 - **Invariant-in-prose / policy-not-in-types, fourth consecutive run.** The Phase 1 #1 fix (constructive `commit_upload` helper sealing the lower-level ops) is the only acceptable resolution. Memory notes, comments warning future authors, and renamed-helper approaches have been tried in three prior runs and recurred each time. After Phase 1 lands, audit `compute_splits` and `ErrorTemplate` for the same shape and apply the same treatment.
484 -
485 - ---
486 -
487 -
488 -
489 - ## Headline
490 -
491 - | Axis | Run #3 | Run #4 | Direction |
492 - |------|--------|--------|-----------|
493 - | Payments | A- | **A-** | flat (1 new SERIOUS: promo over-release on cart cleanup) |
494 - | Storage | B+ | **A-** | ↑ (Run #3 image-confirm rollback/race-guard fixes verified; one residual CRIT in same file) |
495 - | UX Wiring | B+ | **C+** | ↓ (CSRF policy patchwork: missing tokens + undocumented mutation in exempt prefix) |
496 - | Security | B+ | **B+** | flat (different HIGHs: git-shell repo-name validation + lockout DoS) |
497 - | Performance | B- | **B** | ↑ (Run #3 sync-FS-in-async + DashMap shard-lock + monitor split all verified; new unbounded scan_jobs/broadcast/pool-permit findings) |
498 -
499 - Net: 4 CRITICALs (vs Run #3: 2), 10 HIGH/SERIOUS (vs Run #3: 10), 22 MED, 23 MINOR/LOW. Ship-blockers are concentrated in two structural rots — CSRF policy and scan_jobs growth — not in net-new logic mistakes.
500 -
Lines truncated
@@ -1,373 +0,0 @@
1 - # MNW Server — Todo
2 -
3 - **Last updated:** 2026-05-31 late evening (post Run #9 — launch-eve pass).
4 -
5 - ## Status
6 -
7 - All 5 axes at A- after Run #9 fixes. **0 CRITICAL open · 1 SERIOUS open (deferred) · 3 HIGH open (deferred) · 7 MED open (deferred).** Launchplan §1.5 A- bar holds. See `docs/audit_review.md` Run #9 section for full triage.
8 -
9 - ## Run #9 — fixed this session (2026-05-31)
10 -
11 - - **UX-CRITICAL** Signup TOCTOU 23505 → 500 + form loss. `join_wizard.rs`: catch 23505 with constraint-name routing, surface as `return_error`. Follow-up: preserve typed form fields on error swap (Phase 4).
12 - - **Sec-SERIOUS** `delete_all_sessions_for_user` non-atomic JWT bump → wrapped in `pool.begin()` / `tx.commit()` (`db/sessions.rs:247`).
13 - - **Sec-SERIOUS** 2FA login-email IP spoofable via bare `x-forwarded-for` → swapped to `crate::helpers::extract_client_ip` (`routes/pages/public/two_factor.rs:308`).
14 - - **Pay-SERIOUS** Webhook dual-failure 503 short-circuited on Stripe retry → call `unmark_event_processed` before returning 503 (`routes/stripe/webhook/mod.rs:81`).
15 -
16 - `cargo check --tests` clean; targeted unit tests (sessions/webhook/two_factor/join_wizard) 33/33 green. Full DB-integration suite needs astra postgres.
17 -
18 - ## Run #9 — deferred with rationale (Phase 4)
19 -
20 - - [ ] **Pay-SERIOUS** Subscription webhook out-of-order events resurrect `active`. Needs `created`-timestamp re-extraction from `UntypedEvent` + `WHERE last_event_at <= $created` guards across Fan+/creator-tier/synckit subscription writes. Cross-cutting; worst case is minutes-window of restored access until next webhook.
21 - - [ ] **Sto-HIGH** Migration 129 dead-letter table never written (`cleanup.rs:453`). Operational visibility, not runtime; one-INSERT fix.
22 - - [ ] **Perf-HIGH** Per-request `reqwest::Client::new()` in 5 hot paths (dashboard/main, public/landing, api/internal/cli_features, api/domains, auth.rs). Hoist to OnceLock or AppState pooled client.
23 - - [ ] **Perf-HIGH** Unbounded `tokio::spawn` in `cleanup.rs:215-220` `spawn_expired_account_cleanups`. Lift existing `CLEANUP_PARALLELISM=4` JoinSet pattern from `cleanup_sandbox_accounts` 100 lines above.
24 - - [ ] **Pay-MED** `pricing.rs::parse_dollars_to_cents` strips European decimal comma; `1,23` → 12300¢.
25 - - [ ] **Pay-MED** SyncKit app-sub checkout silently defaults `storage_limit_bytes` to 0 if metadata missing.
26 - - [ ] **Pay-MED** Guest checkout email sentinel `"unknown@guest"` collision risk.
27 - - [ ] **Sto-MED** `is_s3_key_live` 7 EXISTS subqueries on unindexed s3_key columns — sequential scans per retry. Add partial indexes WHERE NOT NULL.
28 - - [ ] **Sto-MED** `is_s3_key_live` LIKE suffix `'%' || s3_key` false-positives on neighboring keys → S3 object leaks. Anchor with `/`.
29 - - [ ] **UX-MED** `purchase.html:145` `?return_to=` dead-wired; login handler always redirects `/dashboard`.
30 - - [ ] **UX-MED** Admin user filter buttons (`admin-users.html:35-44`) use `class="primary"` instead of `btn-primary` — renders unstyled.
31 -
32 - ## Run #9 — LOW/NOTE (carry forward)
33 -
34 - - [ ] **UX-LOW** Pagination links in `git/issues.html:72,76` don't URL-encode `search` param.
35 - - [ ] **UX-LOW** 5 sites use `.render().unwrap_or_default()` on Askama templates — blank UI on render failure, no log line.
36 - - [ ] **UX-LOW** `slugify` (`formatting.rs:85`) produces `"post"` for any non-ASCII title.
37 - - [ ] **Sec-MINOR** `csrf.rs:176-185` `validate_token_consuming` doesn't actually consume — rename or rotate.
38 - - [ ] **Sec-MINOR** `routes/oauth.rs:101-111` `is_localhost_redirect` allows any port regardless of registered URI.
39 - - [ ] **Sec-MINOR** `scanning/archive.rs:124` path-traversal check misses lone `..` segment (no trailing `/`).
40 - - [ ] **Perf-LOW** `db/page_views.rs` `pending` HashMap has no max-cardinality cap.
41 - - [ ] **Perf-LOW** `build_runner.rs:441` artifact tmpfile leaks if process crashes between SCP and `remove_file`.
42 -
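For the `scanning/archive.rs:124` item above, a sketch of a segment-based check that catches a lone `..` with no trailing `/`. The function name is hypothetical and this is not the existing check; it only illustrates comparing whole path segments instead of searching for a `../` substring.

```rust
/// Returns true if any path segment is exactly `..`, whether or not a
/// separator follows it. Both `/` and `\` are treated as separators.
fn has_parent_segment(entry_name: &str) -> bool {
    entry_name
        .split(|c| c == '/' || c == '\\')
        .any(|segment| segment == "..")
}
```

Segment comparison also avoids false positives on names that merely contain two dots, such as `a..b/c`.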
43 - Live state: working tree has 104+ Run #8 files plus the Run #9 changes: 4 source files (`join_wizard.rs`, `sessions.rs`, `two_factor.rs`, `webhook/mod.rs`) and 2 docs (`docs/audit_review.md`, `todo.md`).
44 -
45 - ## Open before launch (Monday 2026-06-01)
46 -
47 - ### Platform-as-product audits (skill-driven, code-review scope; fresh context recommended)
48 - - [ ] `/creator-fuzz` — would a working creator trust this with their livelihood?
49 - - [ ] `/use-fuzz` — discoverability, learnability, first-five-minutes
50 - - [ ] `/business-fuzz` — pricing copy, fee surfacing, refund-policy wording vs actual platform behaviour
51 -
52 - ### Per-project hygiene (manual, my call when ready)
53 - - [ ] README first-screen audit — what is this / who is it for / where to get it / what does it cost. No headliner paragraphs.
54 - - [ ] `Cargo.toml` version bump for the launch deploy (pick the number; I do the edit if needed)
55 - - [ ] CHANGELOG entry for the launch version
56 -
57 - ### Monday browser/prod testing (saved for Monday per current direction)
58 - - [ ] §1.1 Walk every public page: footer present, OG/Twitter meta render correctly in Facebook + Twitter debuggers, error pages render via forced 404/403/500
59 - - [ ] §1.2 First-run creator flow end-to-end in production: signup → Stripe Connect → first item upload
60 - - [ ] §1.3 Each seeded creator's `/{handle}` page renders without empty sections; sample item per medium (audio/video/text/download) reachable from `/discover`
61 - - [ ] §1.4 Production deploy of post-fuzz build + version recorded via `record_deploy`; scheduled jobs running on prod (cleanup, scan jobs retention, build reaper, broadcast fan-out); Stripe webhook reachable from dashboard ping; backup snapshot taken pre-launch + restoration path documented in `_private/docs/mnw/server-docs/`; `/health` green
62 - - [ ] §5 launch-day sequence: final deploy, smoke-test logged-out from non-dev machine, update bios/link-in-bio/handles, confirm `maxj.phd` resolves, tag launch commit (`git tag launch-2026-06-01`)
63 -
64 - ## Open question for the user (action before Monday)
65 -
66 - - [ ] **Confirm all role-based email addresses route to real mailboxes**: `info@`, `security@`, `dmca@`, `privacy@`, `dpo@`, `legal@`, `billing@`, `policy@`, `reports@`, `community@`, `appeals@`, `press@`, `noreply@`. Legal pages (terms, privacy, copyright, appeals) and several role-routed flows reference them. If any are aspirational, that's a launch risk for the legal pages and an inbound-mail blackhole. Verify with Postmark/forwarding setup.
67 -
68 - ## Deferred with rationale (no action; documented)
69 -
70 - - [ ] `build_runner.rs:151` serial-target loop. LOW; builds run rarely; refactor touches denominator + error aggregation + log order. Post-launch.
71 - - [ ] `scheduler/mod.rs:92-279` advisory-lock per-tier granularity. Multi-replica concern; defer until multi-replica is real.
72 - - [ ] Drop unused `completion_effects` table (migration cleanup, schema-only).
73 - - [ ] Templatize founder + standard annual prices in `tiers.md` and `pricing.md` (e.g. `$86/yr`, `$130/yr`, `$194/yr`, `$324/yr`; standard `$173/$259/$389/$648`). docengine substitutions don't support arithmetic; would require adding derived `tiers.founding.basic_annual` etc keys in `shared/docengine/src/assumptions.rs`. Not blocking.
74 - - [ ] `_head_assets.html` apple-touch-icon + manifest link wiring. `static/manifest.json` exists but the `<link rel="manifest">` was reverted; bring back if/when desired.
75 - - [ ] Migrate footer's `What's new` and `Shortcuts` `<a href="#" onclick="...">` to `data-*` attributes following the `data-copy-link` pattern. UX MED, not blocking.
76 -
77 - ## What's done this session (compact summary, full details below)
78 -
79 - - **Ultra Fuzz Run #8** — all 5 axes A-. SERIOUS webhook unbounded spawn closed via new `src/background.rs` (bounded mpsc + semaphore-bounded concurrent execution). `spawn_email!` macro migrated; 17 callers + 5 manual webhook spawns + 5 same-disease per-request email spawns now route through bg queue. Run #8 5 new MEDs all closed (cart `min_price_cents`, cart-all chain-break, item-wizard `pricing_model`, inline-JS templates, cart free-claim N+1). Previously-deferred Payments H2 `claim_free_project` race closed.
80 - - **7-wave backlog sweep** — 24 of 26 carried items across auth/security/scanning/db/storage/UX/perf/payments. New schema migration `133_items_duration_seconds_nonnegative.sql`. New `commit_rescan` helper extends chronic-disease seal to admin paths. Two LOW items deferred above.
81 - - **4 cross-cutting sweeps** — `info@makenot.work` email pin (8 files), localhost/TODO/emoji/secret scans all clean.
82 - - **§1.1 public-surface code work** — OG + Twitter card meta in `base.html` (per-page overridable blocks), `static/manifest.json` created with brand colours, `error.html` drops broken back button + adds contact link, `Contact` link added to footer (mailto:info@), new `routes/pages/public/sitemap.rs` (with in-memory 10-min cache + LIKE-wildcard escape from the security review).
83 - - **Doc-fuzz** — `content-scanning.md` restructured (Malware checks + Authenticity checks sections, added URLhaus/MetaDefender/signing layers), `policy.html` See-also block linking 6 legal pages, `tiers.md` prose prices templatized via `{{ tiers.standard.* | int }}`.
84 - - **Exorcise** — 9 AI-tell removals across compare.md, content-scanning.md, appeals.md, faq.md.
85 - - **Nitpick** — 2 polish edits (dead `let _ = scan_status` removed, unused tuple-name destructure tidied).
86 - - **Security review** — 2 MEDs fixed inline: sitemap.xml in-memory cache to absorb crawler/attacker hammering; LIKE-wildcard escape on `is_s3_key_live` to prevent `_` in s3_keys from false-positive matching.
87 -
88 - ---
89 -
90 - ## Ultra Fuzz 2026-05-31 (Run #8 — final re-grade)
91 -
92 - ### Above-MED items to address before launch (or defer with rationale)
93 -
94 - ### New MED-tier findings (all closed 2026-05-31)
95 -
96 - All 5 MEDs landed. `cargo test --lib` 1654 / 0.
97 -
98 - ### Verified closed this run
99 -
100 - ### Storage A- standing — remaining MED/LOW (Phase 4 polish or defer)
101 - Carried from Storage code-fuzz 2026-05-31 — see below. All still MED, none A- blockers.
102 -
103 - ---
104 -
105 - ## Audit backlog sweep 2026-05-31 (post-Run #8, 7 waves)
106 -
107 - Sorted by file locality and difficulty. Tests: 1655 / 0 throughout.
108 -
109 - ### Wave 1 — auth/security cluster (8 tiny)
110 - ### Wave 2 — scanning (3)
111 - ### Wave 3 — DB layer polish (4)
112 - ### Wave 4 — storage handlers + admin rescan seal + downloads (5)
113 - ### Wave 5 — UX polish (2)
114 - ### Wave 6 — Performance (3 of 5; 2 deferred)
115 - - [ ] **DEFERRED** `build_runner.rs:151` serial-target loop. LOW; refactor touches denominator + error agg + log order. Post-launch.
116 - - [ ] **DEFERRED** `scheduler/mod.rs:92-279` advisory-lock granularity. Multi-replica concern; defer until multi-replica is real.
117 -
118 - ### Wave 7 — Payments LOW (2)
119 - ---
120 -
121 - ## Storage code-fuzz 2026-05-31 (post-Run #7)
122 -
123 - Targeted Storage-axis fuzz to verify A- before triggering full Run #8.
124 -
125 - ### Above-MED fixes that landed
126 - ### Remaining MED/LOW (below A- bar; defer or Phase 4 polish)
127 - - [ ] Storage MED — `update_project_image_url` / `update_item_cover` ignore `rows_affected()`. Same shape as H1 but only follow-on side-effect is `bump_cache_generation`, so blast radius is small.
128 - - [ ] Storage MED — `downloads.rs:120` `((duration as u64) * 2).max(3600)` with no DB CHECK on `duration_seconds`. Add `CHECK (duration_seconds >= 0)` migration + cap in code (`duration.max(0).saturating_mul(2).clamp(3600, 86400)`).
129 - - [ ] Storage MED — Admin rescan (`routes/admin/uploads.rs:347, 390`) bypasses `commit_upload` seal via direct `db::scan_jobs::enqueue`. Demote to `pub(crate)` and expose `commit_rescan(target, ...)`.
130 - - [ ] Storage MED — `enqueue_s3_orphan` single-policy doc overstates discipline; either tighten doc or migrate remaining direct `delete_object` cleanup sites.
131 - - [ ] Storage MED — `is_s3_key_live` doesn't enumerate project image URLs (no current bug; surface fragile).
132 - - [ ] Storage LOW — `scanning/worker.rs:251` inline UPDATE bypasses `db::scanning::update_media_file_scan_status` helper.
133 - - [ ] Storage LOW — wizard `save.rs:95` updates only `cover_image_url` (not s3_key/size).
134 - - [ ] Storage LOW — `pending_uploads::remove_pending_upload` deletes by s3_key alone (signature broader than needed).
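
The `downloads.rs:120` item above suggests `duration.max(0).saturating_mul(2).clamp(3600, 86400)`. A sketch of that expression as a standalone function follows. The function name, the constant names, and the `i32` column type are assumptions made for illustration; only the arithmetic comes from the item itself.

```rust
/// Floor and ceiling for the presigned-URL lifetime, in seconds
/// (the 3600 and 86400 bounds quoted in the TODO item).
const MIN_EXPIRY_SECS: u64 = 3_600;
const MAX_EXPIRY_SECS: u64 = 86_400;

/// Hypothetical helper. `duration_seconds` is assumed to arrive as a
/// signed integer from the database, which is why a negative value is
/// possible until the CHECK constraint exists.
pub fn presigned_expiry_secs(duration_seconds: i32) -> u64 {
    // Clamp negatives to zero BEFORE widening: casting a negative i32
    // straight to u64 would wrap to an enormous value.
    let non_negative = duration_seconds.max(0) as u64;
    non_negative
        .saturating_mul(2)
        .clamp(MIN_EXPIRY_SECS, MAX_EXPIRY_SECS)
}
```

With these bounds a 50-minute item (`3000` seconds) gets a 6000-second URL, while a negative or zero duration falls back to the one-hour floor and anything longer than twelve hours is capped at one day.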
135 -
136 - ---
137 -
138 - ## Ultra Fuzz 2026-05-31 (Runs #6, #7 + S1)
139 -
140 - ### Structural / chronic-disease fixes that landed
141 - ### Bug-level fixes that landed
142 - ### Deferred (with rationale)
143 - - [ ] Drop unused `completion_effects` table — schema-only cleanup; harmless empty table.
144 -
145 - ### Notes on remaining MED/LOW (per Run #7 axis reports)
146 - - Storage MED — admin rescan handlers (`routes/admin/uploads.rs:347, 390`) still call `enqueue_scan_for` indirectly via lower-level primitives; functional today but bypasses the chronic-disease seal.
147 - - Storage MED — `update_item_cover` / `update_project_image_url` don't check `rows_affected()`; an ownership-filter mismatch returns Ok(0 rows) silently.
148 - - Storage MED — worker inline media UPDATE at `scanning/worker.rs:251` should use the new `db::scanning::update_media_file_scan_status` helper.
149 - - Storage LOW — internal CLI confirm drops returned `FileScanStatus` (no `pending_review` surfacing).
150 - - Storage LOW — `main.rs:334` comment references now-private `enqueue_scan_for`.
151 - - UX MED — `parse_dollars_to_cents` rejects `"$5"` and `"1,000"` literally; could strip `$`/`,` for clipboard-paste UX.
152 - - UX MED — project wizard skips `validate_tier_price` ($1–$10k); API path enforces it.
153 - - UX LOW — `BundleItemIds.filter_map` silently drops malformed UUIDs.
154 - - Payments M1 — `compute_splits` should `.max(0)` per-member for defense vs legacy negative `split_percent` rows.
155 - - Payments NIT — extract `require_stripe_ready` helper; six near-identical 5-line blocks across checkout files.
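
For the `parse_dollars_to_cents` note above, one possible lenient parser is sketched here. It is not the existing function: the name `parse_dollars_to_cents_lenient`, the `Option<i64>` return type, and the two-decimal limit are all assumptions. It strips one leading `$` and any `,` separators, then parses digits directly so no floating-point value is involved.

```rust
/// Hypothetical lenient variant: accepts "$5", "1,000", "12.5", "12.34".
/// Returns None for anything that is not a non-negative decimal amount
/// with at most two fractional digits.
pub fn parse_dollars_to_cents_lenient(input: &str) -> Option<i64> {
    let trimmed = input.trim();
    // Drop a single leading currency sign, then thousands separators.
    let without_sign = trimmed.strip_prefix('$').unwrap_or(trimmed);
    let cleaned: String = without_sign.chars().filter(|c| *c != ',').collect();

    let (whole, frac) = match cleaned.split_once('.') {
        Some((w, f)) => (w, f),
        None => (cleaned.as_str(), ""),
    };

    let all_digits = |s: &str| s.bytes().all(|b| b.is_ascii_digit());
    if whole.is_empty() || frac.len() > 2 || !all_digits(whole) || !all_digits(frac) {
        return None;
    }

    let dollars: i64 = whole.parse().ok()?;
    let cents: i64 = match frac.len() {
        0 => 0,
        // "12.5" means 50 cents, not 5.
        1 => frac.parse::<i64>().ok()? * 10,
        _ => frac.parse().ok()?,
    };
    // Checked arithmetic so an absurdly large paste yields None, not a wrap.
    dollars.checked_mul(100)?.checked_add(cents)
}
```

Commas are removed wherever they appear, so a malformed grouping such as `1,0,00` is also accepted; a stricter version could validate the grouping if that matters for the form.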
156 -
157 - ---
158 -
159 - ## Ultra Fuzz 2026-05-30 (Run #5)
162 -
163 - Full report: `docs/audit_review.md`. 3 CRITICAL, 14 HIGH/SERIOUS. Two-axis regressions (Payments B, Storage B-) are coverage expansion into previously-unaudited paths plus one chronic recurrence; Security improved to A-; all 27 Run #4 plan items verified closed.
164 -
165 - ### Phase 1 — CRITICAL (fix today)
166 -
167 - - [ ] **Storage CRIT — `uploads.rs` file-type gate ordering** — `routes/storage/uploads.rs:204-237`. Move the match-arm rejection of `Download`/`Insertion`/`MediaImage`/`MediaVideo` BEFORE `enqueue_scan_for` and `update_item_scan_status`. Then make `enqueue_scan_for` + `update_*_scan_status` `pub(crate)` and expose a `commit_upload(file_type, item_id, s3_key)` higher-level op used by all three handlers (uploads / versions / images). Closes Phase 5 chronic invariant-in-prose finding.
168 - - [ ] **UX CRIT — Field-aware validation reaches the UI** — `error.rs:216-264` + `templates/error.html`. Either add `fields: Vec<(String, String)>` to `ErrorTemplate` + per-input markup in templates, OR delete the `validation_fields*` API and migrate callers to `validation(summary)`. Audit `validation_fields` callsites and pick a path.
169 - - [ ] **Perf CRIT — `build_runner.rs` partial-failure denominator** — `build_runner.rs:175-180`. Track `failed_count`; report `succeeded/(succeeded+failed)`. Add a test with 3 targets / 2 failures asserting "1/3".
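
The partial-failure denominator fix in the `build_runner.rs` item above can be sketched as a small tally type. `BuildTally` and `tally` are hypothetical names, not code from `build_runner.rs`; the point is only that failures are counted explicitly so the denominator is `succeeded + failed`.

```rust
/// Hypothetical per-run counter for build targets.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct BuildTally {
    pub succeeded: usize,
    pub failed: usize,
}

impl BuildTally {
    /// Renders "succeeded/total", where total includes failed targets.
    pub fn summary(&self) -> String {
        format!("{}/{}", self.succeeded, self.succeeded + self.failed)
    }
}

/// Counts successes and failures across per-target results.
pub fn tally<T, E>(results: &[Result<T, E>]) -> BuildTally {
    let mut counts = BuildTally::default();
    for result in results {
        match result {
            Ok(_) => counts.succeeded += 1,
            Err(_) => counts.failed += 1,
        }
    }
    counts
}
```

With three targets of which two fail, `tally(&results).summary()` yields `"1/3"`, which is the assertion the item asks for.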
170 -
171 - ### Phase 2 — SERIOUS / HIGH (fix this weekend)
172 -
173 - - [ ] **Payments SERIOUS — NULL `item_id` refund decode bomb** — `db/transactions.rs:699-716`. Return `Vec<(TransactionId, Option<ItemId>)>`; skip `decrement_sales_count`/`revoke_keys_by_transaction` when None. Fixture test against a project-level transaction.
174 - - [ ] **Payments SERIOUS — `compute_splits` over-credit on members > 100%** — `routes/stripe/webhook/checkout_helpers.rs:240-269`. Reject `total_split_pct > 100` at the project_members write site (DB CHECK + validation). Defensively scale or clamp each split. Add test at 60%+60%.
175 - - [ ] **Payments SERIOUS — Tip `project_id` not validated vs recipient** — `routes/stripe/checkout/tips.rs:104-106`. After form accept, assert `project.user_id == recipient_id`; 400 otherwise.
176 - - [ ] **Payments SERIOUS — Cart bypasses item `listed` gate** — `db/cart.rs:94-123` + `get_cart_items` + `get_cart_items_for_seller`. Add `AND i.listed = true` to all three. Add per-seller checkout path check. Regression test: toggle unlisted item into cart → rejection.
177 - - [ ] **Payments SERIOUS — Unknown subscription status retry storm** — `routes/stripe/webhook/subscriptions.rs:117-121`. Replace `?` with a match: known statuses dispatch; unknown statuses `tracing::warn!` and return 200 OK so Stripe stops retrying.
178 - - [ ] **Payments SERIOUS — `is_full_refund` zero-amount** — `payments/webhooks.rs:294-308`. Predicate becomes `amount > 0 && amount_refunded >= amount`. Invert the test at line 517-525.
179 - - [ ] **Storage HIGH — `versions.rs` enqueue-before-idempotency** — `routes/storage/versions.rs:159-174`. Move `version.s3_key == req.s3_key` idempotency check before `enqueue_scan_for`. Apply Phase 1 `commit_upload` helper.
180 - - [ ] **Storage HIGH — `project_image_confirm` probe-failure + no rollback** — `routes/storage/images.rs:179-208`. On `Err`/`Ok(None)` from `s3.object_size`, fall back to recorded size. Move `enqueue_deletions` AFTER `update_project_image_url` success, or wrap in a tx.
181 - - [ ] **Storage HIGH — `media_confirm` non-atomic three-write** — `routes/storage/media.rs:236-293`. Wrap `try_increment_storage` → `remove_pending_upload` → `media_files::create` in a transaction. Refund storage credit on any failure.
182 - - [ ] **UX HIGH — Negative/zero prices via `PriceCents::from_db`** — `routes/pages/dashboard/wizards/item/save.rs:183-185, 214-227`. Use `PriceCents::new(price_cents)?` unconditionally; drop `> 0` guard. Add `min <= suggested` check on PWYW.
183 - - [ ] **UX HIGH — f64 price parsing accepts NaN/saturates** — same file + `routes/api/items/bulk.rs:136-139` + `routes/pages/dashboard/wizards/project.rs:264-298`. Parse as decimal cents (`rust_decimal::Decimal::from_str_exact`); reject NaN/Inf/out-of-range before cast.
184 - - [ ] **UX HIGH — Username live-check fails open on DB error** — `routes/auth.rs:356-361`. Propagate error or treat as "unavailable, try again".
185 - - [ ] **Perf HIGH — Cart checkout 80 sequential roundtrips** — `routes/stripe/checkout/cart.rs:68-248`. Bulk-load `has_purchased_item` with `WHERE item_id = ANY($1)`. Batch `get_item_by_id`. Claim free items in one tx with batched inserts. Target ≤ 5 roundtrips for any cart size.
186 - - [ ] **Perf HIGH — `record_view` unbounded spawn per request** — `db/page_views.rs:18-32`. Replace per-request spawn with `mpsc` channel + single background drainer flushing every 250ms via bulk UPSERT.
187 - - [ ] **Perf HIGH — `check_sales_count_drift` full-table aggregate** — `scheduler/integrity.rs:53-73`. Add `WHERE i.sales_count > 0 OR EXISTS(SELECT 1 FROM transactions WHERE item_id = i.id LIMIT 1)` short-term; long-term trigger-maintained counts.
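
The `is_full_refund` item above gives the corrected predicate verbatim. As a sketch, assuming both amounts are integer cents held as `i64` (the real signature in `payments/webhooks.rs` may differ):

```rust
/// A charge counts as fully refunded only when it had a positive amount
/// and the refunded total covers all of it. Without the `amount > 0`
/// guard, a zero-amount charge would satisfy `0 >= 0` and be treated as
/// a full refund.
pub fn is_full_refund(amount: i64, amount_refunded: i64) -> bool {
    amount > 0 && amount_refunded >= amount
}
```

The inverted test the item mentions then becomes an assertion that `is_full_refund(0, 0)` is `false`.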
188 -
189 - ### Phase 3 — MED (fix before Run #6 if cheap)
190 -
191 - - [ ] Storage: advisory-lock leak in `check_sandbox_cap` (`db/mod.rs:92-128`) → `pg_advisory_xact_lock` or RAII guard.
192 - - [ ] Storage: `is_s3_key_live` missing tables (`db/pending_s3_deletions.rs:67-82`).
193 - - [ ] Storage: `delete_version` owner SELECT outside tx + post-commit S3 enqueue (`db/versions.rs:267-315`).
194 - - [ ] Security: ClamAV `FailOpen` startup assertion (`scanning/clamav.rs:19` + `scanning/mod.rs:151-164`) — refuse boot if scan configured but no AV layer live.
195 - - [ ] Security: `helpers.rs:44-50` `DefaultHasher` → stable hasher (sha2 first 8 bytes or `xxh3` constant seed).
196 - - [ ] Security: OAuth `state` size cap (`routes/oauth.rs:379-386`) — reject `> 1024`; cap `code_challenge` at 44 chars.
197 - - [ ] Security: `extract_client_ip` non-Cloudflare fallback warning (`helpers.rs:33-40`).
198 - - [ ] UX: pagination offset overflow (`routes/pages/public/discover.rs:85-87`, `routes/admin/users.rs:37-39`).
199 - - [ ] UX: forms silently render without `_csrf` when handler forgets to populate token — make `csrf_token` non-optional in form-bearing templates.
200 - - [ ] UX: `validate_username` byte-length vs `chars().count()` (`routes/auth.rs:322`).
201 - - [ ] Perf: scheduler advisory-lock connection pinned across S3 (`scheduler/mod.rs:92-279`) → dedicated `max_connections(1)` pool.
202 - - [ ] Perf: cleanup S3 deletes serialized inside scheduler tick (`scheduler/cleanup.rs:77-100`) → `for_each_concurrent(8, ...)`.
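
For the `validate_username` item above, the difference between byte length and character count is easy to show in isolation. The limit of 30 and the function name are hypothetical; the actual rule in `routes/auth.rs:322` is not reproduced here.

```rust
/// Hypothetical limit, chosen only for the example.
const MAX_USERNAME_CHARS: usize = 30;

/// Compares against the number of Unicode scalar values rather than
/// `str::len`, which counts UTF-8 bytes and would reject short
/// non-ASCII names too early.
pub fn username_within_limit(name: &str) -> bool {
    name.chars().count() <= MAX_USERNAME_CHARS
}
```

A name made of thirty `é` characters is 60 bytes long but 30 characters, so it passes this check while a byte-length comparison against the same limit would reject it. `chars().count()` still counts scalar values rather than user-perceived graphemes, which may or may not be the intended unit.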
203 -
204 - ### Phase 4 — Polish (after Run #6 confirms ≥ A-)
205 -
206 - - [ ] Payments: `has_active_subscription_to_item` period-end clause mirroring (`db/subscriptions.rs:464-470`).
207 - - [ ] Payments: `get_active_creator_tier` + `sync_user_creator_tier` period-end defense (`db/creator_tiers.rs:91-103, 181-194`).
208 - - [ ] Payments: `release_use_count` race messaging (`db/promo_codes.rs:184-200`).
209 - - [ ] Payments: License key `activation_count` recount on revoke (`db/license_keys.rs:343-382`).
210 - - [ ] Payments: Subscription minimum-charge check (`payments/checkout.rs:283-317`).
211 - - [ ] Payments: Webhook v1/v2 unmark-on-failure parity (`routes/stripe/webhook/mod.rs:48-86`).
212 - - [ ] Storage: `media_files.list_folders` scan filter (`db/media_files.rs:73-82`).
213 - - [ ] Storage: `pending_uploads.record_pending_upload` silent user-mismatch (`db/pending_uploads.rs:23-33`).
214 - - [ ] Storage: `append_log_bounded` non-atomic size cap (`build_runner.rs:516-534`).
215 - - [ ] Storage: `downloads.rs:119-122` presigned-URL expiry — cap `duration_seconds` + DB CHECK ≥ 0.
216 - - [ ] Security: `validate_token_consuming` for OAuth POST (`routes/oauth.rs:206`).
217 - - [ ] Security: `parse_repo_path` rejects lone-dot entries (`git_ssh.rs:162`).
218 - - [ ] Security: ClamAV INSTREAM 16K cap → fail-closed on truncation (`scanning/clamav.rs:101-108`).
219 - - [ ] Security: TOTP seeds at rest behind an application-level key. Currently unencrypted in the DB; `tech/security.md:42-53` already discloses this and commits to a fix. A database-only compromise yields working second factors today.
220 - - [ ] AI disclosure: render the tier badge on `pages/item.html` + project page (`> [!UI] ai-tier-badges` in `about/generative-ai.md` is unfilled). Show the `ai_disclosure` text for Assisted items above the buy button so fans see it before purchase. Same badge on item cards in Discover results / search hits.
221 - - [ ] AI disclosure: pick a shape for the Discover filter — current buckets are "All / Handmade / Assisted / Generated"; `about/generative-ai.md` § "How Fans Use This" promises "Handmade only / Human-led / Everything" (Human-led = Handmade ∪ Assisted). Either rewrite the policy to match buckets, or add the combined filter.
222 - - [ ] AI disclosure: community report endpoint for misclassified items. The policy commits to fan flagging ("Fans and fellow creators can flag items they believe are misclassified.") but there's no `/report` or `/flag` route.
223 - - [ ] AI disclosure: drop the `checked` default on the publish wizard's tier radios so the creator has to pick deliberately, OR rephrase the policy's "no unlabeled option" to acknowledge default-handmade. Minor; signal-of-intent only.
224 - - [ ] UX: validation error messages stop reflecting user input (`wizards/item/mod.rs:176-179`).
225 - - [ ] UX: CSRF body extraction stops using `from_utf8_lossy` (`csrf.rs:528-543`).
226 - - [ ] Perf: scan-pipeline 400 MiB worst-case capacity note (`constants.rs:156-157`).
227 - - [ ] Perf: announcement fan-out persistence + resume (`scheduler/announcements.rs:59-89, 147-177`).
228 - - [ ] Perf: build log per-line DB roundtrip (`build_runner.rs:516-534`).
229 -
230 - ### Phase 5 — Chronic
231 -
232 - - [ ] **Invariant-in-prose, FOURTH consecutive run.** Phase 1 #1 (constructive `commit_upload` helper sealing the lower-level scan/credit/status ops) is the only acceptable resolution. After it lands, audit `compute_splits` (Payments) and `ErrorTemplate` (UX) for the same shape and apply the same treatment.
233 -
234 - ---
235 -
236 - ## Ultra Fuzz 2026-05-26 (Run #4)
237 -
238 - Full report: `docs/audit_review.md`. Plan target: lift every axis back to A- or higher (Payments A · Storage A · UX A- · Security A- · Performance A-).
239 -
240 - ### Phase 1 — clear HIGH/CRITICAL caps (must do before launch)
241 -
242 - ### Phase 2 — close axis-dragging SERIOUS items
243 -
244 - ### Phase 3 — resilience & infra hardening
245 -
246 - ### Phase 4 — polish
247 -
248 - ### Phase 5 — chronic
249 -
250 - - [~] **Invariant-in-prose / policy-not-in-types — third consecutive run (CHRONIC)** — scan_status-ordering half closed 2026-05-26 (see Phase 1 entry for `images.rs::item_image_confirm`). The constructive-impossibility shape from the chronic-remediation rubric: `commit_*_upload` is the only handler-reachable path that writes both row + scan_status; the lower-level scan_status writes were renamed `set_*_scan_status_standalone` and documented as worker- and admin-override-only. Compiler-driven migration found one additional handler with the same bug (CLI internal upload) — that's the test the rubric wants: structural change exposes drift, not human review. Remaining: `/stripe/*` CSRF policy patchwork — same disease, different organ. Track as Landing 2 below.
251 -
252 - Follow-ups:
253 - - [ ] **Manual-posture runtime assertion (dev builds).** Today `*_csrf_manual` requires no compile-time proof that the handler called `validate_token_consuming`. Only the tip handler is Manual, and `_validated` is bound only as documentation. In dev/test builds, set a flag in `validate_token_consuming` and debug-assert it after the handler runs; mismatched routes panic loudly in CI without affecting prod. Not blocking — only matters if Manual grows beyond one route.
254 - - [ ] **Phase 1 entries still open:** `cancel_pending_item_checkout` Skip reason is `"Phase 1 todo: tighten to post_csrf"` (grep "Phase 1 todo" to find). `/login` and `creator-tier` template tightening tracked separately above.
255 -
256 - ### Notes & non-actions
257 -
258 - - Status-notification fan-out cooldown across overlapping tasks (`monitor.rs:213-237`) — single-replica today; harmless. Reconsider when adding a second instance.
259 - - `record_storage_fill_stats` JOIN (`metrics.rs:181-218`) — 5min cadence is acceptable at 100k users; revisit at 1M.
260 - - `metadefender` could run concurrently with MalwareBazaar in suspicion path (`scanning/mod.rs:377-398`) — micro-optimization, deferred.
261 - - `populate_known_sync_apps` startup-only (`rate_limit.rs:65-85`) — paired with the deletion-path item above; together they're a single fix.
262 -
263 - ---
264 -
265 - ## Ultra Fuzz 2026-05-26 (Run #3)
266 -
267 - Full report: `docs/audit_review.md`. Plan target: lift every axis to A- or higher (Payments A · Storage A- · UX A · Security A- · Performance A-).
268 -
269 - ### Notes & non-actions
270 -
271 - - Backup-code fast-path malformed-hash trap (`db/totp.rs:155-189`) — log + alert + fall through to legacy path; small, file as polish.
272 - - `session_cache` TTL window vs admin revoke (`auth.rs:154-191`) — documented as intentional; consider exposing a broadcast invalidate op if operator demand emerges.
273 - - `monitor.rs` `pg_stat_activity` cadence already covered by Phase 2 split.
274 -
275 - ---
276 -
277 - ## Ultra Fuzz 2026-05-25 (Run #2)
278 -
279 - Full report: `docs/audit_review.md`.
280 -
281 - ### Outbox follow-ups — convert remaining webhook handlers
282 -
283 - All five remaining handlers converted to outbox 2026-05-25; migration 125 added `fan_plus_subscription_id` and `creator_subscription_id` parents so each subscription type has its own idempotency anchor.
284 -
285 - ### Current phase — serious / high
286 -
287 - - [ ] **Login template field-aware errors** — deferred 2026-05-26. Re-scoped: error-construction infra (`AppError::validation_fields`) is in place in `join_wizard::step_account_create`, but neither signup nor login renders per-field highlights yet. Real work is a new HTMX partial with OOB swaps per input + per-field error containers on both templates. Login itself has only one safe per-field message by design (creds are intentionally generic to avoid enumeration); the value is mostly on the signup side.
288 - - [~] **Scanning peak memory** — `scanning/mod.rs:174` already uses `std::sync::Arc::<[u8]>::from(data)` which dispatches through `Vec::into_boxed_slice` and reuses the allocation; SHA-256 streams via `Sha256::update` over the same Arc-shared buffer. No change needed.
289 - - [~] **`check_sales_count_drift` full GROUP BY** — the SQL already filters via `HAVING i.sales_count != COUNT(t.id)` (the real bound). The trailing `LIMIT 50` is a per-tick cap on how many drifts to surface, not a cosmetic post-group filter. No action.
290 -
291 - ### Current phase — medium / minor
292 -
293 - - [~] **`pg_stat_activity` baseline load** — `monitor.rs:290-294` doc explicitly justifies the 30 s cadence for operator-dashboard refresh; no change.
294 -
295 - ### Deferred — architectural
296 -
297 - - [ ] **Cloudflare-only origin: migrate custom domains to CF for SaaS, then firewall 80/443.** Re-scoped 2026-05-26. The original sketch (firewall the origin to CF IP ranges) conflicts with the shipped custom-domain feature (`api/domains.rs` + Caddy `on_demand_tls`), which expects creators' A-records to hit the origin directly. The two threats the firewall was meant to close are already mitigated at layer 7 — `CloudflareIpKeyExtractor` peer-IP fallback (landed 2026-05-26) closes the CF-Connecting-IP spoofing surface; Caddy `client_auth require_and_verify` closes the WAF-bypass surface for `makenot.work`. The proper sequencing is now (1) upgrade to CF Business for CF-for-SaaS, (2) reconfigure CF dashboard with a fallback origin, (3) update `api/domains.rs` onboarding to CNAME instead of A-record, (4) migrate the 1 live custom-domain creator, (5) drop `on_demand_tls` from `Caddyfile`, (6) apply the firewall ACL. Full sequence + ACL sketch + gotchas live in `_meta/docs/incident_response.md` § "Pending Hardening: Cloudflare-only origin firewall". Blocked on the CF plan upgrade + 1 customer email, neither of which happens in-session.
298 -
299 - ---
300 -
301 - ## Ultra Fuzz 2026-05-24 (Run #1)
302 -
303 - Full report: `docs/audit_review.md`.
304 -
305 - ### Current phase — medium
306 -
307 - - [~] **Status notification parallel fan-out** — kept sequential with 100 ms shaper; that pacing is intentional (SMTP rate-limit shape). No change.
308 -
309 - ### Deferred — architectural
310 -
311 - All four Run #1 deferred items closed by Run #2 sweeps; pointers below.
312 -
313 - ---
314 -
315 - ## Creator applications restructure (replaces waitlist)
316 -
317 - Discussed and scoped 2026-06-03; no implementation yet. Rename and generalize the existing waitlist into a creator-applications system that lives inside the join wizard, replaces the standalone `/admin/waitlist` surface, and gives fans a settings-page path to apply after the fact. The trigger to start: when the founder cohort fill is no longer well-served by the current waitlist UX, or before opening signup beyond hand-picked invitations — whichever comes first.
318 -
319 - ### Model
320 -
321 - Three branches in the wizard, decided up front by the signing-up account:
322 -
323 - - **Free trial** — short pitch (1–2 sentences: what you make, why MNW). Account exists, `can_create_projects = false`, application status `pending`. Operator approval flips it.
324 - - **Benefits account** — longer disclosure (community / mission alignment, the binding-mission-statement framing from the program docs). Same `pending` state, different `application_type` so the admin queue can sort them.
325 - - **Just pay** — skip the application entirely, route to Stripe checkout. **No approval required** — paying is the signal. On subscription activation, `can_create_projects = true` immediately. No `creator_applications` row written.
326 -
327 - Founder rate (50% off, locked for life when window closes) is available on **any** branch during the cohort window — free-trial seats get $0, benefits seats may be subsidized, paid seats get the standard founder discount.
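
The three-branch model above can be sketched as types. Every name here (`EntryBranch`, `ApplicationType`, `SignupOutcome`, `outcome_at_signup`) is hypothetical and nothing below exists in the codebase yet; the sketch only encodes the stated rules: free-trial and benefits signups start without project access and with a pending application, while a paid signup writes no application and gains access once the subscription is active.

```rust
/// The two values planned for the `application_type` column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApplicationType {
    FreeTrial,
    BenefitsAccount,
}

/// The choice made at the "Choose your entry" wizard step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntryBranch {
    FreeTrial,
    BenefitsAccount,
    JustPay,
}

/// What the account looks like right after the wizard completes.
#[derive(Debug, PartialEq, Eq)]
pub struct SignupOutcome {
    pub can_create_projects: bool,
    /// `Some` means a `creator_applications` row in `pending` status;
    /// `None` means no row is written at all.
    pub pending_application: Option<ApplicationType>,
}

pub fn outcome_at_signup(branch: EntryBranch, subscription_active: bool) -> SignupOutcome {
    match branch {
        EntryBranch::FreeTrial => SignupOutcome {
            can_create_projects: false,
            pending_application: Some(ApplicationType::FreeTrial),
        },
        EntryBranch::BenefitsAccount => SignupOutcome {
            can_create_projects: false,
            pending_application: Some(ApplicationType::BenefitsAccount),
        },
        // Paying is the signal: no application, access follows activation.
        EntryBranch::JustPay => SignupOutcome {
            can_create_projects: subscription_active,
            pending_application: None,
        },
    }
}
```

Keeping `JustPay` out of `ApplicationType` makes the "no `creator_applications` row" rule a property of the types rather than a convention, since a paid signup has no application type to store.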
328 -
329 - ### Schema migration
330 -
331 - - [ ] Replace `creator_waitlist` with `creator_applications`. Either rename the table + add columns, or create a sibling and migrate rows. Add `application_type` enum column (`free_trial` | `benefits_account`). Normalize `status` values to `pending` | `approved` | `declined` | `spam`.
332 - - [ ] Backfill existing waitlist rows as `application_type = 'free_trial'` (preserve `pitch`, `created_at`, decision metadata, `selection_method`, `invited_by_user_id`).
333 - - [ ] Drop the `db::waitlist` module + its consumers once nothing references it. The `grant_creator_access` helper is the right primitive to keep — move it under `db::creator_applications`.
334 -
335 - ### Wizard
336 -
337 - - [ ] Insert a new "Choose your entry" step in `join_wizard.rs` flow after profile, before pitch. Three radio options + short descriptions. The chosen branch threads through to the next step.
338 - - [ ] Rebuild `wizards/steps/join/pitch.html` to branch on `application_type` — different prompt text, different length limits (free-trial short, benefits longer).
339 - - [ ] Route the paid branch around the application step entirely: profile → Stripe checkout → complete. On webhook activation, no `creator_applications` row is written; `can_create_projects` is granted on `creator_subscriptions.status = 'active'`.
340 - - [ ] Rewrite `wizards/steps/join/complete.html` so the "Apply for creator access" framing is gone (the question was already answered upstream). Free-trial / benefits accounts see "Application under review"; paid accounts see "Welcome — create your first project."
341 -
342 - ### Dashboard / settings
343 -
344 - - [ ] New `/settings/creator-access` page for fan-only accounts to submit an application after the fact. Same three branches, same pitch requirements. Lives in the dashboard tab rail, not a marketing page.
345 - - [ ] Strip the five existing "Apply for Creator Access" CTAs (`partials/tabs/user_projects.html` x2, `partials/tabs/user_creator.html` heading, `pages/creators.html` step list, `wizards/steps/join/complete.html` card). Replace dashboard surfaces with a small "Apply for creator access" link that routes to `/settings/creator-access`. Marketing page (`creators.html`) drops the "apply from your dashboard" framing in favor of "start your free trial during signup."
346 -
347 - ### Pending UX
348 -
349 - - [ ] Accounts in `pending` status can browse, buy items as a fan, manage profile and settings — they just can't reach creator dashboards. Existing `can_create_projects` guards already block project creation; new behavior is to render an "Application under review" panel (with submitted pitch + submission date) instead of returning 404 or redirecting away.
350 - - [ ] Email notification on approve / decline, distinct templates per `application_type`. Decline template names the reason; approval template points at the dashboard.
351 -
352 - ### Admin
353 -
354 - - [ ] Rename `routes/admin/waitlist.rs` → `routes/admin/applications.rs`. Generalize the approve / decline / spam handlers to read `application_type`. The `grant_creator_access` call on approve stays as-is.
355 - - [ ] Rename `dashboards/admin-waitlist.html` → `dashboards/admin-applications.html`. Add an `application_type` column and a type filter (free_trial / benefits_account / all). The existing stats block (pending / approved / spam / total_creators counts) stays; queries adjust to read the new table.
356 - - [ ] Update admin navigation (`admin_active_page: "waitlist"` → `"applications"`) and any cross-links in the admin shell.
357 - - [ ] Sitemap entry update, breadcrumb update.
358 -
359 - ### Tests + acceptance
360 -
361 - - [ ] Pin: a `pending` account cannot create projects (the existing `can_create_projects` guard already enforces this; verify the rendered panel works).
362 - - [ ] Pin: a "just pay" signup lands at `can_create_projects = true` AND an active Stripe subscription AND **no** `creator_applications` row.
363 - - [ ] Pin: admin queue lists pending applications sorted by `application_type` then `created_at`.
364 - - [ ] Pin: every removed "Apply for Creator Access" string is gone from `templates/` (grep test in `tests/regression/`).
365 - - [ ] Pin: founder rate applies regardless of branch during the cohort window (existing founder-pricing logic; new test asserts the multiplier doesn't depend on `application_type`).
366 -
367 - ### Out of scope (this restructure)
368 -
369 - - Partnership / sponsorship / residency / fellow-led-project applications — those continue via email, no form surface built. If we ever build them, the same `creator_applications` table can host them as additional `application_type` variants.
370 -
371 - ## PoM contract guard (landed 2026-05-25)
372 -
373 - Schema-drift guard test wired against `shared/pom-contract/`: `src/routes/pages/public/health/mod.rs::tests::pom_hetzner_health_expectations_resolve`. `health_json` body builder extracted as pure `health_json_body(overall, db_ok)` for the test. Catches the v0.5.16-class drift where a field is removed from `/api/health` without updating PoM's expectations. See `MNW/CLAUDE.md` § PoM Health Contract.
@@ -1,31 +0,0 @@
1 - # theme-common TODO
2 -
3 - ## Active
4 -
5 - (none — crate is stable, consumed by audiofiles + goingson)
6 -
7 - ## Future — Unified theme library + per-user MNW theming
8 -
9 - **Scope:** One canonical theme library serving every app under Make Creative (audiofiles, goingson, balanced_breakfast, MNW server-rendered UI), plus a Fan+ perk that lets MNW users override the platform default CSS per-account.
10 -
11 - **Why:**
12 - - Today every app maintains its own theme folder: `Apps/audiofiles/crates/audiofiles-browser/themes/` (28 themes), `Apps/goingson/src-tauri/frontend/themes/helix/` (9 themes), `MNW/shared/themes/` (28 themes). Drift is inevitable; theme counts are already wrong in 5+ docs.
13 - - A unified library shipped from `MNW/shared/themes/` (or a successor crate here in `theme-common/`) would centralize the source of truth and let documentation point at one directory.
14 - - Theming the MNW default CSS per user is a natural Fan+ benefit: the paid tier gets a theme picker plus a custom CSS slot stored against the account, applied via a `<link rel="stylesheet">` injected into every authenticated page render.
15 -
16 - **Pieces to design:**
17 - - Single canonical theme TOML schema (audit existing helix-format vs MNW theme TOMLs for shape divergence).
18 - - A loader contract every app already supports (theme-common already does this for native apps; MNW server-side rendering needs an equivalent for CSS variable injection into Askama templates).
19 - - Per-user storage: new column on the users table (or a separate `user_themes` table) holding either a theme ID + custom CSS overrides, or a full custom theme blob (with size cap and validation).
20 - - Fan+ gating: theme picker visible to all signed-in users (apply a built-in theme); custom CSS slot gated behind Fan+ status.
21 - - CSS sanitization for the custom-CSS slot: accept only declarations, no `@import`, no `url()` to off-origin, no `expression()` (defunct but defensive). Probably easier to enforce a CSS property allowlist than to sanitize freeform CSS.
22 - - Migration path: drop hard theme counts from app READMEs in favor of "see the themes directory" (already done for the launch).
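The allowlist approach from the CSS sanitization item could look roughly like the sketch below. It is not the planned implementation: `ALLOWED_PROPERTIES`, `sanitize_declarations`, and `value_is_safe` are hypothetical names, and the property list is a placeholder. The sketch is also stricter than the note requires, since it rejects every function call in a value (including same-origin `url()` and `rgb()`) rather than trying to judge origins.

```rust
/// Hypothetical allowlist; the real property set is a design decision.
const ALLOWED_PROPERTIES: &[&str] = &[
    "color",
    "background-color",
    "font-family",
    "font-size",
    "border-radius",
];

/// Accepts only a conservative character set in values. Parentheses, slashes,
/// quotes, braces, backslashes, and `@` are all rejected, which rules out
/// `url()`, `expression()`, escapes, and nested rules in one check.
fn value_is_safe(value: &str) -> bool {
    value
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || matches!(c, ' ' | '#' | '.' | ',' | '-' | '%'))
}

/// Keeps only `property: value` declarations whose property is allowlisted
/// and whose value passes `value_is_safe`. Everything else is dropped silently.
fn sanitize_declarations(input: &str) -> String {
    let mut kept = Vec::new();
    for decl in input.split(';') {
        // Fragments without a colon (for example `@import 'x.css'`) are not declarations.
        let Some((prop, value)) = decl.split_once(':') else {
            continue;
        };
        let prop = prop.trim().to_ascii_lowercase();
        let value = value.trim();
        if !ALLOWED_PROPERTIES.iter().any(|allowed| *allowed == prop.as_str()) {
            continue;
        }
        if value.is_empty() || !value_is_safe(value) {
            continue;
        }
        kept.push(format!("{prop}: {value}"));
    }
    kept.join("; ")
}
```

Dropping rejected declarations silently keeps the stored CSS always safe to inject. Whether the UI should instead report rejected lines back to the user is left open here.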
23 -
24 - **Not Monday work.** Surfaced here so it's tracked. The Monday docs drop hard counts and point at directories — that posture is already correct for whenever this lands.
25 -
26 - **Key paths:**
27 - - `MNW/shared/theme-common/` (this crate — likely host for the canonical loader)
28 - - `MNW/shared/themes/` (current cross-app theme store)
29 - - `Apps/audiofiles/crates/audiofiles-browser/themes/` (consumer; would migrate)
30 - - `Apps/goingson/src-tauri/frontend/themes/helix/` (consumer; would migrate)
31 - - `MNW/server/templates/` (would gain user-CSS injection hook)