max / makenotwork
5 files changed, +4 insertions, -1146 deletions
| @@ -46,3 +46,7 @@ mutants.out* | |||
| 46 | 46 | ||
| 47 | 47 | # Claude Code instructions (project-local; not for the public repo) | |
| 48 | 48 | CLAUDE.md | |
| 49 | + | ||
| 50 | + | # Private working files — live in _private/, synced via Syncthing | |
| 51 | + | todo.md | |
| 52 | + | audit_review.md |
| @@ -1,242 +0,0 @@ | |||
| 1 | - | # Sando TODO | |
| 2 | - | ||
| 3 | - | Open work only. Completed items move to `todo_done.md` (sibling file) when one exists. Design notes go in `plans/<name>.md`, not folded into checkboxes. | |
| 4 | - | ||
| 5 | - | Format rule: every actionable line is a `- [ ]` checkbox. Headings group phases and themes; do not put status updates in them. | |
| 6 | - | ||
| 7 | - | ## Resume here (next session) | |
| 8 | - | ||
| 9 | - | User-blocking before anything else: | |
| 10 | - | ||
| 11 | - | - [ ] Apply updated Tailscale ACL (`_private/infra/tailscale-acl-policy.json`) at https://login.tailscale.com/admin/acls — adds the `tag:server → tag:server as user max` SSH rule needed for offsite backup sync. Once live, Claude can finish: verify `makenotwork@alpha-west-1 → max@astra` ssh works, scp `MNW/server/deploy/sync-backup-offsite.sh` to `/opt/makenotwork/sync-backup-offsite.sh`, chmod +x, then trigger `sudo -u makenotwork /opt/makenotwork/backup-db.sh` and confirm a file lands in `max@astra:/opt/backups/mnw/`. Closes the "offsite broken" adjacent fire. | |
| 12 | - | ||
| 13 | - | Claude-only follow-ups (no user input needed; pick the next slice): | |
| 14 | - | ||
| 15 | - | - error-pages bake-into-binary via `include_dir!` (separate MNW PR) — closes Phase 3 §2 long-term | |
| 16 | - | - `cargo_test` gate red on MNW (Phase 0 follow-up) — diagnose, likely needs DB/env setup hook per test or `--test-threads=1` | |
| 17 | - | - Sandod build/test output streaming (Phase 0 follow-up) — pipe stdout to per-run log files instead of `Output` buffer; surface in WS `/events` | |
| 18 | - | - Phase 6 monitoring + alerting — Prometheus counters + alert rules | |
| 19 | - | - Phase 4 prep — first Sando-only deploy to testnot (needs Track B — see below) | |
| 20 | - | - Sando test suite: see the "Testing" section below for the remaining gaps (55 unit tests exist as of 2026-05-31; the `events_ws` end-to-end test and the end-to-end harness are still open) | |
| 21 | - | ||
| 22 | - | Session 5 — 0.9.7 launched 2026-06-03 via Sando through host → A → B (hotfix=true, skip-burn-in). Soak cleanup closed (launchplan_final §1). Remaining: | |
| 23 | - | ||
| 24 | - | - [x] **Soak cleanup eligible 2026-06-10 — shortened and shipped 2026-06-03.** Gate verified clean since the 06-03 02:53 migration boot. Removed `/opt/git` (99M, stale duplicate of `/var/lib/mnw/git`), `/opt/makenotwork` (177M, post-yara-relocation), `/opt/backups` (277M, root pg_backup output). 553M reclaimed. yara-rules relocated from `/opt/makenotwork/yara-rules` → real `/opt/mnw/yara-rules` (733 rules compiled fine from new path). | |
| 25 | - | - [x] **Backups rebuilt under `/var/lib/mnw/backups/<db>/`** (makenotwork + multithreaded, per-DB subdirs), per-user crons (03:00 + 03:05), offsite to astra `/opt/backups/mnw/<db>/` via Tailscale SSH `tag:prod → max@tag:testing` rule. `backup-puller` rrsync re-rooted at `/var/lib/mnw/backups`; sando `backup.source` updated to `ssh://backup-puller@alpha-west-1:2200/makenotwork/latest.sql.gz`; `/backup/fetch` verified 38MB matched prod size. | |
| 26 | - | - [x] **Pre-existing meta.git ownership drift fixed inline** — `mnw-cli:git` → `git:git` (tightened `safe.directory` was rejecting it). Surfaced by post-rm ls-remote regression test. | |
| 27 | - | - [ ] **Remove live drop-in** `/etc/systemd/system/mnw-cli.service.d/fhs-git-path.conf` on prod. The unit file in `mnw-cli/deploy/mnw-cli.service` is patched to include `ReadWritePaths=/var/lib/mnw`, so the drop-in becomes redundant next time `./mnw-cli/deploy/deploy.sh --config` runs. Until then both apply (harmless dupe). | |
| 28 | - | ||
| 29 | - | Decision-gated (needs user input first): | |
| 30 | - | ||
| 31 | - | - Track B testnot live-app: postgres role+db (Claude), `.env` secrets (which Stripe/SMTP/S3 creds to use for staging — needs user), Caddyfile + Cloudflare Origin CA cert for testnot.work (user issues cert in CF dashboard; Claude installs) | |
| 32 | - | - Restart-warning hook for prod tier (Phase 5) — needs `CLI_SERVICE_TOKEN` accessible to sandod | |
| 33 | - | ||
| 34 | - | ||
| 35 | - | ||
| 36 | - | ## Testing | |
| 37 | - | ||
| 38 | - | Sando began with no automated tests; the daemon and TUI were validated by running real scenarios end to end. Unit tests have since landed (counts below), and the remaining gaps are worth a pass before relying on Sando for prod cutover. | |
| 39 | - | ||
| 40 | - | ### TUI hands-on (Phase 5 acceptance — run interactively) | |
| 41 | - | ||
| 42 | - | - [ ] Launches against `SANDO_DAEMON=http://100.103.89.95:7766` without crashing; header shows daemon URL. | |
| 43 | - | - [ ] WS status: `ws ok` appears in the header within ~1s of launch (sandod is reachable). | |
| 44 | - | - [ ] WS reconnects: `sudo systemctl restart sandod` on fw13; header flips `ws ok → ws ... → ws ok` within ~5s. Events resume. | |
| 45 | - | - [ ] `↑/↓` and `j/k` move the row highlight through all 4 tiers; selection persists across the 2s state refresh. | |
| 46 | - | - [ ] `b` triggers backup fetch: status bar shows `[ok] backup/fetch: ...`, events log gets a `backup_fetched` line a moment later. | |
| 47 | - | - [ ] `c` on tier `a` (which has `current_version=0.8.12`) records a manual_confirm; event appears. | |
| 48 | - | - [ ] `c` on tier `mm` (no current_version) returns an HTTP error; status bar shows `[err]`. | |
| 49 | - | - [ ] `p` on tier `a` (assuming gates pass) issues a real deploy; sequence of `deploy_start → deploy_ok → promote_complete` events appears. | |
| 50 | - | - [ ] `R` on tier `a` rolls back to `previous_version`; `rollback` event appears. Reverse with `p` again. | |
| 51 | - | - [ ] `q`, `Esc`, `Ctrl-C` all quit cleanly; terminal restores correctly (no leftover raw mode). | |
| 52 | - | - [ ] Events ring buffer trims to 200: trigger ≥200 events (loop /backup/fetch), confirm the oldest scroll out, no panic. | |
| 53 | - | - [ ] Action while disconnected: kill sandod, hit `b`. Status shows error, TUI stays responsive. | |
| 54 | - | ||
| 55 | - | ### Sandod unit + integration tests (Claude-only) | |
| 56 | - | ||
| 57 | - | 55 tests passing as of 2026-05-31 (14 TUI + 41 daemon). Remaining gaps: | |
| 58 | - | ||
| 59 | - | - [x] `gates::reset_scratch` — verifies dropping every non-system schema (planted `foo` + `tower_sessions`, ran reset, asserted only `public` remains). Gated by `SANDO_TEST_PG_URL` env var so it skips on hosts without postgres. Run on fw13 with `SANDO_TEST_PG_URL=postgres:///sando_scratch?host=/var/run/postgresql cargo test`. | |
| 60 | - | - [x] `deploy::deploy_local` — copies multiple binaries (`PRIMARY`/`ADMIN`), swaps symlink atomically across two consecutive deploys, gc_local_releases keeps last N by mtime + handles missing dir + noop under threshold. `sh_quote` round-trip. | |
| 61 | - | - [x] `deploy::deploy_remote` failure path — against unroutable `192.0.2.1`, verifies clean ssh-attributed error (no panic / hang); ConnectTimeout bounds the test wallclock to ~10s. Plus `deploy_node` with `ssh_target="local"` short-circuits to symlink swap. | |
| 62 | - | - [x] `backup::fetch` URL parsing — extracted `parse_source` → `BackupSource` enum. 10 tests: file://, rsync://, ssh:// with/without port, multi-segment ssh path, non-numeric `:foo` colon treated as part of host (not port), and all malformed-input rejections (empty, scheme-only, ftp, no path on ssh, empty user@host). | |
| 63 | - | - [x] `events::emit` no-subscribers no-op; `emit_reaches_a_subscriber`; envelope serializes with flat `kind` field (locks the WS/TUI contract); `lagged_subscriber_observes_recv_error_lagged` exercises broadcast capacity. | |
| 64 | - | - [ ] `events_ws` handler end-to-end — drive WS through a slow client, assert `{"kind":"lagged",...}` frame arrives. Possible (bind axum to ephemeral port + tungstenite client) but the bus-level lag detection is already locked in by `lagged_subscriber_observes_recv_error_lagged`. Diminishing returns vs effort. Deferred. | |
| 65 | - | - [ ] `build` mutex behavior — requires real cargo or a slow stub. Treated as a manual checklist item under "TUI hands-on" instead. (Already validated by hand 2026-05-31.) | |
| 66 | - | - [x] `routes::confirm` — rejects when tier has no `current_version` (409 Conflict — surfaced that GateBlocked maps to 409 not 400, locked in), accepts + inserts a passing gate_runs row when set, 404 on unknown tier. | |
| 67 | - | - [x] `routes::promote` — refuses promote-to-first-tier (409), errors when neither body nor predecessor has a version, 404 when explicit version's `versions` row is missing. | |
| 68 | - | - [x] `unsatisfied_gates` — 6 tests: empty, failed-kind flagging, latest-row-wins (red→green flap clears), hotfix skips burn_in only, ignores other tiers/versions, **null `passed` treated as failing** (locks the in-flight-race safety property). | |
| 69 | - | - [x] `run_migrator` errors on missing migrations dir. | |
| 70 | - | - [x] sqlx migrations exercised via existing `sync` tests. | |
| 71 | - | ||
| 72 | - | ### End-to-end harness | |
| 73 | - | ||
| 74 | - | - [ ] Single-binary smoke: spin up sandod against tmpdir config + a tmp postgres; push a fixture commit; assert the full pipeline (build → gates → MM tier_state advance) completes in under 30s. Run on CI for every sando PR. | |
| 75 | - | - [ ] Pre-cutover dry run: stand up a throwaway tier-B node, point production-shape config at it, run `cargo_test → migration_dry_run → boot_smoke → promote` end to end. Use existing testnot for this once Track B is done. | |
| 76 | - | ||
| 77 | - | ### TUI unit tests | |
| 78 | - | ||
| 79 | - | - [x] `format_event` — golden tests for build_ok, gate_done (pass+fail), backup_fetched, deploy_failed, unknown kind, malformed JSON. | |
| 80 | - | - [x] `ws_url_from`: `http://` → `ws://`, `https://` → `wss://`, only replaces scheme once, unknown scheme passes through. | |
| 81 | - | - [x] `Action::Display` impl produces `backup/fetch`, `promote/<tier>`, etc. | |
| 82 | - | - [x] `Shared::push_event` ring-buffer cap at 200; oldest entries drop in FIFO order. | |
| 83 | - | - [x] `truncate` short-string passthrough vs long-string ellipsis. | |
| 84 | - | ||
| 85 | - | --- | |
| 86 | - | ||
| 87 | - | Roadmap target: replace `server/deploy/deploy.sh` and astra-hosted `server/deploy/run-ci.sh` with Sando running on **fw13**, gating Hetzner prod through testnot.work. | |
| 88 | - | ||
| 89 | - | **Host decision:** Sando runs on fw13 (x86_64 Ubuntu-derived, systemd). Architecturally closest to Hetzner prod, no cross-compile, no init-system split. MakeMachine and EveryCycle are now a separate project — not Sando's concern. | |
| 90 | - | ||
| 91 | - | Phases are ordered for execution. Phase 0 must finish before Phase 1 is meaningful. Phases 5+ are post-cutover hardening. | |
| 92 | - | ||
| 93 | - | ## Key Paths | |
| 94 | - | ||
| 95 | - | Read these to orient before working on Sando: | |
| 96 | - | ||
| 97 | - | - `README.md` — quickstart, API surface, v0 limitations | |
| 98 | - | - `sando.toml` — current topology (host → A → B; C declared, not provisioned) | |
| 99 | - | - `daemon/src/main.rs` — startup sequence (config → topology → migrate → sync → bare-repo bootstrap → serve) | |
| 100 | - | - `daemon/src/routes.rs` — `/state`, `/promote`, `/rollback`, `/rebuild`, `/backup/fetch`, `/events` | |
| 101 | - | - `daemon/src/gates.rs` — gate runners; the load-bearing logic | |
| 102 | - | - `daemon/src/build.rs` — host-tier build pipeline | |
| 103 | - | - `daemon/src/deploy.rs` — `deploy_local`; remote SSH stub | |
| 104 | - | - `daemon/migrations/001_init.sql` — schema (tiers/nodes as rows) | |
| 105 | - | - `server/deploy/deploy.sh` — current cross-compile + push-to-Hetzner script (what we are replacing) | |
| 106 | - | - `server/deploy/run-ci.sh` — current astra CI script (what we are replacing) | |
| 107 | - | - `_meta/docs/operations.md` — burn-in rule and hotfix policy that gates encode | |
| 108 | - | ||
| 109 | - | --- | |
| 110 | - | ||
| 111 | - | ## Phase 0 — fw13 bootstrap | |
| 112 | - | ||
| 113 | - | - [x] Provision `sando` system user on fw13; lock down home dir; generate SSH keypair at `/srv/sando/.ssh/id_ed25519` for outbound deploys. | |
| 114 | - | - [x] Install scratch Postgres locally on fw13; create `sando_scratch` role + DB used by `migration_dry_run`. (Owner of own DB; non-superuser.) | |
| 115 | - | - [x] Write systemd unit for `sandod` (long-run service, restart on failure, env from `/etc/sando/sando.env`). Installed at `/etc/systemd/system/sandod.service`. | |
| 116 | - | - [x] Write the production `sando.toml`; bare repo path under `/srv/sando/mnw.git`. Installed at `/etc/sando/sando.toml`; daemon config at `/etc/sando/sando-daemon.toml`. | |
| 117 | - | - [x] Install `sandod` binary at `/usr/local/bin/sandod`; enable + start the service. Live on `100.103.89.95:7766`; bare repo auto-bootstrapped at `/srv/sando/mnw.git`. | |
| 118 | - | - [x] Verify MNW server builds reproducibly on fw13. `makenotwork` 0.8.12 built in 132s; sqlx online mode against `sando_scratch` postgres (sandod prep-resets all non-system schemas + applies all 133 MNW migrations before invoking cargo). | |
| 119 | - | - [ ] Register sando pubkey with Hetzner prod (`deploy@alpha-west-1`) and testnot.work once that node exists. Pubkey: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEK+vhpr1V8VnsEemN9x6tAA2S05kmv/mQ3eVgSXSkJ8 sando@fw13`. (Moved to Phase 1 — not blocking Phase 0 exit.) | |
| 120 | - | ||
| 121 | - | ### Phase 0 follow-ups (not blocking, but visible) | |
| 122 | - | ||
| 123 | - | - [ ] `cargo_test` gate fails on MNW today — beyond the sqlx-online fix (already in), tests likely need a separate prepared DB (or per-test isolation). Investigate when wiring up Phase 1 gates. | |
| 124 | - | - [ ] Sandod observability: add `WS /events` (Phase 5) and consider streaming build/test stdout to a per-run log file rather than buffering in `Output`. | |
| 125 | - | - [ ] sqlx-cli (`v0.9.0`) at `/srv/sando/.cargo/bin/sqlx` is installed for the sando user but unused — sandod uses `sqlx::migrate::Migrator` programmatically (v0.8.6). Decide later whether to drop sqlx-cli or use it for diagnostics. | |
| 126 | - | - [ ] fw13 WoL: `ethtool` shows no wake-on capability on the USB ethernet — WoL likely won't work; rely on manual wake or BIOS settings. Record in `_meta/` if a solution surfaces. | |
| 127 | - | ||
| 128 | - | ## Phase 1 — Remote deploy | |
| 129 | - | ||
| 130 | - | The MVP only deploys to `ssh_target=local`. Production needs real SSH/rsync. | |
| 131 | - | ||
| 132 | - | - [x] Implement `deploy::deploy_node` remote path: rsync staged binary to `<ssh_target>:<release_root>/releases/<version>/<bin_name>`, then `ssh <ssh_target>` does `mv -Tf` symlink swap + `sudo systemctl reload-or-restart <service>`. First real promote landed 2026-05-31: fw13 → testnot, version 0.8.12. | |
| 133 | - | - [x] Add `node.service_name` to `sando.toml` (default `makenotwork.service`). | |
| 134 | - | - [x] Bootstrap script for adding a fresh node: `MNW/sando/deploy/bootstrap-node.sh`. (See Phase 3 — node-bootstrap script for full details.) | |
| 135 | - | - [x] Garbage-collect old releases on the remote: keep last N=5 per node, sorted by mtime. Runs at end of each successful deploy (local + remote variants). Tied via `RELEASES_TO_KEEP` const. | |
| 136 | - | - [x] Handle `rsync` failure mid-deploy: leave the previous `current` symlink intact; mark `deploys.outcome = 'failed'`; do not advance `tier_state`. (Verified the routes.rs path; rsync runs before symlink swap so failure naturally leaves `current` untouched.) | |
| 137 | - | ||
| 138 | - | ### Phase 1 — Track B: testnot live-app setup (NOT blocking Phase 2) | |
| 139 | - | ||
| 140 | - | Sando's deploy machinery is done, but testnot's MNW runtime needs the rest before its `makenotwork.service` can stay up: | |
| 141 | - | ||
| 142 | - | - [ ] Provision `makenotwork` postgres role + db on testnot (postgres-18 already installed). | |
| 143 | - | - [ ] `/opt/mnw/.env` with staging Stripe keys, SMTP, S3, DATABASE_URL, all other MNW env. Decide which subset of integrations get test/sandbox credentials vs are stubbed. | |
| 144 | - | - [ ] Caddyfile for testnot.work — strip prod's blocks down to just the main reverse_proxy (and forums/cdn if needed). Cloudflare Origin CA cert for testnot.work issued + placed at `/etc/caddy/`. AOP CA already universal. | |
| 145 | - | - [ ] `error-pages/` for testnot (copy or symlink from a release dir). | |
| 146 | - | - [ ] Wire post-deploy smoke check (`curl https://testnot.work/health` after the symlink swap, before declaring deploy ok). Sando-side, gate-like; spec in Phase 2 boot_smoke wording. | |
| 147 | - | ||
| 148 | - | ## Phase 2 — Backup pipeline + migration dry-run | |
| 149 | - | ||
| 150 | - | `migration_dry_run` is the load-bearing gate. It needs a real backup source, not a fixture. | |
| 151 | - | ||
| 152 | - | - [x] ~~Confirm astra's offsite replica writes a deterministic latest-link path.~~ Pivoted: pull direct from prod (`backup-puller@alpha-west-1:2200`, rrsync-locked to `/opt/makenotwork/backups/`). Astra offsite is separately broken — see carryover below. | |
| 153 | - | - [x] Wire the production `sando.toml` `backup.source` — `ssh://backup-puller@alpha-west-1:2200/latest.sql.gz` with `latest.sql.gz` as a hard link on prod. | |
| 154 | - | - [x] Schedule a daily `POST /backup/fetch` (systemd timer on fw13). `sandod-backup-fetch.{service,timer}` in `MNW/sando/deploy/`. Runs daily at 04:00 UTC (one hour after prod's 03:00 UTC backup-db.sh). Service uses `EnvironmentFile=/etc/sando/sando.env` for `$SANDO_DAEMON`. Verified 2026-05-31: one-shot test pulled 36MB backup, recorded in `backups` table. | |
| 155 | - | - [x] First end-to-end `migration_dry_run` against a real prod backup. Passed 2026-05-31 for sha 4541ebc in 1.2s: restored 36MB dump + applied all 133 migrations cleanly. Sha eee96a7 correctly failed `migration_dry_run` because it lacked migrations 123-132 that prod has applied — exactly the prod-vs-repo drift the gate is designed to catch. | |
| 156 | - | - [x] Document the failure modes: `plans/migration-dryrun-failures.md`. Covers all 7 fail modes (no backup, scratch_url unset, scratch reset, restore, drift, checksum mismatch, content broken against prod data) with operator playbook. | |
| 157 | - | - [x] Decide retention on `backups` table. 30 days; pruned at end of `backup::fetch`. `DELETE FROM backups WHERE fetched_at < datetime('now', '-30 days')`. | |
| 158 | - | ||
| 159 | - | ### Phase 2 carryovers / adjacent fires | |
| 160 | - | ||
| 161 | - | - [ ] **Offsite backup sync from prod → astra still broken.** Diagnosed 2026-05-31: `sync-backup-offsite.sh` was never deployed to prod (`deploy.sh` was not updated to ship it when the script was added), and `makenotwork@prod` had no SSH key. A key has since been generated, its pubkey installed on `max@astra:~/.ssh/authorized_keys`, and `/opt/backups/mnw` created on astra. **Blocked** on Tailscale ACL: astra runs only Tailscale SSH (no regular sshd on a bypass port), and the ACL denies `tag:tagged-devices` (alpha-west-1) → astra as user `max`. Needs an ACL update in the Tailscale admin console, then deploy `sync-backup-offsite.sh` to `/opt/makenotwork/` and test. `makenotwork@prod` pubkey: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILzyQQ7pmBIZat8fABlpG/opwh4w5GhLIfkX2qxKxuT0 makenotwork@alpha-west-1`. | |
| 162 | - | - [x] **Prod backup `latest.sql.gz` hard link.** `backup-db.sh` now maintains `latest.sql.gz` atomically (`ln -f $LATEST.new && mv -Tf .new latest.sql.gz`). Deployed 2026-05-31; manual run verified (nlinks=2). | |
| 163 | - | ||
| 164 | - | ## Phase 3 — Parity with current `deploy.sh` | |
| 165 | - | ||
| 166 | - | Decisions captured in `plans/config-artifacts.md`. Summary: Caddyfile / systemd unit / backup script / security configs all move to **one-time node-bootstrap**, not per-deploy. error-pages bake into binary (MNW PR) with sibling fallback. mnw-admin ships alongside server via `bin_names: Vec<String>`. Restart warning is Phase 5, prod-tier-only. Prod migrations: server self-applies on startup (`main.rs:73`), sando does not. | |
| 167 | - | ||
| 168 | - | - [x] **Caddyfile** — decided: bootstrap-only. Not per-deploy. (`plans/config-artifacts.md` §1.) | |
| 169 | - | - [x] **systemd unit** — decided: bootstrap-only. (§4.) | |
| 170 | - | - [x] **Backup script** — decided: bootstrap-only. (§6.) | |
| 171 | - | - [x] **Error pages** — short-term done: ship as release-dir sibling. `build_and_run_mm` `cp -a` from `worktree/server/deploy/error-pages/` into the staged release dir; deploy_node's rsync of the whole dir picks it up. Verified on testnot 2026-05-31. Long-term `include_dir!` bake-in still a separate MNW PR. | |
| 172 | - | - [x] **mnw-admin binary** — `cfg.bin_names: Vec<String>` (default `["server"]`, MNW uses `["makenotwork","mnw-admin"]`). `deploy_local` copies each from worktree's `target/release/<bin>`; `deploy_node` rsyncs the whole staged dir. `Config::primary_bin()` returns first entry for systemd reference. `versions.artifact_path` stores the primary; release dir is derived as `.parent()`. Verified on testnot 2026-05-31. | |
| 173 | - | - [x] **Security configs** — decided: bootstrap-only. (§5.) | |
| 174 | - | - [ ] **Restart warning** — Phase 5, prod-tier only via `tier.restart_warning_seconds` in `sando.toml`; needs `CLI_SERVICE_TOKEN` in `/etc/sando/sando.env`. (§7.) | |
| 175 | - | - [x] **Cross-compile from macOS** — decided: retire after one sprint of testnot parity verification. fw13 builds natively. (§8.) | |
| 176 | - | - [x] **Prod migrations** — decided: server self-applies on startup. Sando does NOT run them. `migration_dry_run` gate is the prod safety net. (§9.) | |
| 177 | - | - [x] **Node-bootstrap script** — `MNW/sando/deploy/bootstrap-node.sh`. Idempotent. Takes `SANDO_PUBKEY` (required), `BIN_NAME`, `SERVICE_NAME`, `SERVICE_USER`, `DEPLOY_ROOT` env. Installs base packages (rsync/ufw/fail2ban), optionally postgres/tailscale/caddy, creates deploy user + dirs + sudoers entry + systemd unit, sets up UFW. Deliberately does NOT touch Caddyfile content, certs, postgres role/db, or secrets — those are operator-decisions per-node. testnot was done by hand and matches roughly what the script produces. Test by re-running on the next node added (tier B Hetzner prod move or tier C). | |
| 178 | - | ||
| 179 | - | ## Phase 4 — Cutover | |
| 180 | - | ||
| 181 | - | Run Sando in parallel with `deploy.sh` until trust is built, then retire the old path. | |
| 182 | - | ||
| 183 | - | - [ ] First successful Sando-only deploy to **testnot.work** (tier A). Old `deploy.sh` still primary for prod. | |
| 184 | - | - [ ] One sprint (two months) of Sando-shadow runs: every `deploy.sh` deploy is also driven through Sando in dry-run mode (gates run, deploys go to a parallel `releases/` dir on prod but don't swap `current`). Compare outcomes. | |
| 185 | - | - [ ] First Sando-only deploy to **Hetzner prod** (tier B). `deploy.sh` retained but unused. | |
| 186 | - | - [ ] Move `server/deploy/deploy.sh` to `server/deploy/archive/deploy.sh.legacy` with a header explaining the cutover; do not delete (reference for the next year). | |
| 187 | - | - [ ] Decommission astra CI runner (`server/deploy/run-ci.sh`). Sando's `cargo_test` gate replaces it; if any astra-specific checks are still needed (e.g., `cargo audit`), add them as additional gate kinds in `daemon/src/gates.rs`. | |
| 188 | - | - [ ] Update `CLAUDE.md` and `_meta/docs/operations.md` to point at Sando, not `deploy.sh`. | |
| 189 | - | ||
| 190 | - | ## Phase 5 — Operator UX | |
| 191 | - | ||
| 192 | - | The TUI polls. The MVP requires you to hand-insert a row for `manual_confirm`. Both are fine for one operator but rough. | |
| 193 | - | ||
| 194 | - | - [x] Build mutex: single-slot `AppState.active_build: Mutex<Option<AbortHandle>>`; newer `/rebuild` aborts any in-flight build. Cargo commands set `.kill_on_drop(true)` so abort propagates SIGKILL to cargo + rustc children. (Landed 2026-05-31 after observing two concurrent builds racing the scratch DB.) | |
| 195 | - | - [x] Implement `WS /events`: tail of gate starts/finishes, deploy events, build logs. Event enum in `daemon/src/events.rs`; `broadcast::channel(256)` in `AppState`; emit sites in build.rs, gates.rs, routes.rs (rebuild, promote, rollback, confirm, backup_fetch). Verified 2026-05-31: live JSON envelopes stream to a python `websockets` client. | |
| 196 | - | - [x] TUI: actions pane. `↑↓`/`jk` select tier; `p` promote (no body — defaults version); `R` rollback; `b` backup fetch; `c` manual_confirm. Action results land in the events log. Daemon URL via `$SANDO_DAEMON`. Built in `tui/src/main.rs` 2026-05-31. | |
| 197 | - | - [x] `POST /confirm/{tier}` endpoint — inserts `gate_runs` row with `passed=1, gate_kind='manual_confirm'` for the tier's `current_version`. Replaces hand-SQL workaround. Verified 2026-05-31 against tier `a`. | |
| 198 | - | - [x] TUI live log pane that follows the most recent build / gate run; backed by `WS /events`. 200-event ring buffer, human-formatted per kind. WS auto-reconnects every 3s. Header shows ws connection state. | |
| 199 | - | - [x] `POST /promote` body — `version` now optional; defaults to predecessor tier's `current_version`. (Unblocks the "promote what just baked" flow.) | |
| 200 | - | ||
| 201 | - | ## Phase 6 — Monitoring + alerting | |
| 202 | - | ||
| 203 | - | - [ ] Wire fw13 `/metrics` endpoint into the existing MNW Prometheus scrape config; record where the scrape config lives in `_meta/` or wherever monitoring already runs. | |
| 204 | - | - [ ] Add counters: `sando_builds_total{outcome}`, `sando_gates_total{tier,kind,outcome}`, `sando_deploys_total{tier,outcome}`, `sando_burn_in_remaining_hours{tier}`. | |
| 205 | - | - [ ] Alert: build failed. Page on first failure (not flap-protected — builds are infrequent). | |
| 206 | - | - [ ] Alert: migration_dry_run failed. Page immediately. This is the 2026-05-22-class signal. | |
| 207 | - | - [ ] Alert: a tier has had `current_version` unchanged for > N days while host is green. (Operator forgot to promote.) | |
| 208 | - | ||
| 209 | - | ## Phase 7 — Multi-node B+C | |
| 210 | - | ||
| 211 | - | Today B is the only prod node. Adding C is the second prod node + CF Load Balancing. | |
| 212 | - | ||
| 213 | - | - [ ] Provision tier C node (Hetzner or alternate provider — capture rationale). | |
| 214 | - | - [ ] Update `sando.toml`: set `c.provisioned = true`, add `[[tier.node]]`. | |
| 215 | - | - [ ] Set up Cloudflare Load Balancing with B + C as origin pool, health-checked. | |
| 216 | - | - [ ] Verify sequential canary in Sando: deploy to B, wait for CF health-check to mark healthy (probably 30-60s probe interval), then deploy to C. Add a `node.health_url` field and a gate-style wait between nodes. | |
| 217 | - | - [ ] Document in README that `canary = "parallel"` exists but should never be used for B+C unless you understand the failure modes. | |
| 218 | - | ||
| 219 | - | ## Phase 8 — Postgres-on-D | |
| 220 | - | ||
| 221 | - | Move Postgres off the prod app node so B+C become truly interchangeable. | |
| 222 | - | ||
| 223 | - | - [ ] Provision Postgres-only machine D (modest spec; reliability over performance). | |
| 224 | - | - [ ] Migrate the prod DB from Hetzner app node to D. Capture procedure in `plans/postgres-d-migration.md`. | |
| 225 | - | - [ ] Update `server` `DATABASE_URL` everywhere (env files on B+C, scratch URL on fw13 stays local). | |
| 226 | - | - [ ] Replica/HA story stays deferred; D is SPOF for now (per `_meta/preclear/.../decisions.md`). | |
| 227 | - | ||
| 228 | - | ## Phase 9 — Hardening | |
| 229 | - | ||
| 230 | - | Pick up after cutover is stable. | |
| 231 | - | ||
| 232 | - | - [ ] Tailnet ACL audit: confirm only the laptop can reach `sandod:7766`. Document the ACL. | |
| 233 | - | - [ ] Decide if v0.2 needs token auth on `sandod` endpoints (revisit assumption from `decisions.md` once there's a real second operator). | |
| 234 | - | - [ ] Sando self-deploy: Sando builds and deploys *itself* through its own pipeline. Bootstraps the bootstrap. Closes the chicken-and-egg loop and is satisfying. | |
| 235 | - | - [ ] Backup-of-Sando-state: nightly SQLite snapshot to astra. The state DB tracks 6 months of deploys; losing it on a fw13 disk failure would be annoying. | |
| 236 | - | ||
| 237 | - | ## Notes / non-checkbox | |
| 238 | - | ||
| 239 | - | - WS `/events` and the operator-UX work in Phase 5 can run in parallel with Phase 1-3 once Phase 0 is done. They are sequenced after for review clarity, not because they block anything. | |
| 240 | - | - "Hotfix override" and `reset_burn_in` flag are already implemented end-to-end (see `decisions.md`); not on this list because there's nothing left to do until prod uses them. | |
| 241 | - | - C tier exists in the schema as a `provisioned=false` row from day one — adding C in Phase 7 is a TOML edit, not a migration. | |
| 242 | - | - MakeMachine + EveryCycle are a separate project. The hardware BOM moved to `~/Code/everycycle/docs/hardware/mm-v1-bom.md` on 2026-06-01. |
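
The gate-evaluation rules listed under the `unsatisfied_gates` tests in the removed TODO (latest row wins, a null `passed` counts as failing, hotfix skips only `burn_in`, other tiers and versions are ignored) can be sketched roughly as below. This is a minimal illustration, not the code in `daemon/src/gates.rs`: the `GateRun` struct, the function signature, and the choice to treat a required gate with no row at all as unsatisfied are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for one `gate_runs` row (rows are ordered oldest first).
#[derive(Debug, Clone)]
pub struct GateRun {
    pub tier: String,
    pub version: String,
    pub kind: String,
    /// `None` models a run that has started but not yet recorded a result.
    pub passed: Option<bool>,
}

/// Returns the required gate kinds that are not satisfied for `tier` + `version`.
/// Assumption: a required gate with no recorded row counts as unsatisfied.
pub fn unsatisfied_gates(
    runs: &[GateRun],
    required: &[&str],
    tier: &str,
    version: &str,
    hotfix: bool,
) -> Vec<String> {
    // Latest row wins: later rows overwrite earlier ones for the same gate kind.
    let mut latest: HashMap<&str, Option<bool>> = HashMap::new();
    for run in runs.iter().filter(|r| r.tier == tier && r.version == version) {
        latest.insert(run.kind.as_str(), run.passed);
    }

    required
        .iter()
        // Hotfix skips burn_in and nothing else.
        .filter(|kind| !(hotfix && **kind == "burn_in"))
        // Only an explicit `Some(true)` satisfies a gate; null or missing is failing.
        .filter(|kind| latest.get(**kind).copied().flatten() != Some(true))
        .map(|kind| kind.to_string())
        .collect()
}
```

Treating only `Some(true)` as passing is what makes the in-flight case safe in this sketch: a promote that races a still-running gate sees `None` and is blocked rather than waved through.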
| @@ -1,1192 +0,0 @@ | |||
| 1 | - | # Ultra Fuzz Report — MNW Server (Run #9 — launch eve) | |
| 2 | - | ||
| 3 | - | **Run date:** 2026-05-31 (evening) | |
| 4 | - | **Run number:** 9 (launchplan_final.md §1.5 referred to it as "Run #5" — stale; this is the 9th) | |
| 5 | - | **Trigger:** launchplan §1.5 pre-launch pass | |
| 6 | - | ||
| 7 | - | ## Run #9 headline | |
| 8 | - | ||
| 9 | - | Run #8 closed with "BAR MET — ALL FIVE AXES A-". Run #9 went deeper and surfaced 1 CRITICAL + 4 SERIOUS + several MED/HIGH items the prior 8 runs missed. All four launch-critical items fixed in-session; remaining items deferred with rationale below. | |
| 10 | - | ||
| 11 | - | | Axis | Run #8 | Run #9 | Direction | | |
| 12 | - | |------|--------|--------|-----------| | |
| 13 | - | | Payments | A- | A- | flat — 2 new SERIOUS surfaced; 1 fixed (webhook unmark on dual-failure 503), 1 deferred (subscription out-of-order webhook) | | |
| 14 | - | | Storage | A- | A- | flat — 1 new HIGH (migration 129 dead-letter table unused) + 2 MEDs (is_s3_key_live unindexed full-scan, LIKE-suffix false-positive); deferred | | |
| 15 | - | | UX Wiring | A- | B- → A- | dipped on grade-cap for signup TOCTOU CRITICAL, restored after fix | |
| 16 | - | | Security | A- | A- | flat — 2 new SERIOUS, both fixed (JWT-bump non-atomic, 2FA email IP spoofable) | | |
| 17 | - | | Performance | A- | A- | flat — 2 new HIGH (per-request reqwest::Client::new in 5 hot paths, unbounded spawn in expired-account cleanup); deferred to post-launch | | |
| 18 | - | ||
| 19 | - | **Net Run #9 (post-fix):** 0 CRITICAL · 1 SERIOUS open (Payments subscription ordering — documented deferral) · 3 HIGH open (deferred) · 7 MED open (deferred). **Launchplan §1.5 A- bar holds.** | |
| 20 | - | ||
| 21 | - | ## Run #9 — CRITICAL fixed in-session | |
| 22 | - | ||
| 23 | - | ### UX-CRITICAL — Signup TOCTOU: race → 500 + form loss → FIXED 2026-05-31 | |
| 24 | - | ||
| 25 | - | `src/routes/pages/public/join_wizard.rs:99-139`. The wizard ran separate `get_user_by_username` / `get_user_by_email` checks before `create_user`. A concurrent signup with the same username or email that slipped between the SELECT and the INSERT raised a 23505 unique violation, which bubbled up as `AppError::Database` → 500 "Something went wrong", and the user's entire typed-in form was lost. During a public alpha-launch surge this is the highest-traffic public endpoint, so it is the worst page to be returning 500s from. | |
| 26 | - | ||
| 27 | - | **Fix landed:** `create_user` call site now matches `AppError::Database(sqlx::Error::Database(_))` with code 23505, inspects the constraint name (`users_username_key` / `users_email_key`), and routes through `return_error(..)` with a friendly message — same flow as the explicit pre-check branches. Same shape as the existing 23505 handling in `db/license_keys.rs`, `db/builds.rs`, `routes/api/guest_checkout.rs`. | |
| 28 | - | ||
| 29 | - | **Known follow-up (not blocking):** the form-reload still loses typed values on the error swap; `return_error` renders `LoginErrorTemplate` (message-only). Preserving field values would require threading them through the template — file a separate Phase 4 polish item. | |
| 30 | - | ||
| 31 | - | ## Run #9 — SERIOUS fixed in-session | |
| 32 | - | ||
| 33 | - | ### Sec-SERIOUS — `delete_all_sessions_for_user` non-atomic JWT bump → FIXED 2026-05-31 | |
| 34 | - | ||
| 35 | - | `src/db/sessions.rs:247-263`. The function ran `DELETE FROM user_sessions` then a separate `UPDATE users SET jwt_invalidated_at = NOW()` on independent connections. If the UPDATE dropped (pool timeout, conn drop, postgres restart), session cookies were dead but every outstanding SyncKit JWT survived until natural expiry — exactly the leak this function exists to prevent. The in-code comment ("a session row deleted without a JWT bump is harmless, the converse would leak access") inverted reality. | |
| 36 | - | ||
| 37 | - | **Fix landed:** both writes wrapped in `pool.begin()` / `tx.commit()`. Comment updated. | |
| 38 | - | ||
| 39 | - | ### Sec-SERIOUS — 2FA login-notification email uses spoofable IP → FIXED 2026-05-31 | |
| 40 | - | ||
| 41 | - | `src/routes/pages/public/two_factor.rs:308-312`. The 2FA-completion path read `x-forwarded-for` raw (first-comma-split) for the new-login email's IP field. Every other login surface (`routes/auth.rs:242`, `auth.rs:486`, `auth.rs:528`) routes through `crate::helpers::extract_client_ip` which prioritizes `CF-Connecting-IP`. An attacker who already captured a password could pre-set `X-Forwarded-For: 1.2.3.4` on the verify-2fa POST so the "new login from <city>" email lied about origin — the exact email users are told to trust for compromise detection. | |
| 42 | - | ||
| 43 | - | **Fix landed:** swapped to `crate::helpers::extract_client_ip(&headers)`. One-line change, parity restored. | |
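The header priority can be sketched as follows. This is not the project's `extract_client_ip`; it is a std-only illustration over a plain slice of header pairs, and the fallback to the first `X-Forwarded-For` entry when `CF-Connecting-IP` is absent is an assumption.

```rust
/// Case-insensitive header lookup over (name, value) pairs.
fn header<'a>(headers: &[(&'a str, &'a str)], name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|&(_, v)| v.trim())
}

/// Prefer the proxy-set CF-Connecting-IP; only fall back to the
/// client-controllable X-Forwarded-For when it is absent (assumed fallback).
fn extract_client_ip<'a>(headers: &[(&'a str, &'a str)]) -> Option<&'a str> {
    header(headers, "cf-connecting-ip")
        .filter(|ip| !ip.is_empty())
        .or_else(|| {
            header(headers, "x-forwarded-for")
                .and_then(|v| v.split(',').next())
                .map(str::trim)
                .filter(|ip| !ip.is_empty())
        })
}
```

With both headers present, a spoofed `X-Forwarded-For: 1.2.3.4` is ignored in favor of the value the edge proxy set.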
| 44 | - | ||
| 45 | - | ### Pay-SERIOUS — Webhook dual-failure dropped events silently → FIXED 2026-05-31 | |
| 46 | - | ||
| 47 | - | `src/routes/stripe/webhook/mod.rs:73-89`. Dedup row was marked processed before handler dispatch (correct for at-least-once). On `(handler_err, insert_failed_event_err)` dual failure, code returned 503 to trigger Stripe redelivery — but Stripe's redelivery would short-circuit at the dedup check (line 50) and 200 the event without ever processing it. The code's own comment acknowledged the bug; the right tool (`unmark_event_processed`, defined 30 lines away in `db/webhook_events.rs:40`) was never called. | |
| 48 | - | ||
| 49 | - | **Fix landed:** call `db::webhook_events::unmark_event_processed(&state.db, &event_id)` before returning 503. The unmark is best-effort: if it also fails, the error is logged and the 503 is still returned (the same scenario where 503 was already wrong). | |
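The interaction between mark-before-dispatch and redelivery can be modeled in memory. This is a sketch under assumptions: `DedupTable`, `Outcome`, and `deliver` are hypothetical, a `HashSet` stands in for the dedup table, and the single-failure case (handler fails but the failed event is stored) is assumed to acknowledge with 200.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Outcome {
    Duplicate,      // dedup short-circuit, acknowledged without processing
    Processed,      // handler ran successfully
    Parked,         // handler failed, failed-event row stored for later retry
    RetryRequested, // both failed: dedup mark removed so redelivery is processed
}

impl Outcome {
    fn http_status(&self) -> u16 {
        match self {
            Outcome::RetryRequested => 503,
            _ => 200,
        }
    }
}

#[derive(Default)]
struct DedupTable {
    processed: HashSet<String>,
}

/// `handler_ok` and `park_ok` simulate the handler and the failed-event insert.
fn deliver(table: &mut DedupTable, event_id: &str, handler_ok: bool, park_ok: bool) -> Outcome {
    // Mark first; `insert` returns false if the id was already present.
    if !table.processed.insert(event_id.to_owned()) {
        return Outcome::Duplicate;
    }
    if handler_ok {
        return Outcome::Processed;
    }
    if park_ok {
        return Outcome::Parked;
    }
    // Dual failure: without this removal, the redelivery triggered by the 503
    // would hit the Duplicate branch and the event would never be processed.
    table.processed.remove(event_id);
    Outcome::RetryRequested
}
```

Delivering the same event id twice, first with both simulated failures and then with a healthy handler, yields `RetryRequested` followed by `Processed` rather than `Duplicate`.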
| 50 | - | ||
| 51 | - | ## Run #9 — DEFERRED with rationale (above A- bar) | |
| 52 | - | ||
| 53 | - | ### Pay-SERIOUS — Subscription webhook out-of-order events resurrect `active` | |
| 54 | - | ||
| 55 | - | `src/routes/stripe/webhook/subscriptions.rs:90, 116, 140`. Handlers blindly overwrite `subscriptions.status` and `period_end` from the webhook payload. Stripe does NOT guarantee delivery order. Sequence `past_due → active` reordered as `active → past_due → active(stale)` overwrites a legitimate `past_due` with stale `active` — restoring access for a user who hasn't paid. | |
| 56 | - | ||
| 57 | - | **Deferral rationale:** worst case is restored access for a few minutes until the next webhook arrives. Fix requires re-extracting Stripe's top-level `created` from `UntypedEvent` (currently dropped) and adding `WHERE last_event_at IS NULL OR last_event_at <= $created` guards on every status/period write across Fan+, creator-tier, and synckit code paths — non-trivial cross-cutting change. Post-launch fix in Phase 4; tracked in todo.md. | |
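The proposed guard can be sketched in memory as the equivalent of `WHERE last_event_at IS NULL OR last_event_at <= $created`. `SubscriptionRow`, `Status`, and `apply_event` are hypothetical names; `created` is assumed to be Stripe's top-level event timestamp in Unix seconds.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Status {
    Active,
    PastDue,
}

struct SubscriptionRow {
    status: Status,
    /// `created` timestamp of the last event applied, if any.
    last_event_at: Option<i64>,
}

/// Apply a webhook's status only if the event is not older than the last one
/// applied. Returns whether the write happened.
fn apply_event(row: &mut SubscriptionRow, status: Status, created: i64) -> bool {
    let fresh = match row.last_event_at {
        None => true,
        Some(prev) => prev <= created,
    };
    if fresh {
        row.status = status;
        row.last_event_at = Some(created);
    }
    fresh
}
```

A stale `active` event arriving after a newer `past_due` is then skipped instead of restoring access.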
| 58 | - | ||
| 59 | - | ### Sto-HIGH — Migration 129 dead-letter table never written | |
| 60 | - | ||
| 61 | - | `migrations/129_pending_s3_deletions_dead_letter.sql` creates `pending_s3_deletions_dead_letter` and documents it as "operator-visible parking lot... require manual triage." `src/scheduler/cleanup.rs:453-457` on `attempts >= 10` only logs `tracing::error!` then removes the row — never inserts into the dead-letter table. Permanently-failing keys have zero operator visibility. | |
| 62 | - | ||
| 63 | - | **Deferral rationale:** operational, not runtime. No user impact; only operators lose triage signal. One-INSERT fix; bundle into Phase 4. | |
| 64 | - | ||
| 65 | - | ### Perf-HIGH — Per-request `reqwest::Client::new()` in 5 hot paths | |
| 66 | - | ||
| 67 | - | `routes/pages/dashboard/main.rs:118`, `routes/pages/public/landing.rs:284`, `routes/api/internal/cli_features.rs:440`, `routes/api/domains.rs:319`, `auth.rs:559`. Each call builds a fresh TCP pool, TLS context, DNS resolver — no keep-alive across requests. `MtClient` in `AppState` already keeps a pooled client; the dashboard bypasses it. | |
| 68 | - | ||
| 69 | - | **Deferral rationale:** real, but it only matters at scale. Private-alpha launch traffic is well below the level where this becomes a tail-latency contributor. 30-min refactor; bundle into Phase 4 once launch traffic settles. | |
| 70 | - | ||
| 71 | - | ### Perf-HIGH — Unbounded `tokio::spawn` in expired-account cleanup | |
| 72 | - | ||
| 73 | - | `src/scheduler/cleanup.rs:215-220` (`spawn_expired_account_cleanups`). Daily tick spawns one task per expired account, no governor. `cleanup_sandbox_accounts` (same file, ~100 lines above) correctly caps at `CLEANUP_PARALLELISM=4` via `JoinSet`; the terminated/content-removal variants don't. A backlog of 200 expired accounts fans out into 200 concurrent S3 prefix listings racing for the 25-conn pool at midnight. | |
| 74 | - | ||
| 75 | - | **Deferral rationale:** runs once daily; current expired-account count is small (private alpha). Trivial fix (lift the existing JoinSet pattern); not launch-blocking. Bundle with Phase 4. | |
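The cap itself is simple to illustrate. The sketch below swaps in std scoped threads for the tokio `JoinSet` the report refers to, so it shows the bounding idea rather than the project's pattern; `run_bounded` is a hypothetical helper, and only `CLEANUP_PARALLELISM = 4` comes from the report.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

const CLEANUP_PARALLELISM: usize = 4;

/// Run `cleanup` once per account with at most CLEANUP_PARALLELISM in flight.
/// Returns the peak number of concurrent cleanups observed.
fn run_bounded<F>(account_ids: &[u64], cleanup: F) -> usize
where
    F: Fn(u64) + Sync,
{
    let next = AtomicUsize::new(0); // index of the next account to claim
    let in_flight = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);
    let workers = CLEANUP_PARALLELISM.min(account_ids.len());

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Each worker claims one account at a time until none remain.
                let i = next.fetch_add(1, Ordering::SeqCst);
                let id = match account_ids.get(i) {
                    Some(&id) => id,
                    None => break,
                };
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                cleanup(id);
                in_flight.fetch_sub(1, Ordering::SeqCst);
            });
        }
    });
    peak.load(Ordering::SeqCst)
}
```

With a backlog of 200 accounts, all 200 cleanups still run, but never more than four at once.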
| 76 | - | ||
| 77 | - | ## Run #9 — MED/LOW deferred (read-only carry-forward, in todo.md) | |
| 78 | - | ||
| 79 | - | - Pay-MED: `pricing.rs::parse_dollars_to_cents` misinterprets European decimal comma (`1,23` → 12300¢). User-controlled input; fixable in a single regex. | |
| 80 | - | - Pay-MED: SyncKit app-sub checkout silently defaults `storage_limit_bytes` to 0 if metadata missing. | |
| 81 | - | - Pay-MED: Guest checkout email falls back to `"unknown@guest"` sentinel; collisions possible. | |
| 82 | - | - Sto-MED: `is_s3_key_live` runs 7 EXISTS subqueries on unindexed `items.audio_s3_key` / `cover_s3_key` / `video_s3_key` / `versions.s3_key` etc — sequential scans per retry. | |
| 83 | - | - Sto-MED: `is_s3_key_live` LIKE-suffix pattern `'%' || s3_key` false-positives on neighboring keys (key `abc/file.png` matches `xabc/file.png`) — skips a legitimate delete → S3 object leaks. | |
| 84 | - | - UX-MED: "Log in" return_to query param in `purchase.html:145` is dead-wired — login handler always redirects `/dashboard`. Lost purchase intent. | |
| 85 | - | - UX-MED: Admin user filter buttons (`admin-users.html:35-44`) use `class="primary"` / `class="secondary"` instead of `btn-primary` / `btn-secondary` — renders unstyled. | |
| 86 | - | - UX-LOW: Pagination links in `git/issues.html:72,76` don't URL-encode `search`; `&page=99` in search query corrupts pagination. | |
| 87 | - | - UX-LOW: 5 sites do `.render().unwrap_or_default()` on Askama templates (blank UI on render failure, no log). | |
| 88 | - | - UX-LOW: `slugify` in `formatting.rs` produces `"post"` for any non-ASCII title; international creators get opaque URLs. | |
| 89 | - | - Sec-MINOR: `csrf.rs:176-185` `validate_token_consuming` doesn't consume — name promises stronger property than implementation. | |
| 90 | - | - Sec-MINOR: `routes/oauth.rs:101-111` `is_localhost_redirect` allows any port on localhost regardless of registered URI. | |
| 91 | - | - Sec-MINOR: `routes/pages/public/two_factor.rs::pending_2fa_started_at` reads `i64` via session.get; type mismatch silently → None → instantly-expired. | |
| 92 | - | - Sec-MINOR: `scanning/archive.rs:124` path-traversal check misses lone `..` segment (no trailing separator). | |
| 93 | - | - Perf-LOW: `scheduler/announcements.rs` linear walk through subscriber list in a single spawned task; no checkpointing. | |
| 94 | - | - Perf-LOW: `db/page_views.rs` `pending` HashMap has no max-cardinality cap (crawler hitting 100k unique target_ids before tick). | |
| 95 | - | - Perf-LOW: `build_runner.rs:441` local artifact tmpfile leaks if process crashes between SCP and `remove_file`. | |
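For the decimal-comma item in the list above, one option is to reject ambiguous input instead of guessing. The sketch below is not the project's `parse_dollars_to_cents` and is not the single-regex fix the report suggests; `parse_dollars_to_cents_strict` is a hypothetical integer-only parser that refuses any comma.

```rust
/// Parse a dollar amount such as "12", "1.5", or "$1.23" into cents.
/// Returns None for anything ambiguous, including any comma.
fn parse_dollars_to_cents_strict(input: &str) -> Option<u64> {
    let s = input.trim().trim_start_matches('$');
    // "1,23" could mean 1.23 or 123; refuse rather than pick one.
    if s.is_empty() || s.contains(',') {
        return None;
    }
    let (whole, frac) = match s.split_once('.') {
        Some((w, f)) => (w, f),
        None => (s, ""),
    };
    if whole.is_empty() || !whole.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    if frac.len() > 2 || !frac.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let dollars: u64 = whole.parse().ok()?;
    let cents: u64 = match frac.len() {
        0 => 0,
        1 => frac.parse::<u64>().ok()? * 10, // "1.5" means 50 cents
        _ => frac.parse().ok()?,
    };
    // Checked arithmetic avoids silent overflow on huge inputs.
    dollars.checked_mul(100)?.checked_add(cents)
}
```

Rejecting every comma also rejects `1,000.00`; whether thousands separators should be accepted is a product decision this sketch leaves open.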
| 96 | - | ||
| 97 | - | ## Run #9 — mandatory surprises | |
| 98 | - | ||
| 99 | - | - **Payments:** `routes/stripe/webhook/mod.rs:82-89` literally documents the bug it ships ("the dedup row was already marked processed... Stripe won't retry") and then chooses 503 anyway. The fix (`unmark_event_processed`) sat 30 lines away in the same crate, never called. Scar-tissue-comment-without-the-fix is a recognizable pattern across the codebase. | |
| 100 | - | - **Storage:** `routes/storage/mod.rs::commit_upload` sealed-helper pattern (Run #7 fix for the chronic disease) is the strongest piece of structural engineering in the repo — turned an enum into a witness type. But the *neighbor* file `migrations/129_pending_s3_deletions_dead_letter.sql` shows the opposite: migration written with detailed prose explaining the operator's parking lot, and the actual INSERT never wired up. Two adjacent fixes from the same audit-cycle, one structural and load-bearing, one ceremonial and silently broken. | |
| 101 | - | - **UX:** `csrf.rs` `PostureMethodRouter` + sealed `CsrfManuallyValidated` witness make registering a mutation route without an explicit posture declaration *uncompilable*. A+ engineering. The contrast with the signup wizard's TOCTOU-and-500-with-lost-form is jarring — defensive depth on CSRF, none on the front door. | |
| 102 | - | - **Security:** `routes/auth.rs:128-130` malformed-email branch skips the DUMMY_HASH timing equalizer that was added explicitly to prevent timing-side-channel user enumeration. ~2 orders of magnitude faster than every other failure path. The equalizer exists; this one path bypasses it. | |
| 103 | - | - **Performance:** `db/projects.rs::get_project_ids_for_user` is the only `fetch_all` in `projects.rs` without a `LIMIT`. Its neighbor `get_projects_by_user` caps at 500 with a documented safety comment. Cyber-squatter with 10k projects + account expiry → 10k S3 prefix-deletes in one spawned task. Asymmetric defense within the same module. | |
| 104 | - | ||
| 105 | - | ## Run #9 — stress-tested OK | |
| 106 | - | ||
| 107 | - | Verified attacks the code survived (high-confidence positives): | |
| 108 | - | ||
| 109 | - | - Stripe webhook signature replay (HMAC constant-time, multi-secret rotation, timestamp tolerance both directions) | |
| 110 | - | - Promo code concurrent over-use (single atomic UPDATE with max_uses + expires_at + starts_at) | |
| 111 | - | - Cart race past pre-check (23505 fallback aborts cleanly without charging) | |
| 112 | - | - License key prediction (6 wordlist words chosen by CSPRNG ≈ 66 bits) | |
| 113 | - | - Pre-signed URL Content-Length binding (S3 rejects mismatch at protocol level) | |
| 114 | - | - Storage cap atomicity (`try_replace_storage` single UPDATE) | |
| 115 | - | - Build claim race (partial unique index + 23505 backstop) | |
| 116 | - | - Idempotent re-confirms in all 4 upload confirm handlers (reaper-deletes-live-object closed) | |
| 117 | - | - Session row + JWT atomicity (post-fix verified above) | |
| 118 | - | - TOTP replay across skew window (matched-step tracked + strict `>` gate) | |
| 119 | - | - OAuth PKCE downgrade (S256 pinned at authorize + token-exchange) | |
| 120 | - | - CSRF body bypass via textarea-smuggled token (proper form parser) | |
| 121 | - | - Git diff/blame XSS (HTML-escaped in attacker-controlled spots) | |
| 122 | - | - Internal error leakage (tests assert no PG host, no S3 bucket, no sqlx variant leaks) | |
| 123 | - | ||
| 124 | - | ## Run #9 confidence per axis | |
| 125 | - | ||
| 126 | - | - Payments **HIGH** (~70% LoC read this pass; Phase 4 backlog visible) | |
| 127 | - | - Storage **HIGH** (full module read; cleanup.rs upper half only — MEDIUM there) | |
| 128 | - | - UX Wiring **HIGH** for CSRF/error/validation; **MEDIUM** for wizard step partials, embed routes, dashboard CSV import | |
| 129 | - | - Security **HIGH** for auth/CSRF/session; **MEDIUM** for scanning (YARA rule content unread), API key scoping | |
| 130 | - | - Performance **HIGH** for scan worker, scheduler, storage, build_runner; **MEDIUM** for SyncKit, postmark, import pipeline | |
| 131 | - | ||
| 132 | - | ## Run #9 bug counts | |
| 133 | - | ||
| 134 | - | | Severity | Payments | Storage | UX | Security | Perf | Total | | |
| 135 | - | |---|---|---|---|---|---|---| | |
| 136 | - | | CRITICAL | — | — | 1 (FIXED) | — | — | **1** | | |
| 137 | - | | SERIOUS | 2 (1 FIXED, 1 deferred) | — | — | 2 (FIXED) | — | **4** | | |
| 138 | - | | HIGH | — | 1 (deferred) | — | — | 2 (deferred) | **3** | | |
| 139 | - | | MED | 3 (deferred) | 2 (deferred) | 2 (deferred) | — | — | **7** | | |
| 140 | - | | LOW/NOTE | 2 | — | 3 | 4 | 3 | 12 | | |
| 141 | - | ||
| 142 | - | ## Run #9 delta vs Run #8 | |
| 143 | - | ||
| 144 | - | - 1 CRITICAL surfaced + fixed (signup TOCTOU); class missed by prior 8 runs because no agent explicitly probed the public-signup race window | |
| 145 | - | - 4 SERIOUS surfaced; 3 fixed in-session, 1 deferred with rationale | |
| 146 | - | - Run #8 "BAR MET" claim was correct *for the surfaces it audited* but understated: this pass added explicit attack-vector probing for cross-conn atomicity, IP spoof parity across auth surfaces, and webhook dedup edge paths — none of which were in prior runs' scope | |
| 147 | - | - All previously closed Run #8 fixes verified intact (commit_upload seal, S1 tx atomicity, background.rs queue, cart MEDs) | |
| 148 | - | ||
| 149 | - | --- | |
| 150 | - | ||
| 151 | - | # Ultra Fuzz Report — MNW Server (Run #8 — historical) | |
| 152 | - | ||
| 153 | - | **Run date:** 2026-05-31 | |
| 154 | - | **Run number:** 8 | |
| 155 | - | ||
| 156 | - | ## Run #8 Headline | |
| 157 | - | ||
| 158 | - | | Axis | Run #5 | Run #6 | Run #7 | Run #8 | Direction | | |
| 159 | - | |------|--------|--------|--------|--------|-----------| | |
| 160 | - | | Payments | B | B+ | A- | **A-** | flat — H2 still deferred; 2 new MEDs surfaced (cart `min_price_cents` bypass, cart-all chain-break on all-free first seller) | | |
| 161 | - | | Storage | B- | A- | B+ | **A-** | ↑ H1 + S1 fixes verified closed; commit_upload seal intact across all 7 confirm handlers; genericization clean at every caller including synckit/blobs.rs | | |
| 162 | - | | UX Wiring | B | A- | A- | **A-** | flat — 1 new MED (item-wizard `pricing_model` silent fallback to "free" — same disease class fixed in project wizard at Run #6, not propagated) | | |
| 163 | - | | Security | A- | A- | A- | **A-** | flat — only diff in scope (username availability fail-closed) is a net improvement; MED backlog identical to Run #5/#6/#7 | | |
| 164 | - | | Performance | B- | A- | A- | **A-** | flat with 1 new SERIOUS — webhook `checkout_helpers.rs` unbounded `tokio::spawn` (send_purchase_emails / mailing_list / tip_email) competes with request handlers for the 25-slot pool under burst | | |
| 165 | - | ||
| 166 | - | **Net Run #8:** 0 CRITICAL · 1 SERIOUS new (Perf webhook spawn) — FIXED 2026-05-31 · 5 new MED — ALL FIXED 2026-05-31 · 1 SERIOUS previously-deferred (Payments H2 `claim_free_project` soft race) — FIXED 2026-05-31. | |
| 167 | - | ||
| 168 | - | **Post-Run #8 status (2026-05-31 end-of-day): 0 CRITICAL · 0 SERIOUS · 0 MED open from any prior run.** All five axes A-, all above-MED items closed, all Run #8 MEDs closed, prior-deferred SERIOUS closed. Launchplan §1.5 bar fully cleared. | |
| 169 | - | ||
| 170 | - | **2026-05-31 post-Run-#8 backlog sweep (7 waves):** 24 of 26 carried MED/LOW/NOTE items closed across Storage (5), Security (8), Performance (3), UX (2), Payments (2), Auth (4). Two deferred with rationale: `build_runner.rs` serial targets (LOW, builds run rarely, refactor touches denominator) and `scheduler/mod.rs` advisory-lock granularity (multi-replica concern, single-process today). New schema migration `133_items_duration_seconds_nonnegative.sql` pins the negative-duration invariant in the DB. New `commit_rescan` helper extends the chronic-disease commit_upload seal to admin paths. Tests: 1655 / 0. | |
| 171 | - | ||
| 172 | - | **Launchplan §1.5 bar:** **ALL 5 AXES AT A- — BAR MET.** The new Perf SERIOUS is axis-internal and the agent kept Perf at A- (machinery wins outweigh; same shape as previously-closed `record_view` per-request spawn — apply mpsc + drainer pattern). New Payments MEDs and UX MED are launch-quality items worth addressing or documenting before ship; none are A- blockers. | |
| 173 | - | ||
| 174 | - | ## Run #8 — new findings above MED | |
| 175 | - | ||
| 176 | - | ### P-SERIOUS — Webhook hot-path unbounded `tokio::spawn` (Performance) — FIXED 2026-05-31 | |
| 177 | - | `src/routes/stripe/webhook/checkout_helpers.rs:58, 96, 124, 290` + `src/routes/stripe/webhook/checkout.rs:618`. `send_purchase_emails`, `subscribe_buyer_to_mailing_list`, `send_tip_email`, `send_guest_sale_notification`, guest-purchase-confirmation each `tokio::spawn` from the webhook handler. Multi-item cart fires N spawns per webhook; each task acquires 1-2 pool conns + a Postmark call. No JoinSet, no cap. Under burst, hundreds of detached tasks competed with request handlers for the 25-slot pool. Same shape as the Run #4 `record_view` per-request spawn (fixed via mpsc + drainer). | |
| 178 | - | ||
| 179 | - | **Fix landed:** new generic `src/background.rs` module — `BackgroundTx` + `spawn_pool()` with bounded mpsc (capacity 1024) + semaphore-bounded concurrent execution (8 workers, well below `DB_POOL_MAX_CONNECTIONS=25`). `state.bg.spawn(name, fut)` is non-blocking; queue overflow logs a warning and drops the task. The `spawn_email!` macro was refactored to use the bg queue (covers 17 callers across auth/admin/follows/library/two_factor/stripe webhook/login flows). The 5 manual webhook `tokio::spawn` sites were also migrated. Per-request email sends from postmark issue replies (×2), guest-claim email, and join-wizard signup (×2) were migrated in the same pass — same disease, same fix. | |
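The non-blocking, drop-on-overflow behavior can be sketched with the standard library's bounded channel. This is not the `src/background.rs` implementation: `BackgroundQueue`, `Job`, and `drain` are hypothetical, a sync `sync_channel` stands in for the async mpsc, and the semaphore-bounded worker pool is omitted.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

type Job = Box<dyn FnOnce() + Send + 'static>;

struct BackgroundQueue {
    tx: SyncSender<Job>,
}

impl BackgroundQueue {
    /// `capacity` bounds how many jobs may wait in the queue (must be >= 1).
    fn new(capacity: usize) -> (Self, Receiver<Job>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx }, rx)
    }

    /// Never blocks the caller. Returns false if the job was dropped.
    fn spawn(&self, name: &str, job: impl FnOnce() + Send + 'static) -> bool {
        let job: Job = Box::new(job);
        match self.tx.try_send(job) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                eprintln!("background queue full, dropping task {name}");
                false
            }
            Err(TrySendError::Disconnected(_)) => false,
        }
    }
}

/// Run every job currently queued; returns how many ran.
fn drain(rx: &Receiver<Job>) -> usize {
    let mut ran = 0;
    while let Ok(job) = rx.try_recv() {
        job();
        ran += 1;
    }
    ran
}
```

The trade-off is explicit: under sustained overload a task such as an email send is lost with a warning rather than allowed to pile up against the connection pool.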
| 180 | - | ||
| 181 | - | **Out of scope for this fix** (different bug shapes; defer to Phase 4 polish or own remediation): import pipeline (long-running, needs own bound), MT community creation (single outbound HTTP, minor pool pressure), creator departure notification + status broadcast (broadcast-class — use `broadcast.rs` JoinSet pattern), idempotency-store post-response (trivial DB write), build_runner (already gated by claim flow), scheduler/monitor/scanning/page_views (background workers, not per-request). | |
| 182 | - | ||
| 183 | - | ### Payments MED — Cart `min_price_cents` bypass — FIXED 2026-05-31 | |
| 184 | - | Both cart paths (`process_seller_checkout` and `create_cart_checkout`) now check `pc.min_price_cents` for non-platform Discount codes before applying the discount. Cart skips the ineligible item (others may still qualify) rather than rejecting the whole cart — matches the existing scope-skip pattern. | |
| 185 | - | ||
| 186 | - | ### Payments MED — Cart-all chain-break on all-free first seller — FIXED 2026-05-31 | |
| 187 | - | `process_seller_checkout` signature changed `Result<String>` → `Result<Option<String>>`; all-free path now returns `Ok(None)` instead of `Err(BadRequest)`. New `drain_to_paid` helper loops through the queued sellers until a paid one is reached (returns URL) or queue exhausted (returns `Ok(None)` → library redirect). Both callers (`create_cart_checkout_all` and `checkout_success`) updated to use it. | |
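The loop shape of `drain_to_paid` can be sketched as below. The signature here is an assumption for illustration: sellers are plain `u64` ids in a `VecDeque`, and the per-seller checkout is passed in as a closure returning `Ok(Some(url))` for a paid seller or `Ok(None)` for an all-free one.

```rust
use std::collections::VecDeque;

/// Pop sellers until one needs payment. Returns that seller's checkout URL,
/// or Ok(None) when the queue is exhausted (caller redirects to the library).
fn drain_to_paid<E>(
    queue: &mut VecDeque<u64>,
    mut checkout: impl FnMut(u64) -> Result<Option<String>, E>,
) -> Result<Option<String>, E> {
    while let Some(seller_id) = queue.pop_front() {
        // All-free sellers return Ok(None) and the loop moves on;
        // real errors still propagate through `?`.
        if let Some(url) = checkout(seller_id)? {
            return Ok(Some(url));
        }
    }
    Ok(None)
}
```

Sellers after the first paid one stay in the queue, so the chain resumes from there once that payment succeeds.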
| 188 | - | ||
| 189 | - | ### UX MED — Item wizard `pricing_model` silent fallback — FIXED 2026-05-31 | |
| 190 | - | `save_pricing` now rejects missing pricing_model with `AppError::validation("Select a pricing model")` and rejects unknown values with `format!("Unknown pricing model: {other}")`. Same shape as the project wizard Run #6 fix. | |
| 191 | - | ||
| 192 | - | ### UX MED — Inline-JS template duplication — FIXED 2026-05-31 | |
| 193 | - | Added delegated `data-copy-link` click handler to `static/mnw.js` with proper `.catch()` (falls back to `window.prompt` in non-secure contexts — better than the silent-no-op the inline snippets shipped with). 8 templates migrated from `onclick="navigator.clipboard.writeText(...).then(...)"` to `<a href="..." data-copy-link>Copy link</a>` (audio_player, blog_post, collection, item, project, text_reader, user, video_player). `href` is the real URL so middle-click / no-JS / share menus still work. Cache-bust query bumped to `v=0531`. | |
| 194 | - | ||
| 195 | - | ### Perf MED — Cart free-claim N+1 — FIXED 2026-05-31 | |
| 196 | - | Extended `CartItem` with `enable_license_keys` + `default_max_activations` (both cart queries pull them through). Three free-claim loops (single-seller paid path, discount-zeroed promo path, chain-flow path) drop the per-item `get_item_by_id` and replace per-item `remove_from_cart` DELETE with a single bulk `remove_from_cart_bulk(..., ANY($2))` at the end of each loop. Per-item tx for `claim_free_item` stays (the per-item claim-vs-already-purchased return value is load-bearing for sales-count increment). Roundtrips per free item dropped from ~5-7 to ~3-4; per-loop DELETEs from N to 1. | |
| 197 | - | ||
| 198 | - | ## Run #8 — verified standing (storage fixes from session) | |
| 199 | - | ||
| 200 | - | - **H1** (`uploads.rs::confirm_upload` L295-337) — three-arm match correct. Zero-rows arm rolls back (replace path = `try_replace_storage` swap-back with `i64::MAX` cap; fresh-upload path = `decrement_storage_used`), then `enqueue_s3_orphan(new_key)`, returns BadRequest "Item was modified concurrently." Returns BEFORE `commit_upload` and BEFORE `remove_pending_upload` — pending_uploads row left as reaper second-line defense. | |
| 201 | - | - **S1** (`media.rs::media_confirm` L241-293) — single `state.db.begin()` wraps storage credit + pending_uploads clear + media_files INSERT. S3 IO entirely outside tx. tx drop → Postgres ROLLBACK → all three writes reverted atomically. 23505 detection via typed `AppError::Database(sqlx::Error::Database(...))` pattern works post-rollback. S3 cleanup fires on every tx-failure branch. | |
| 202 | - | - **Genericization** — `pending_uploads::remove_pending_upload` and `media_files::create` now `impl PgExecutor<'e>`. All 12 callers (including `synckit/blobs.rs:157`) still compile and execute correctly. | |
| 203 | - | - **Pool pressure delta from S1 tx** — neutral-to-better. Prior code grabbed 3 separate conns serially; new code grabs 1 conn for ~3× the duration. Users-row write lock held ~ms. Per-user serialization for sub-second uploads acceptable. | |
| 204 | - | ||
| 205 | - | ## Run #8 — mandatory surprises | |
| 206 | - | ||
| 207 | - | - **Payments:** `compute_splits` more careful than its comment promises — remainder-distribution loop constrained by `expected_total = amount * raw_total_pct.min(100) / 100`, so under-100% splits keep the owner's share AND distribute floor-rounding remainders up to bound. Proptest-style invariant tests fully fence it. | |
| 208 | - | - **Storage:** `try_increment_storage_on` inside the tx holds a row-level lock on `users` for the duration of the tx. Not a bug (sub-ms hold; cap can't be over-shot via WHERE re-evaluation under READ COMMITTED). But every media confirm now serializes per-user against every other storage write. | |
| 209 | - | - **UX:** Copy-link button is a chimera. Nine templates copy the same inline `onclick` that calls `navigator.clipboard.writeText`, mutates `this.textContent` to `"Copied!"` — silently broken in any tab loaded over plain HTTP, in iframes, or with restrictive CSP. No `.catch()` → no fallback, no error. | |
| 210 | - | - **Security:** `routes/auth.rs:128-130` malformed-email branch skips DUMMY_HASH timing equalizer. ~2 orders of magnitude faster than every other failure path — distinguishes "you submitted an invalid-email-shaped string" from "valid email, unknown account." Real timing oracle a few lines above the equalizer that was deliberately added to prevent exactly this. | |
| 211 | - | - **Performance:** `metrics::idempotency_middleware` does a DB SELECT on EVERY POST/PUT with an `Idempotency-Key` header BEFORE the handler runs. No bloom filter, no negative cache. ~1 extra ms per POST already doing 2-5 DB queries — free 20%+ on POST p50 available by adding an in-memory `seen` set. | |
| 212 | - | ||
| 213 | - | ## Run #8 bug counts | |
| 214 | - | ||
| 215 | - | | Severity | Payments | Storage | UX | Security | Perf | Total | | |
| 216 | - | |---|---|---|---|---|---|---| | |
| 217 | - | | CRITICAL | — | — | — | — | — | **0** | | |
| 218 | - | | SERIOUS | 1 (deferred) | — | — | — | 1 (new) | **2** | | |
| 219 | - | | MED | 2 (new) | 7 | 5 | 8 | 5 | 27 | | |
| 220 | - | | LOW/NOTE | 5 | 3 | 4 | 3 | 2 | 17 | | |
| 221 | - | ||
| 222 | - | ## Run #8 confidence per axis | |
| 223 | - | ||
| 224 | - | - Payments **HIGH** (~70% LoC read) | |
| 225 | - | - Storage **HIGH** (full) | |
| 226 | - | - UX **HIGH** | |
| 227 | - | - Security **HIGH** (scoped); MEDIUM for storage-route auth side-effects | |
| 228 | - | - Performance **HIGH** | |
| 229 | - | ||
| 230 | - | ## Run #8 delta vs Run #7 | |
| 231 | - | ||
| 232 | - | - **Storage B+ → A-.** H1 + S1 fixes verified closed. Genericization clean. | |
| 233 | - | - **Payments A- flat.** 2 new MEDs (cart `min_price_cents` bypass, cart-all chain-break) surfaced via expanded coverage; H2 deferred unchanged. | |
| 234 | - | - **UX A- flat.** 1 new MED (item-wizard `pricing_model` silent fallback) — same disease class as project wizard fix from Run #6, not propagated. | |
| 235 | - | - **Security A- flat.** Net improvement (username fail-closed). MED backlog identical. | |
| 236 | - | - **Performance A- flat.** 1 new SERIOUS (webhook unbounded spawn) — same shape as Run #4 `record_view` fix. Cart free-flow N+1 (MED) — Run #5 fix covered paid only. | |
| 237 | - | ||
| 238 | - | --- | |
| 239 | - | ||
| 240 | - | # Ultra Fuzz Report — MNW Server (Run #7 — historical) | |
| 241 | - | ||
| 242 | - | **Run date:** 2026-05-31 | |
| 243 | - | **Run number:** 7 (+ S1 + Storage code-fuzz fixes confirmed in Run #8) | |
| 244 | - | ||
| 245 | - | ## Headline | |
| 246 | - | ||
| 247 | - | | Axis | Run #5 | Run #6 | Run #7 | Direction | | |
| 248 | - | |------|--------|--------|--------|-----------| | |
| 249 | - | | Payments | B | B+ | **A-** | ↑↑ Phase 2 + Run #6 + Run #7 fixes all landed; S1 cart 23505 swallow fixed post-Run #7; H2 claim_free_project soft race deferred | | |
| 250 | - | | Storage | B- | A- | **B+ → A- pending Run #8** | ↑/↓ commit_upload structural fix is excellent; Run #6 idempotency fix introduced HIGH-1 (pending_uploads leak in 4 sites) + HIGH-2 (missing rollback on update_*_url) — both fixed post-Run #7. Storage code-fuzz 2026-05-31 surfaced H1 (confirm_upload silent zero-rows + side-effects-already-fired) and reopened S1 media_confirm tx atomicity — both fixed in same session | | |
| 251 | - | | UX Wiring | B | A- | **A-** | ↑ field-aware deletion + parse_dollars_to_cents shared; pricing_model silent fallback HIGH found and fixed post-Run #7 | | |
| 252 | - | | Security | A- | A- | (unchanged) | flat — no security-touching changes in Runs #6/#7 | | |
| 253 | - | | Performance | B- | A- | (unchanged) | flat — no perf-touching changes in Runs #6/#7 | | |
| 254 | - | ||
| 255 | - | ## Post-Run #7 Storage code-fuzz (2026-05-31) | |
| 256 | - | ||
| 257 | - | Targeted code-fuzz scoped to the Storage axis to verify A- before triggering full Run #8. Two findings above MED, both fixed in-session: | |
| 258 | - | ||
| 259 | - | - **H1 (HIGH) — `routes/storage/uploads.rs::confirm_upload` silent `rows_affected = 0`.** Same shape as the just-closed HIGH-2 (`update_*_url`), one step further along the same handler family. UPDATE at L295 uses ownership-filter `WHERE id = $1 AND project_id IN (SELECT id FROM projects WHERE user_id = $4)`; `rows_affected()` was never checked. If the item was deleted between `get_item_owner` (L156) and the UPDATE, storage credit stayed incremented, `pending_uploads` got cleared a few lines down, and `commit_upload` enqueued a scan job against a ghost target — permanent S3 leak + over-charged counter. **Fix:** three-arm match on the UPDATE result; zero-rows case rolls back storage and routes the new S3 key through `enqueue_s3_orphan` so the reaper still cleans it, then returns BadRequest "Item was modified concurrently." | |
| 260 | - | - **S1 (SERIOUS, Run #5 plan #12 reopened) — `routes/storage/media.rs::media_confirm` three-write atomicity.** Run #5 called for wrapping `try_increment_storage` → `remove_pending_upload` → `media_files::create` in a transaction; Run #7's in-process compensation only covered in-process errors. Process interruption (panic, OOM kill, container restart) between any two writes still leaked. **Fix:** all three writes now in a single tx; tx drop rolls back storage + pending_uploads + media_files atomically. Only the S3 object needs explicit cleanup (single `delete_object` after rollback). Supporting DB-layer changes: `creator_tiers::try_increment_storage_on(&mut PgConnection)` tx-friendly variant; `pending_uploads::remove_pending_upload` and `media_files::create` signatures genericized to `impl PgExecutor<'e>` (backwards compatible). | |
| 261 | - | ||
| 262 | - | Remaining storage MED/LOW (below launchplan §1.5 A- bar; ride into Phase 4 polish or document deferral): | |
| 263 | - | - MED — `update_project_image_url` / `update_item_cover` ignore `rows_affected()` (same shape as H1; mitigated for current callers because the only follow-on side-effect is `bump_cache_generation`). | |
| 264 | - | - MED — `downloads.rs:120` `((duration as u64) * 2).max(3600)` with no DB `CHECK (duration_seconds >= 0)`. Negative duration → multi-decade presigned URL. Exploitability requires creator-controlled negative duration; ffprobe doesn't produce them. Cap in code + add CHECK migration. | |
| 265 | - | - MED — Admin rescan paths (`routes/admin/uploads.rs:347, 390`) call `db::scan_jobs::enqueue` directly, bypassing the `commit_upload` structural seal. Ordering is correct so no live bug; demote `db::scan_jobs::enqueue` to `pub(crate)` and expose `commit_rescan(target, ...)` to close the chronic-disease finding for real. | |
| 266 | - | - MED — `enqueue_s3_orphan` single-policy doc in `routes/storage/mod.rs:24-30` overstates the discipline; many `s3.delete_object(...).await.ok()` direct calls remain at pre-storage-credit rejection paths. Tighten the doc or migrate the post-storage-credit sites. | |
| 267 | - | - MED — `is_s3_key_live` doesn't enumerate project image URLs (project cover keys live in a distinct prefix so no current bug; surface is fragile if future code paths queue project image keys). | |
| 268 | - | - LOW — `scanning/worker.rs:251` inline `UPDATE media_files SET scan_status` instead of `db::scanning::update_media_file_scan_status` helper. | |
| 269 | - | - LOW — `routes/pages/dashboard/wizards/item/save.rs:95` `update_item_cover_image_url` updates only `cover_image_url` (not s3_key/size); client-side hidden-field abuse can desync. | |
| 270 | - | - LOW — `db/pending_uploads.rs::remove_pending_upload` deletes by s3_key alone (per-handler prefix validation makes cross-user collision unreachable, but the function signature is broader than it needs to be). | |
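For the `downloads.rs:120` item in the list above, the in-code half of the fix could look like the sketch below. `presign_ttl_secs` and the 24-hour `MAX_TTL_SECS` cap are hypothetical; the doubling and the 3600-second floor come from the expression quoted in the report, and the stored duration is assumed to be a signed integer.

```rust
const MIN_TTL_SECS: u64 = 3600;
/// Hypothetical upper bound; the report only says to cap in code.
const MAX_TTL_SECS: u64 = 24 * 3600;

/// Presigned-URL lifetime: twice the media duration, floored and capped.
fn presign_ttl_secs(duration_seconds: i64) -> u64 {
    // A bare `as u64` cast turns -1 into 18_446_744_073_709_551_615,
    // so clamp negatives to zero before converting.
    let duration = duration_seconds.max(0) as u64;
    duration.saturating_mul(2).clamp(MIN_TTL_SECS, MAX_TTL_SECS)
}
```

The database `CHECK` remains worthwhile alongside this, since it keeps the invalid value from being stored in the first place.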
| 271 | - | ||
| 272 | - | **Chronic disease status (5th run):** The invariant-in-prose / sibling-not-swept pattern that recurred across Runs #2–#6 was **structurally addressed** in Run #7 via two helpers: | |
| 273 | - | - `routes/storage/mod.rs::commit_upload(target: CommitTarget, ...)` — sealed `enqueue_scan_for` to module-private; the helper is now the only handler-reachable path for scan enqueue + scan_status flip after a DB write. Bug shapes 1–3 from prior runs are now structurally impossible to introduce in a new sibling. | |
| 274 | - | - `crate::pricing::parse_dollars_to_cents` + `validate_dollars_f64` — canonical dollar-to-cents conversion; bypassing has historically introduced NaN→$0 and saturating-overflow silent bugs. | |
| 275 | - | ||
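The NaN→$0 and saturating-overflow failure modes named above can be reproduced in isolation. The following is a minimal sketch, not the project's `crate::pricing::parse_dollars_to_cents`: the signature, the `PriceError` enum, and the `MAX_PRICE_CENTS` cap are assumptions made for illustration, and it keeps `f64` parsing with explicit guards rather than a decimal type.

```rust
/// Hypothetical error type; the real helper's error type is not shown in this report.
#[derive(Debug, PartialEq)]
pub enum PriceError {
    NotANumber,
    NotFinite,
    Negative,
    TooLarge,
}

/// Assumed upper bound, for illustration only ($1,000,000.00).
pub const MAX_PRICE_CENTS: i32 = 100_000_000;

/// The unguarded conversion: `as` casts turn NaN into 0 and saturate on overflow.
pub fn naive_cents(dollars: f64) -> i32 {
    (dollars * 100.0).round() as i32
}

/// Guarded conversion: reject NaN/Inf, negatives, and out-of-range values before the cast.
pub fn parse_dollars_to_cents(input: &str) -> Result<i32, PriceError> {
    let dollars: f64 = input.trim().parse().map_err(|_| PriceError::NotANumber)?;
    if !dollars.is_finite() {
        // "NaN" and "inf" both parse successfully as f64, so they must be caught here.
        return Err(PriceError::NotFinite);
    }
    if dollars < 0.0 {
        return Err(PriceError::Negative);
    }
    let cents = (dollars * 100.0).round();
    if cents > f64::from(MAX_PRICE_CENTS) {
        return Err(PriceError::TooLarge);
    }
    // In range and finite, so the cast cannot saturate or silently zero.
    Ok(cents as i32)
}
```

With these definitions, `naive_cents(f64::NAN)` is `0` (a silent Free price) and `naive_cents(1e20)` is `i32::MAX`, while the guarded function returns an error for both inputs.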
| 276 | - | **Net after Run #7 + S1 fix:** 0 CRITICAL · 0 HIGH/SERIOUS · 1 SERIOUS deferred (Payments H2 soft race on `claim_free_project`) · a handful of MED/LOW polish items. | |
| 277 | - | ||
| 278 | - | --- | |
| 279 | - | ||
| 280 | - | # Ultra Fuzz Report — MNW Server (Run #5 — historical) | |
| 281 | - | ||
| 282 | - | **Run date:** 2026-05-30 | |
| 283 | - | **Run number:** 5 | |
| 284 | - | ||
| 285 | - | ## Headline | |
| 286 | - | ||
| 287 | - | | Axis | Run #4 | Run #5 | Direction | | |
| 288 | - | |------|--------|--------|-----------| | |
| 289 | - | | Payments | A- | **B** | ↓ (Run #4 plan items closed; 4 new SERIOUS surfaces previously unaudited: NULL item_id refund, splits >100% overflow, tip project authorization, cart unlisted bypass) | | |
| 290 | - | | Storage | A- | **B-** | ↓ (Run #4 `images.rs` ordering bug closed; same disease reappeared in `uploads.rs` route gate ordering — file-type rejection runs AFTER scan enqueue) | | |
| 291 | - | | UX Wiring | C+ | **B** | ↑ (Run #4 CSRF patchwork + creator-tier token fixed and structurally enforced; new CRIT: field-aware validation API is dead code at template boundary) | | |
| 292 | - | | Security | B+ | **A-** | ↑ (Run #4 git-shell validation, lockout email flood, CSRF policy all verified; no new CRIT/HIGH; remaining gaps are operational/MED) | | |
| 293 | - | | Performance | B | **B-** | ↓ (Run #4 scan_jobs retention + pool permit + broadcast bounding verified; new HIGHs in previously unaudited cart checkout + page-view paths + scheduler integrity scan) | | |
| 294 | - | ||
| 295 | - | Net: 3 CRITICAL (vs Run #4: 4), 14 HIGH/SERIOUS (vs Run #4: 10), 13 MED, 10 MINOR/LOW. Two axes regressed because Run #5 reached previously-unaudited territory (Payments tip/cart/refund edges; Performance hot-path request loops) while Run #4 plan items themselves were correctly closed. The Storage regression is a *recurrence of the same shape* in a sibling handler: the chronic invariant-in-prose disease, fourth consecutive run. | |
| 296 | - | ||
| 297 | - | ## Critical / High Findings (fix before launch) | |
| 298 | - | ||
| 299 | - | 1. **[Storage — CRITICAL]** `routes/storage/uploads.rs:204-237` — `confirm_upload` calls `enqueue_scan_for(...)` and `update_item_scan_status(... Pending)` BEFORE the match arm rejects `Download`/`Insertion`/`MediaImage`/`MediaVideo` with `BadRequest`. A misrouted-but-valid `item_id` confirms flips that item's scan status to Pending, blocks `stream_url` for every fan, and leaks a scan-job row for an S3 key that's then deleted. | |
| 300 | - | 2. **[UX — CRITICAL]** `error.rs:216-264` + `templates/error.html` — `AppError::validation_fields(summary, [(field, msg), ...])` is consumed only by unit tests. `ErrorTemplate` has no `fields:` member; no template renders per-field highlights. Every non-HTMX validation failure degrades to the global "Go Home / Go Back" page and wipes submitted form input. Handler authors are misled into thinking their carefully-tagged field errors reach the UI. | |
| 301 | - | 3. **[Perf — CRITICAL]** `build_runner.rs:175-180` — Partial-failure error message reports `("{}/{} succeeded", artifact_keys.len(), artifact_keys.len() + 1)`. Denominator is always `succeeded + 1`, regardless of how many targets actually ran. Three targets, one succeeded, two failed → reports "1/2" (should be 1/3). Failed-target count is never tracked. | |
| 302 | - | ||
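The denominator bug in finding 3 is easy to see side by side with a version that counts failures. This is a sketch under assumptions: the real `build_runner.rs` types are not shown here, so per-target outcomes are modeled as a plain `Result<String, String>` (artifact key or failure reason), and both function names are hypothetical.

```rust
/// Buggy shape described in the finding: the denominator is always `succeeded + 1`.
pub fn buggy_message(artifact_keys: &[String]) -> String {
    format!("{}/{} succeeded", artifact_keys.len(), artifact_keys.len() + 1)
}

/// Corrected shape: count failures explicitly and report succeeded/(succeeded + failed).
/// Returns `None` when every target succeeded, since there is no partial failure to report.
pub fn partial_failure_message(outcomes: &[Result<String, String>]) -> Option<String> {
    let succeeded = outcomes.iter().filter(|o| o.is_ok()).count();
    let failed = outcomes.len() - succeeded;
    if failed == 0 {
        return None;
    }
    Some(format!("{}/{} succeeded", succeeded, succeeded + failed))
}
```

For three targets with one success and two failures, the buggy form yields "1/2 succeeded" and the corrected form yields "1/3 succeeded", matching the example in the finding.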
| 303 | - | ### HIGH / SERIOUS | |
| 304 | - | ||
| 305 | - | 4. **[Payments — SERIOUS]** `db/transactions.rs:699-716` — `refund_transaction_by_payment_intent` returns `Vec<(TransactionId, ItemId)>` (non-Optional). Project-level transactions store `item_id IS NULL` (`routes/stripe/checkout/project.rs:135`). On `charge.refunded` for a project-level purchase, sqlx fails to decode NULL → `ItemId`; webhook handler 5xx's; Stripe retries forever. | |
| 306 | - | 5. **[Payments — SERIOUS]** `routes/stripe/webhook/checkout_helpers.rs:240-269` — `compute_splits` comment says "Defensive clamp: a misconfigured project_members row could sum past 100%" but the loop only adds remainder pennies and never subtracts. Two members at 60% + 60% on a $10 sale are credited $6 each: $12 paid out against $10 of revenue. The clamp only affects `expected_total`, never the already-computed per-member amounts. Tests cover ≤100% only. | |
| 307 | - | 6. **[Payments — SERIOUS]** `routes/stripe/checkout/tips.rs:104-106` — `TipForm.project_id` is taken verbatim from the form. The webhook later calls `record_tip_splits(tip.id, tip.project_id, ...)` and credits THAT project's members. An attacker tipping creator A can pass project B's UUID; B's members get split obligations credited against A's tip. Stripe money flows correctly; on-platform `tip_splits` records and any downstream reporting are corrupted. | |
| 308 | - | 7. **[Payments — SERIOUS]** `db/cart.rs:94-123` + `routes/stripe/checkout/cart.rs` — `item.rs:47-49` enforces "Unlisted items can only be obtained through their bundle" via `if !item.listed`. `toggle_cart_preflight` and `get_cart_items` check `is_public` but NOT `listed`. An attacker who knows an unlisted item's UUID can POST to `/api/cart/{id}/toggle` and check out via the cart flow, fully bypassing the bundle-only gate. | |
| 309 | - | 8. **[Payments — SERIOUS]** `routes/stripe/webhook/subscriptions.rs:117-121, 67-69, 95-96` — `status_str.parse::<SubscriptionStatus>()` returns BadRequest for any status not in `enums.rs:183-198` (Stripe's `paused` is new). Webhook handler returns Err; scheduler retries forever until status changes. | |
| 310 | - | 9. **[Payments — SERIOUS]** `payments/webhooks.rs:294-308` — `is_full_refund` tests `amount_refunded >= amount`, which is also true when both are zero (Stripe sometimes emits these for $0 verification charges). That triggers `refund_transaction_by_payment_intent` with the default `unknown` intent ID. The test at lines 517-525 pins the behavior. | |
| 311 | - | 10. **[Storage — HIGH]** `routes/storage/versions.rs:159-174` — `version_confirm_upload` enqueues scan and flips `scan_status` to Pending BEFORE the `version.s3_key == req.s3_key` idempotency check at line 172. Duplicate retry of an already-confirmed upload knocks a Clean version back to Pending, breaking downloads. | |
| 312 | - | 11. **[Storage — HIGH]** `routes/storage/images.rs:179-208` — `project_image_confirm` replace branch is gated on `Ok(Some(old_size))` from `s3.object_size(&old_key)`. On `Err` (S3 hiccup) or `Ok(None)` (URL with no object behind it) it falls into the "no old image" branch, `try_increment_storage` without decrementing. Permanent storage over-count. Also: `update_project_image_url` runs AFTER `enqueue_deletions` of the old key, with no rollback path. | |
| 313 | - | 12. **[Storage — HIGH]** `routes/storage/media.rs:236-293` — `media_confirm` does three separate writes (`try_increment_storage`, `remove_pending_upload`, `media_files::create`) outside a transaction. Interruption between steps leaves S3 object orphaned with storage credit consumed and no DB row. | |
| 314 | - | 13. **[UX — HIGH]** `routes/pages/dashboard/wizards/item/save.rs:183-185, 214-227` — `let price_cents = (price_dollars * 100.0).round() as i32; if price_cents > 0 { validate_price_cents(price_cents)?; }`. Guard skips validation for 0 and negative values; value goes through `PriceCents::from_db` (no validation) into `update_item`. Submitting `price=-5` writes `-500` cents. Same pattern on PWYW: no `min <= suggested` check. | |
| 315 | - | 14. **[UX — HIGH]** `routes/pages/dashboard/wizards/item/save.rs:179-183` + `routes/api/items/bulk.rs:136-139` + `routes/pages/dashboard/wizards/project.rs:264-298` — `price_dollars: f64 = …parse()…unwrap_or(0.0)`. `"NaN".parse::<f64>()` succeeds; `NaN as i32 == 0` (silent Free). `1e20` saturates `i32::MAX`. Bulk path catches via `PriceCents::new` cap; `save.rs` does not — persists raw. | |
| 316 | - | 15. **[UX — HIGH]** `routes/auth.rs:356-361` — `let is_taken = db::users::get_user_by_username(...).await.map(|u| u.is_some()).unwrap_or(false);`. Transient DB error during signup live-check returns "available", misleading the user; subsequent signup races whatever real state the DB is in. | |
| 317 | - | 16. **[Perf — HIGH]** `routes/stripe/checkout/cart.rs:68-248` — Per cart item: sequential `has_purchased_item`, optional `remove_from_cart`, per-free-item `begin tx → claim_free_item → increment_sales_count → commit`, `get_item_by_id`, second `remove_from_cart`. 20-item cart ≈ 80 sequential roundtrips, ~20 separate transactions, 20 distinct pool acquisitions in series. | |
| 318 | - | 17. **[Perf — HIGH]** `db/page_views.rs:18-32` — `record_view` spawned per public request, takes a pool connection to UPSERT. With `DB_POOL_MAX_CONNECTIONS = 25`, a viral item link spawns unbounded tasks, eats the pool, times out real request handlers at acquire. No batching, no per-(target,session) debounce. | |
| 319 | - | 18. **[Perf — HIGH]** `scheduler/integrity.rs:53-73` — `check_sales_count_drift`: `SELECT i.id, i.sales_count, COUNT(t.id) FROM items LEFT JOIN transactions ... GROUP BY i.id HAVING i.sales_count != COUNT(t.id) LIMIT 50`. `HAVING` post-aggregation; Postgres scans every row in `items` and joins every completed transaction in history before filtering. `LIMIT 50` doesn't cap the work. Weekly multi-minute query holding a pool connection. | |
| 320 | - | ||
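One way to make the over-credit in finding 5 impossible is to scale shares by the actual percentage sum whenever it exceeds 100. The sketch below assumes whole-number percentages and an `i64` cent amount; the real `compute_splits` signature is not shown in this report, so treat the names and types as hypothetical.

```rust
/// Splits `total_cents` across members by percentage.
/// If the percentages sum past 100, shares are scaled down proportionally so the
/// payouts can never exceed `total_cents`.
pub fn compute_splits(total_cents: i64, percents: &[u32]) -> Vec<i64> {
    let sum: i64 = percents.iter().map(|&p| i64::from(p)).sum();
    if sum == 0 || total_cents <= 0 {
        return vec![0; percents.len()];
    }
    // Denominator is 100 for a valid configuration and the real sum when over-allocated.
    let denom = sum.max(100);
    let mut shares: Vec<i64> = percents
        .iter()
        .map(|&p| total_cents * i64::from(p) / denom)
        .collect();
    // When the members are owed the whole amount, hand out the pennies lost to
    // integer division, one per member with a non-zero percentage.
    if sum >= 100 {
        let mut leftover = total_cents - shares.iter().sum::<i64>();
        for (share, &p) in shares.iter_mut().zip(percents) {
            if leftover == 0 {
                break;
            }
            if p > 0 {
                *share += 1;
                leftover -= 1;
            }
        }
    }
    shares
}
```

With 60% + 60% on 1000 cents this returns 500 and 500 rather than 600 and 600. Proportional scaling is only one of the two options the plan mentions; clamping each share against the remaining budget would also hold the invariant but favors earlier members.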
| 321 | - | ## Scorecard | |
| 322 | - | ||
| 323 | - | ### Axis Summary Grades | |
| 324 | - | ||
| 325 | - | | Axis | Overall | Cold Spots | Mandatory Surprise | | |
| 326 | - | |------|---------|------------|--------------------| | |
| 327 | - | | Payments | B | `routes/stripe/checkout/cart.rs` (B-), `routes/stripe/checkout/tips.rs` (B-), `db/transactions.rs` (B-), `routes/stripe/webhook/checkout_helpers.rs` (B-), `routes/stripe/webhook/subscriptions.rs` (B) | `compute_splits` carries a "Defensive clamp" comment that explicitly anticipates the >100% case and then fails to defend against it — only `expected_total` is clamped, the already-computed per-member splits go unchanged. Treat as evidence the defensive-comment culture is itself unreliable; comments and code drift independently. | | |
| 328 | - | | Storage | B- | `routes/storage/uploads.rs` (C+), `routes/storage/images.rs` (C+), `routes/storage/versions.rs` (C+), `routes/storage/media.rs` (B-), `db/mod.rs::check_sandbox_cap` (C+) | `stream_url` (`downloads.rs:119-122`) computes presigned expiry as `((duration as u64) * 2).max(3600)` where `duration: i32` and no DB CHECK ≥ 0 exists on `duration_seconds`. A negative value becomes near-`u64::MAX` expiry — a centuries-long presigned URL. The cast width and missing CHECK are independent latent bugs that compose into a multi-decade credential leak. | | |
| 329 | - | | UX Wiring | B | `routes/pages/dashboard/wizards/item/save.rs` (B-), `error.rs` (B-), `routes/pages/public/discover.rs` (B) | `update_item` takes ~13 positional `Option`s; call sites are unreadable and error-prone. The negative-price bug (HIGH #13) is born from this signature: anyone calling it has no compiler help distinguishing `Some(-500)` (bug) from `Some(500)` (intent). | | |
| 330 | - | | Security | A- | `helpers.rs` (B+), `scanning/clamav.rs` (B), `scanning/yara.rs` (B), `rate_limit.rs` (B+) | The "11 layer" scan pipeline test gives a false sense of coverage. ClamAV is `FailOpen` by explicit policy (`scanning/clamav.rs:19`), YARA silently skips rule files that fail to compile (`scanning/yara.rs:54-67`), and there is no startup assertion that any real AV layer is live. A misconfigured deploy can pass EICAR as Clean while the test suite is green. | | |
| 331 | - | | Performance | B- | `routes/stripe/checkout/cart.rs` (C), `scheduler/announcements.rs` (C+), `scheduler/integrity.rs` (C+), `scheduler/cleanup.rs` (B-), `build_runner.rs` (B-), `db/page_views.rs` (C+), `db/pending_s3_deletions.rs` (B) | The biggest scaling cliff is a 1-line `tokio::spawn` on the page-view path, not anything that "looks expensive". Hot-path response shipped its tail-latency problem to the same pool that serves it. | | |
| 332 | - | ||
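The Storage "mandatory surprise" above composes a sign-extending cast with a floor that has no ceiling. A minimal sketch of the cast and a clamped alternative follows; the function names and the `MAX_EXPIRY_SECS` ceiling are assumptions, and `wrapping_mul` is used to model release-mode overflow (a debug build of the original expression would panic instead).

```rust
pub const MIN_EXPIRY_SECS: u64 = 3_600;
/// Assumed ceiling for illustration; the report does not state the intended cap.
pub const MAX_EXPIRY_SECS: u64 = 86_400;

/// Models `((duration as u64) * 2).max(3600)` with wrapping arithmetic.
/// A negative i32 sign-extends to a value near u64::MAX, and `.max` only sets a floor.
pub fn naive_expiry_secs(duration_seconds: i32) -> u64 {
    (duration_seconds as u64).wrapping_mul(2).max(MIN_EXPIRY_SECS)
}

/// Treats negative durations as zero and bounds the result on both sides.
pub fn clamped_expiry_secs(duration_seconds: i32) -> u64 {
    let duration = u64::try_from(duration_seconds).unwrap_or(0);
    duration
        .saturating_mul(2)
        .clamp(MIN_EXPIRY_SECS, MAX_EXPIRY_SECS)
}
```

A code-side clamp like this complements, rather than replaces, the DB `CHECK (duration_seconds >= 0)` that the findings call for.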
| 333 | - | ## Bug Counts by Severity | |
| 334 | - | ||
| 335 | - | | Severity | Payments | Storage | UX | Security | Perf | Total | | |
| 336 | - | |---|---|---|---|---|---|---| | |
| 337 | - | | CRITICAL | — | 1 | 1 | — | 1 | **3** | | |
| 338 | - | | HIGH/SERIOUS | 5 | 3 | 3 | — | 3 | **14** | | |
| 339 | - | | MED | 2 | 3 | 2 | 4 | 2 | 13 | | |
| 340 | - | | MINOR/LOW | 2 | 2 | 2 | 3 | 1 | 10 | | |
| 341 | - | ||
| 342 | - | ## Cross-Cutting Concerns | |
| 343 | - | ||
| 344 | - | 1. **Side-effects-before-validation pattern.** Storage (uploads/versions/images route gates run after scan enqueue), Payments (tip `project_id` accepted before authorization, cart `listed` not checked before checkout), UX (price `from_db` after a guard that skips zero/negative). Four files, three axes, same shape: persist first, validate later. | |
| 345 | - | 2. **Invariant-in-prose, fourth consecutive run.** Run #2→#3 was MaybeUser; Run #3→#4 was scan_status ordering comments-vs-code; Run #4 partial fix landed (`images.rs`) but the same disease moved up a layer to `uploads.rs` (the route-level file-type gate now runs after scan enqueue). The Payments "defensive clamp" comment in `compute_splits` is the same shape on a different organ. **No type-level constructive impossibility has yet been applied to any of these.** | |
| 346 | - | 3. **Optional positional args as bug carriers.** `update_item`'s ~13 positional `Option`s let the wizard pass a negative-price `Option<PriceCents::from_db>` past the validator. Same pattern is implicated in the UX field-error finding — `ErrorTemplate`'s struct literal is missing a `fields:` field at every callsite and the compiler doesn't care. | |
| 347 | - | 4. **Hot-path pool pressure from fire-and-forget writes.** `record_view` per pageview, `tokio::spawn` per cart line, scheduler advisory-lock conn pinned across S3. The 25-connection pool is sized for a quiet box; three independent fan-out patterns can each saturate it. | |
| 348 | - | 5. **FailOpen with no liveness assertion.** ClamAV FailOpen + YARA optional + no startup gate = a green test suite can coexist with zero real AV coverage. Same shape as the Performance "spawned task accumulates without bound" pattern — both are silent degradations the operator never sees. | |
| 349 | - | ||
| 350 | - | ## Components Successfully Stress-Tested | |
| 351 | - | ||
| 352 | - | - All Run #4 Phase 1 closures verified standing (CSRF creator-tier token, `images.rs` scan_status ordering structural fix, git-shell validation, lockout `=` predicate, promo dedupe, scanner streaming + pool permit, broadcast bounded fan-out, scan_jobs retention). | |
| 353 | - | - Stripe HMAC: multi-secret `v1=` rotation now accepts on any match (Run #4 polish landed). | |
| 354 | - | - Promo `try_increment_use_count` race-free via atomic single-row UPDATE; release path uses detach for no-double-decrement; proptest-covered. | |
| 355 | - | - License keys: 66-bit entropy, DB UNIQUE, `FOR UPDATE` activation, full recount on revoke (display lag only — finding #M). | |
| 356 | - | - CSRF posture: `CsrfRouter<S>` newtype prevents a bare `Router::route(path, post(...))` from compiling in mutation-bearing files. Verified. | |
| 357 | - | - Argon2id parameters + `DUMMY_HASH` timing equalization on user-not-found (login, OAuth, SyncKit). | |
| 358 | - | - PKCE-S256 pinned at both authorize and token endpoints; OAuth code atomic single-use consume. | |
| 359 | - | - JWT future-iat rejection + `jwt_invalidated_at` second-equal `<=` semantics; password change bumps `jwt_invalidated_at` via `update_user_password`. | |
| 360 | - | - SSE shard-guard drop-before-remove; cross-process advisory locks for scheduler ticks. | |
| 361 | - | - ZIP bomb: decompressed-bytes counted (not claimed); ratio + depth caps; nested magic-byte detection. | |
| 362 | - | - `try_increment_storage` cap-predicate UPDATE; concurrent uploads cannot both squeeze past cap. | |
| 363 | - | ||
| 364 | - | ## Confidence Per Axis | |
| 365 | - | ||
| 366 | - | - Payments **HIGH** — read 22 of 23 listed files end-to-end with targeted attacks per surface; all four SERIOUS reproducible by line-tracing. | |
| 367 | - | - Storage **HIGH** — CRITICAL and all three HIGHs mechanically reproducible; mandatory surprise composes two latent bugs via line-by-line read. | |
| 368 | - | - UX Wiring **HIGH** — full read of `csrf.rs`, `error.rs`, `markdown.rs`, `formatting.rs`, `validation/mod.rs`; spot-checked 20+ templates for CSRF pattern; CRITICAL field-aware-validation finding cross-checked by grepping `validation_fields_ref` callers. | |
| 369 | - | - Security **MEDIUM** — auth/CSRF/OAuth/scanning surfaces walked thoroughly; admin/moderation/reports/ssh_keys API/totp routes only sampled. ClamAV FailOpen is **policy** not bug; flagged as architectural risk. | |
| 370 | - | - Performance **MEDIUM-HIGH** — spot-checked DB call patterns across 15+ files; exhaustive route-level N+1 sweep deferred; stripe/webhook code shows similar `for x in &xs` loops at `checkout.rs:149,167,198,452` that were not deep-audited. | |
| 371 | - | ||
| 372 | - | ## Metrics | |
| 373 | - | ||
| 374 | - | - Modules audited: ~80 | |
| 375 | - | - Cold spots (≤ B): 18 | |
| 376 | - | - Bugs: 3 CRITICAL, 14 HIGH/SERIOUS, 13 MED, 10 MINOR/LOW | |
| 377 | - | - Axes at A- or above: 1/5 (Security) | |
| 378 | - | ||
| 379 | - | ## Delta Since Run #4 | |
| 380 | - | ||
| 381 | - | **FIXED (Run #4 items not surfaced this run):** | |
| 382 | - | - All 10 Run #4 Phase 1 items verified closed (CSRF creator-tier, `images.rs` ordering, git-shell validation, lockout email flood, cancel_pending CSRF, promo dedupe, scanner streaming + pool permit, scan_jobs retention, broadcast bounding). | |
| 383 | - | - All 7 Run #4 Phase 2 items verified closed (cart template price math, media reupload race, pending_uploads reaper bump, TOTP step-replay, delete_other_sessions cache eviction, `/login` CSRF, OAuth fetch_optional). | |
| 384 | - | - All 5 Run #4 Phase 3 items verified closed (claim_pending_build partial index, build status reaper race, `extract_s3_key_from_url` host pinning, TOTP `pending_2fa` tracking row, KNOWN_SYNC_APPS removed entirely). | |
| 385 | - | - All Phase 4 polish items verified closed. | |
| 386 | - | ||
| 387 | - | **NEW CRITICAL/HIGH in Run #5 (previously unaudited or regressed):** | |
| 388 | - | - Storage: `uploads.rs` route-level file-type gate runs after scan enqueue (CRIT). | |
| 389 | - | - UX: `validation_fields` plumbing is dead code at template boundary (CRIT). | |
| 390 | - | - Perf: `build_runner.rs` partial-failure denominator nonsense (CRIT). | |
| 391 | - | - Payments: NULL `item_id` decode bomb on project-level refunds (SERIOUS). | |
| 392 | - | - Payments: `compute_splits` over-credits when project_members sum >100% (SERIOUS). | |
| 393 | - | - Payments: tip `project_id` not validated vs recipient (SERIOUS). | |
| 394 | - | - Payments: cart bypasses item `listed` gate (SERIOUS). | |
| 395 | - | - Payments: unknown subscription status retry storm (SERIOUS). | |
| 396 | - | - Storage: `version_confirm_upload` scan enqueue before idempotency check (HIGH). | |
| 397 | - | - Storage: `project_image_confirm` mis-accounts on S3 probe failure + no rollback (HIGH). | |
| 398 | - | - Storage: `media_confirm` non-atomic three-write sequence (HIGH). | |
| 399 | - | - UX: negative/NaN price acceptance via `PriceCents::from_db` after permissive guard (HIGH). | |
| 400 | - | - UX: username availability check fails open on DB error (HIGH). | |
| 401 | - | - Perf: cart checkout 80 sequential roundtrips (HIGH). | |
| 402 | - | - Perf: `record_view` unbounded spawn per public request (HIGH). | |
| 403 | - | - Perf: `check_sales_count_drift` full-table aggregate (HIGH). | |
| 404 | - | ||
| 405 | - | **CHRONIC (across Run #3 → Run #4 → Run #5):** | |
| 406 | - | - **Invariant-in-prose / policy-not-in-types — FOURTH consecutive run.** Run #4 partially fixed the scan_status ordering inside `images.rs` (and the CSRF policy via `CsrfRouter` structurally), but the same disease *moved up a layer*: in `uploads.rs` the route-level file-type gate now runs *after* scan enqueue. The constructive-impossibility shape needed: extract a `commit_upload(file_type, ...)` higher-level operation that validates the file_type before doing any scan/credit side effects, then make `enqueue_scan_for` + `update_*_scan_status` `pub(crate)` so handlers cannot call them directly. The Payments `compute_splits` "Defensive clamp" comment + the UX `validation_fields_ref` orphan plumbing are the same disease in different organs. | |
| 407 | - | ||
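The constructive-impossibility shape described above can be sketched with ordinary module privacy. Everything here is illustrative: the `Effects` log stands in for the database, `FileType::Audio` is an assumed accepted variant (the report names only the rejected ones), and the lower-level functions are module-private as in the Run #7 note rather than `pub(crate)`, because `pub(crate)` items remain callable from handlers in the same crate.

```rust
pub mod storage {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum FileType {
        Audio, // assumed accepted variant, for illustration
        Download,
        Insertion,
        MediaImage,
        MediaVideo,
    }

    #[derive(Debug, PartialEq)]
    pub enum UploadError {
        BadRequest,
    }

    /// Records side effects in order; a stand-in for DB writes.
    #[derive(Default)]
    pub struct Effects {
        pub log: Vec<&'static str>,
    }

    // Private to this module: code outside `storage` cannot call these directly.
    fn enqueue_scan_for(fx: &mut Effects) {
        fx.log.push("enqueue_scan");
    }

    fn update_scan_status_pending(fx: &mut Effects) {
        fx.log.push("scan_status=Pending");
    }

    /// The only public path to the side effects: validate first, then act.
    pub fn commit_upload(file_type: FileType, fx: &mut Effects) -> Result<(), UploadError> {
        match file_type {
            FileType::Audio => {}
            FileType::Download
            | FileType::Insertion
            | FileType::MediaImage
            | FileType::MediaVideo => return Err(UploadError::BadRequest),
        }
        enqueue_scan_for(fx);
        update_scan_status_pending(fx);
        Ok(())
    }
}
```

A rejected file type leaves the effects log empty, so a new sibling handler has no way to reach the scan enqueue without passing the gate.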
| 408 | - | **REGRESSED:** | |
| 409 | - | - Payments (A- → B) — four new SERIOUS bugs surfaced in previously-unaudited tip/cart/refund/subscription-status corners. Not a regression in fixed code; a regression in audit coverage. | |
| 410 | - | - Storage (A- → B-) — invariant-in-prose recurrence (chronic above). | |
| 411 | - | - Performance (B → B-) — hot-path request loops audited for the first time. | |
| 412 | - | ||
| 413 | - | --- | |
| 414 | - | ||
| 415 | - | # Plan: Restore Every Axis to A- or Higher (Run #5) | |
| 416 | - | ||
| 417 | - | **Target grades:** Payments A · Storage A · UX A- · Security A- · Performance A-. | |
| 418 | - | ||
| 419 | - | User priority for the launch window: **resolve every CRITICAL/SERIOUS/HIGH before re-running**. Iterate until audits surface only small new errors. | |
| 420 | - | ||
| 421 | - | ## Phase 1 — CRITICAL (fix today) | |
| 422 | - | ||
| 423 | - | 1. **Storage CRIT — `uploads.rs` file-type gate ordering.** `routes/storage/uploads.rs:204-237`. Move the match arm that rejects `Download`/`Insertion`/`MediaImage`/`MediaVideo` BEFORE `enqueue_scan_for` and `update_item_scan_status`. Then make `enqueue_scan_for` + `update_*_scan_status` `pub(crate)` and expose a `commit_upload(file_type, item_id, s3_key)` higher-level op that performs validation → credit → row insert → status flip in the correct order. The same constructor must serve `versions.rs` and `images.rs`. This closes the chronic invariant-in-prose finding. | |
| 424 | - | 2. **UX CRIT — Field-aware validation reaches the UI.** `error.rs:216-264` + `templates/error.html` + `templates/partials/form_errors.html` (new). Either (a) add `fields: Vec<(String, String)>` to `ErrorTemplate` and a `{% for f in fields %}` block in `error.html` + per-input markup; or (b) delete `validation_fields*` API entirely and replace handler callsites with `validation(summary)`. Choose (a) for non-HTMX forms that need to preserve user input; choose (b) only if every existing callsite is HTMX-only and uses OOB swaps for inline errors. Audit all `validation_fields` callers and pick a path. | |
| 425 | - | 3. **Perf CRIT — `build_runner.rs` partial-failure denominator.** `build_runner.rs:175-180`. Track `failed_count` alongside `artifact_keys`; report `succeeded/(succeeded+failed)`. Add a test that runs 3 targets with 2 failures and asserts "1/3" in the error string. | |
| 426 | - | ||
| 427 | - | ## Phase 2 — SERIOUS / HIGH (fix this weekend) | |
| 428 | - | ||
| 429 | - | 4. **Payments SERIOUS — NULL item_id refund decode.** `db/transactions.rs:699-716`. Change the return type to `Vec<(TransactionId, Option<ItemId>)>`; the caller of `refund_transaction_by_payment_intent` skips `decrement_sales_count`/`revoke_keys_by_transaction` when `item_id` is `None`. Add a fixture-based test against a project-level transaction. | |
| 430 | - | 5. **Payments SERIOUS — `compute_splits` over-credit.** `routes/stripe/webhook/checkout_helpers.rs:240-269`. Reject `total_split_pct > 100` at the project_members write site (DB CHECK or validation). Defensively, scale each split proportionally when sum > 100, OR clamp each split against remaining `expected_total` budget in the loop. Add a test at 60%+60%. | |
| 431 | - | 6. **Payments SERIOUS — Tip project authorization.** `routes/stripe/checkout/tips.rs:104-106`. After accepting `TipForm`, fetch the project and assert `project.user_id == recipient_id`; return 400 otherwise. | |
| 432 | - | 7. **Payments SERIOUS — Cart bypasses `listed` gate.** `db/cart.rs:94-123` and `get_cart_items`/`get_cart_items_for_seller`. Add `AND i.listed = true` to all three queries. Add a check in the per-seller checkout path. Add a regression test that toggles an unlisted item into the cart and asserts rejection. | |
| 433 | - | 8. **Payments SERIOUS — Unknown subscription status.** `routes/stripe/webhook/subscriptions.rs:117-121`. Replace `?` with a match: known statuses dispatch; unknown statuses `tracing::warn!` and return `StatusCode::OK` so Stripe stops retrying. | |
| 434 | - | 9. **Payments SERIOUS — `is_full_refund` zero-amount.** `payments/webhooks.rs:294-308`. Predicate becomes `amount > 0 && amount_refunded >= amount`. Invert the test at lines 517-525 (a zero-amount charge must NOT be treated as a full refund). | |
| 435 | - | 10. **Storage HIGH — `versions.rs` enqueue-before-idempotency.** `routes/storage/versions.rs:159-174`. Move idempotency `version.s3_key == req.s3_key` check BEFORE `enqueue_scan_for`. Apply the Phase 1 `commit_upload` helper here. | |
| 436 | - | 11. **Storage HIGH — `project_image_confirm` probe-failure + no rollback.** `routes/storage/images.rs:179-208`. (a) On `Err` or `Ok(None)` from `s3.object_size`, fall back to the row's recorded size (add a `project_image_bytes` column if not present) rather than the "no old image" branch. (b) Move `enqueue_deletions` to AFTER `update_project_image_url` success, or wrap both in a tx with the enqueue inside. | |
| 437 | - | 12. **Storage HIGH — `media_confirm` non-atomic three-write.** `routes/storage/media.rs:236-293`. Wrap `try_increment_storage` → `remove_pending_upload` → `media_files::create` in a transaction. The storage credit refund must fire on any failure path. | |
| 438 | - | 13. **UX HIGH — Negative/NaN prices via `from_db`.** `routes/pages/dashboard/wizards/item/save.rs:183-185, 214-227`. Use `PriceCents::new(price_cents)?` unconditionally; drop the `> 0` guard. Add `min <= suggested` check on PWYW. | |
| 439 | - | 14. **UX HIGH — f64 price parsing accepts NaN.** Same file + `routes/api/items/bulk.rs:136-139` + `routes/pages/dashboard/wizards/project.rs:264-298`. Parse as decimal cents directly (or `Decimal::from_str_exact` from the `rust_decimal` crate already in `Cargo.lock`); reject NaN/Inf; reject negative/saturating values before cast. | |
| 440 | - | 15. **UX HIGH — Username live-check fails open.** `routes/auth.rs:356-361`. Propagate the DB error or treat it as "unavailable, try again" — never "available" by default. | |
| 441 | - | 16. **Perf HIGH — Cart checkout sequential roundtrips.** `routes/stripe/checkout/cart.rs:68-248`. Bulk-load `has_purchased_item` once with `WHERE item_id = ANY($1)`. Batch `get_item_by_id` lookups. Claim free items in a single transaction with batched inserts. Aim for ≤ 5 roundtrips for any cart size. | |
| 442 | - | 17. **Perf HIGH — `record_view` unbounded spawn.** `db/page_views.rs:18-32`. Replace per-request spawn with an `mpsc` channel; one background task drains every 250ms and flushes one bulk `INSERT … ON CONFLICT … DO UPDATE SET view_count = page_view_daily.view_count + EXCLUDED.view_count`. | |
| 443 | - | 18. **Perf HIGH — Sales drift full-table aggregate.** `scheduler/integrity.rs:53-73`. Maintain trigger-updated `transactions_completed_count` per item, or run the check off-pool against a snapshot. Short term: add `WHERE i.sales_count > 0 OR EXISTS (SELECT 1 FROM transactions WHERE item_id = i.id LIMIT 1)` to drop the LEFT JOIN's all-zero rows from the aggregate. | |
| 444 | - | ||
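The predicate change in item 9 is small enough to state directly. This sketch assumes both amounts arrive as `i64` cents; the real `is_full_refund` signature in `payments/webhooks.rs` is not shown in this report, and `is_full_refund_old` is a hypothetical name for the pre-fix behavior.

```rust
/// Pre-fix predicate: true for a $0 charge with $0 refunded.
pub fn is_full_refund_old(amount: i64, amount_refunded: i64) -> bool {
    amount_refunded >= amount
}

/// Post-fix predicate from the plan: a zero-amount charge is never a full refund.
pub fn is_full_refund(amount: i64, amount_refunded: i64) -> bool {
    amount > 0 && amount_refunded >= amount
}
```

The two predicates differ only when `amount` is zero or negative, which is exactly the $0 verification-charge case the finding describes.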
| 445 | - | ## Phase 3 — MED (fix before re-run if cheap) | |
| 446 | - | ||
| 447 | - | - Storage: advisory-lock leak in `check_sandbox_cap` (`db/mod.rs:92-128`) → `pg_advisory_xact_lock` or RAII guard. | |
| 448 | - | - Storage: `is_s3_key_live` missing tables (`db/pending_s3_deletions.rs:67-82`) → audit all s3_key-bearing columns; consider normalized `s3_objects` table. | |
| 449 | - | - Storage: `delete_version` owner SELECT outside tx + post-commit S3 enqueue (`db/versions.rs:267-315`) → owner SELECT inside tx; enqueue inside tx. | |
| 450 | - | - Security: ClamAV `FailOpen` startup assertion (`scanning/clamav.rs:19` + `scanning/mod.rs:151-164`) → refuse boot if scan configured but no AV layer live; emit `tracing::error!` after N consecutive ClamAV errors. | |
| 451 | - | - Security: `helpers.rs:44-50` `DefaultHasher` for advisory lock keys → stable hasher (`sha2` first 8 bytes, or `xxh3` with constant seed). | |
| 452 | - | - Security: OAuth `state` size cap (`routes/oauth.rs:379-386`) → reject `form.state.len() > 1024`; cap `code_challenge` at 44 base64url chars. | |
| 453 | - | - Security: `extract_client_ip` non-Cloudflare fallback warning (`helpers.rs:33-40`) → emit one-shot `tracing::warn!` at startup if no `CF-Connecting-IP` seen after N requests. | |
| 454 | - | - UX: pagination offset overflow (`routes/pages/public/discover.rs:85-87`, `routes/admin/users.rs:37-39`) → clamp `page` to `total_pages.max(1)` before arithmetic. | |
| 455 | - | - UX: forms render without `_csrf` when handler forgets to populate `csrf_token` → make `csrf_token` non-optional in form-bearing templates (compile-time error) or render an inline "refresh and try again" notice. | |
| 456 | - | - UX: `validate_username` byte-length check (`routes/auth.rs:322`) → `chars().count()`, or reorder ASCII filter before length. | |
| 457 | - | - Perf: scheduler advisory-lock connection pinned across S3 (`scheduler/mod.rs:92-279`) → dedicated `PgPoolOptions::new().max_connections(1)` outside the main pool. | |
| 458 | - | - Perf: cleanup S3 deletes serialized inside scheduler tick (`scheduler/cleanup.rs:77-100`) → `for_each_concurrent(8, ...)`; better, move user-deletion off the scheduler tick. | |
| 459 | - | ||
| 460 | - | ## Phase 4 — Polish (after re-run shows axes ≥ A-) | |
| 461 | - | ||
| 462 | - | - Payments: `has_active_subscription_to_item` period-end clause mirroring (`db/subscriptions.rs:464-470`). | |
| 463 | - | - Payments: `get_active_creator_tier` + `sync_user_creator_tier` period-end defense (`db/creator_tiers.rs:91-103, 181-194`). | |
| 464 | - | - Payments: `release_use_count` race messaging (`db/promo_codes.rs:184-200`). | |
| 465 | - | - Payments: License key `activation_count` recount on revoke (`db/license_keys.rs:343-382`). | |
| 466 | - | - Payments: Subscription minimum-charge check (`payments/checkout.rs:283-317`). | |
| 467 | - | - Payments: Webhook v1/v2 unmark-on-failure parity (`routes/stripe/webhook/mod.rs:48-86`). | |
| 468 | - | - Storage: `media_files.list_folders` scan filter (`db/media_files.rs:73-82`). | |
| 469 | - | - Storage: `pending_uploads.record_pending_upload` silent user-mismatch (`db/pending_uploads.rs:23-33`). | |
| 470 | - | - Storage: `append_log_bounded` non-atomic size cap (`build_runner.rs:516-534`). | |
| 471 | - | - Storage: `downloads.rs:119-122` presigned-URL expiry: cap `duration_seconds` at i64 + add DB CHECK ≥ 0. | |
| 472 | - | - Security: `validate_token_consuming` for OAuth POST (`routes/oauth.rs:206`). | |
| 473 | - | - Security: `parse_repo_path` rejects lone-dot entries (`git_ssh.rs:162`). | |
| 474 | - | - Security: ClamAV INSTREAM 16K cap → treat truncation as fail-closed (`scanning/clamav.rs:101-108`). | |
| 475 | - | - UX: validation error messages stop reflecting user input (`wizards/item/mod.rs:176-179`). | |
| 476 | - | - UX: CSRF body extraction stops using `from_utf8_lossy` (`csrf.rs:528-543`). | |
| 477 | - | - Perf: scan-pipeline 400 MiB worst-case capacity-plan note (`constants.rs:156-157`). | |
| 478 | - | - Perf: announcement fan-out persistence + resume (`scheduler/announcements.rs:59-89, 147-177`). | |
| 479 | - | - Perf: build log per-line DB roundtrip (`build_runner.rs:516-534`) → in-process running total. | |
| 480 | - | ||
| 481 | - | ## Phase 5 — Chronic (must land in Run #6 or this audit cycle has failed) | |
| 482 | - | ||
| 483 | - | **Invariant-in-prose / policy-not-in-types, fourth consecutive run.** The Phase 1 #1 fix (constructive `commit_upload` helper sealing the lower-level ops) is the only acceptable resolution. Memory notes, comments warning future authors, and renamed-helper approaches were each tried in the three prior runs, and the finding recurred every time. After Phase 1 lands, audit `compute_splits` and `ErrorTemplate` for the same shape and apply the same treatment. | |
| 484 | - | ||
| 485 | - | --- | |
| 486 | - | ||
| 487 | - | ||
| 488 | - | ||
| 489 | - | ## Headline | |
| 490 | - | ||
| 491 | - | | Axis | Run #3 | Run #4 | Direction | | |
| 492 | - | |------|--------|--------|-----------| | |
| 493 | - | | Payments | A- | **A-** | flat (1 new SERIOUS: promo over-release on cart cleanup) | | |
| 494 | - | | Storage | B+ | **A-** | ↑ (Run #3 image-confirm rollback/race-guard fixes verified; one residual CRIT in same file) | | |
| 495 | - | | UX Wiring | B+ | **C+** | ↓ (CSRF policy patchwork: missing tokens + undocumented mutation in exempt prefix) | | |
| 496 | - | | Security | B+ | **B+** | flat (different HIGHs: git-shell repo-name validation + lockout DoS) | | |
| 497 | - | | Performance | B- | **B** | ↑ (Run #3 sync-FS-in-async + DashMap shard-lock + monitor split all verified; new unbounded scan_jobs/broadcast/pool-permit findings) | | |
| 498 | - | ||
| 499 | - | Net: 4 CRITICALs (vs Run #3: 2), 10 HIGH/SERIOUS (vs Run #3: 10), 22 MED, 23 MINOR/LOW. Ship-blockers are concentrated in two structural rots — CSRF policy and scan_jobs growth — not in net-new logic mistakes. | |
| 500 | - |
Lines truncated
| @@ -1,373 +0,0 @@ | |||
| 1 | - | # MNW Server — Todo | |
| 2 | - | ||
| 3 | - | **Last updated:** 2026-05-31 late evening (post Run #9 — launch-eve pass). | |
| 4 | - | ||
| 5 | - | ## Status | |
| 6 | - | ||
| 7 | - | All 5 axes at A- after Run #9 fixes. **0 CRITICAL open · 1 SERIOUS open (deferred) · 3 HIGH open (deferred) · 7 MED open (deferred).** Launchplan §1.5 A- bar holds. See `docs/audit_review.md` Run #9 section for full triage. | |
| 8 | - | ||
| 9 | - | ## Run #9 — fixed this session (2026-05-31) | |
| 10 | - | ||
| 11 | - | - **UX-CRITICAL** Signup TOCTOU 23505 → 500 + form loss. `join_wizard.rs`: catch 23505 with constraint-name routing, surface as `return_error`. Follow-up: preserve typed form fields on error swap (Phase 4). | |
| 12 | - | - **Sec-SERIOUS** `delete_all_sessions_for_user` non-atomic JWT bump → wrapped in `pool.begin()` / `tx.commit()` (`db/sessions.rs:247`). | |
| 13 | - | - **Sec-SERIOUS** 2FA login-email IP spoofable via bare `x-forwarded-for` → swapped to `crate::helpers::extract_client_ip` (`routes/pages/public/two_factor.rs:308`). | |
| 14 | - | - **Pay-SERIOUS** Webhook dual-failure 503 short-circuited on Stripe retry → call `unmark_event_processed` before returning 503 (`routes/stripe/webhook/mod.rs:81`). | |
| 15 | - | ||
| 16 | - | `cargo check --tests` clean; targeted unit tests (sessions/webhook/two_factor/join_wizard) 33/33 green. Full DB-integration suite needs astra postgres. | |
| 17 | - | ||
| 18 | - | ## Run #9 — deferred with rationale (Phase 4) | |
| 19 | - | ||
| 20 | - | - [ ] **Pay-SERIOUS** Subscription webhook out-of-order events resurrect `active`. Needs `created`-timestamp re-extraction from `UntypedEvent` + `WHERE last_event_at <= $created` guards across Fan+/creator-tier/synckit subscription writes. Cross-cutting; worst case is a minutes-long window of restored access until the next webhook. | |
| 21 | - | - [ ] **Sto-HIGH** Migration 129 dead-letter table never written (`cleanup.rs:453`). Operational visibility, not runtime; one-INSERT fix. | |
| 22 | - | - [ ] **Perf-HIGH** Per-request `reqwest::Client::new()` in 5 hot paths (dashboard/main, public/landing, api/internal/cli_features, api/domains, auth.rs). Hoist to OnceLock or AppState pooled client. | |
| 23 | - | - [ ] **Perf-HIGH** Unbounded `tokio::spawn` in `cleanup.rs:215-220` `spawn_expired_account_cleanups`. Lift existing `CLEANUP_PARALLELISM=4` JoinSet pattern from `cleanup_sandbox_accounts` 100 lines above. | |
| 24 | - | - [ ] **Pay-MED** `pricing.rs::parse_dollars_to_cents` strips European decimal comma; `1,23` → 12300¢. | |
| 25 | - | - [ ] **Pay-MED** SyncKit app-sub checkout silently defaults `storage_limit_bytes` to 0 if metadata missing. | |
| 26 | - | - [ ] **Pay-MED** Guest checkout email sentinel `"unknown@guest"` collision risk. | |
| 27 | - | - [ ] **Sto-MED** `is_s3_key_live` 7 EXISTS subqueries on unindexed s3_key columns — sequential scans per retry. Add partial indexes WHERE NOT NULL. | |
| 28 | - | - [ ] **Sto-MED** `is_s3_key_live` LIKE suffix `'%' || s3_key` false-positives on neighboring keys → S3 object leaks. Anchor with `/`. | |
| 29 | - | - [ ] **UX-MED** `purchase.html:145` `?return_to=` dead-wired; login handler always redirects `/dashboard`. | |
| 30 | - | - [ ] **UX-MED** Admin user filter buttons (`admin-users.html:35-44`) use `class="primary"` instead of `btn-primary`, so they render unstyled. | |
| 31 | - | ||
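For the `parse_dollars_to_cents` item above (`1,23` → 12300¢), one option is to refuse commas outright instead of guessing whether they are thousands separators or a decimal comma. The sketch below is an assumption-laden illustration, not the function in `pricing.rs`: the signature, the `PriceParseError` type, and the decision to reject rather than normalise are all hypothetical. It parses with integer arithmetic only, so NaN, infinity, and float rounding cannot occur.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PriceParseError {
    Empty,
    AmbiguousSeparator,
    Invalid,
    Overflow,
}

/// Parse a dollar amount such as "12.50" into integer cents.
pub fn parse_dollars_to_cents(input: &str) -> Result<i64, PriceParseError> {
    let s = input.trim();
    if s.is_empty() {
        return Err(PriceParseError::Empty);
    }
    // "1,23" (decimal comma) and "1,000" (thousands separator) cannot be
    // told apart reliably, so this sketch rejects both instead of stripping.
    if s.contains(',') {
        return Err(PriceParseError::AmbiguousSeparator);
    }

    let (whole, frac) = match s.split_once('.') {
        // A dot must be followed by one or two digits.
        Some((_, f)) if f.is_empty() => return Err(PriceParseError::Invalid),
        Some((w, f)) => (w, f),
        None => (s, ""),
    };
    let all_digits = |t: &str| t.bytes().all(|b| b.is_ascii_digit());
    if whole.is_empty() || !all_digits(whole) || frac.len() > 2 || !all_digits(frac) {
        return Err(PriceParseError::Invalid);
    }

    // `whole` is non-empty ASCII digits, so a parse failure means overflow.
    let dollars: i64 = whole.parse().map_err(|_| PriceParseError::Overflow)?;
    let frac_cents: i64 = match frac.len() {
        0 => 0,
        // "12.5" means 50 cents, not 5.
        1 => frac.parse::<i64>().map_err(|_| PriceParseError::Invalid)? * 10,
        _ => frac.parse::<i64>().map_err(|_| PriceParseError::Invalid)?,
    };

    dollars
        .checked_mul(100)
        .and_then(|c| c.checked_add(frac_cents))
        .ok_or(PriceParseError::Overflow)
}
```

Rejecting is the conservative choice for a payments path; a friendlier variant could accept a locale hint, but that is a product decision the todo does not settle.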
| 32 | - | ## Run #9 — LOW/NOTE (carry forward) | |
| 33 | - | ||
| 34 | - | - [ ] **UX-LOW** Pagination links in `git/issues.html:72,76` don't URL-encode `search` param. | |
| 35 | - | - [ ] **UX-LOW** 5 sites use `.render().unwrap_or_default()` on Askama templates — blank UI on render failure, no log line. | |
| 36 | - | - [ ] **UX-LOW** `slugify` (`formatting.rs:85`) produces `"post"` for any non-ASCII title. | |
| 37 | - | - [ ] **Sec-MINOR** `csrf.rs:176-185` `validate_token_consuming` doesn't actually consume — rename or rotate. | |
| 38 | - | - [ ] **Sec-MINOR** `routes/oauth.rs:101-111` `is_localhost_redirect` allows any port regardless of registered URI. | |
| 39 | - | - [ ] **Sec-MINOR** `scanning/archive.rs:124` path-traversal check misses lone `..` segment (no trailing `/`). | |
| 40 | - | - [ ] **Perf-LOW** `db/page_views.rs` `pending` HashMap has no max-cardinality cap. | |
| 41 | - | - [ ] **Perf-LOW** `build_runner.rs:441` artifact tmpfile leaks if process crashes between SCP and `remove_file`. | |
| 42 | - | ||
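The `scanning/archive.rs` finding above (a lone `..` segment with no trailing `/` slips through) is typical of a substring check for `"../"`; that cause is an assumption, since the code is not shown here. A segment-wise comparison avoids the gap. The function name below is hypothetical.

```rust
/// Returns true if any path segment is exactly `..`.
/// Splits on both `/` and `\` because archive entries may use either separator.
pub fn has_parent_dir_segment(entry_path: &str) -> bool {
    entry_path
        .split(|c: char| c == '/' || c == '\\')
        .any(|segment| segment == "..")
}
```

Comparing whole segments also avoids false positives on names that merely contain two dots, such as `a..b`. This sketch does not cover absolute paths or symlink entries, which would need their own checks.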
| 43 | - | Live state: working tree has 104+ Run #8 files plus 4 Run #9 source files (`join_wizard.rs`, `sessions.rs`, `two_factor.rs`, `webhook/mod.rs`) and 2 updated docs (`docs/audit_review.md`, `todo.md`). | |
| 44 | - | ||
| 45 | - | ## Open before launch (Monday 2026-06-01) | |
| 46 | - | ||
| 47 | - | ### Platform-as-product audits (skill-driven, code-review scope; fresh context recommended) | |
| 48 | - | - [ ] `/creator-fuzz` — would a working creator trust this with their livelihood? | |
| 49 | - | - [ ] `/use-fuzz` — discoverability, learnability, first-five-minutes | |
| 50 | - | - [ ] `/business-fuzz` — pricing copy, fee surfacing, refund-policy wording vs actual platform behaviour | |
| 51 | - | ||
| 52 | - | ### Per-project hygiene (manual, my call when ready) | |
| 53 | - | - [ ] README first-screen audit — what is this / who is it for / where to get it / what does it cost. No headliner paragraphs. | |
| 54 | - | - [ ] `Cargo.toml` version bump for the launch deploy (pick the number; I do the edit if needed) | |
| 55 | - | - [ ] CHANGELOG entry for the launch version | |
| 56 | - | ||
| 57 | - | ### Monday browser/prod testing (saved for Monday per current direction) | |
| 58 | - | - [ ] §1.1 Walk every public page: footer present, OG/Twitter meta render correctly in Facebook + Twitter debuggers, error pages render via forced 404/403/500 | |
| 59 | - | - [ ] §1.2 First-run creator flow end-to-end in production: signup → Stripe Connect → first item upload | |
| 60 | - | - [ ] §1.3 Each seeded creator's `/{handle}` page renders without empty sections; sample item per medium (audio/video/text/download) reachable from `/discover` | |
| 61 | - | - [ ] §1.4 Production deploy of post-fuzz build + version recorded via `record_deploy`; scheduled jobs running on prod (cleanup, scan jobs retention, build reaper, broadcast fan-out); Stripe webhook reachable from dashboard ping; backup snapshot taken pre-launch + restoration path documented in `_private/docs/mnw/server-docs/`; `/health` green | |
| 62 | - | - [ ] §5 launch-day sequence: final deploy, smoke-test logged-out from non-dev machine, update bios/link-in-bio/handles, confirm `maxj.phd` resolves, tag launch commit (`git tag launch-2026-06-01`) | |
| 63 | - | ||
| 64 | - | ## Open question for the user (action before Monday) | |
| 65 | - | ||
| 66 | - | - [ ] **Confirm all role-based email addresses route to real mailboxes**: `info@`, `security@`, `dmca@`, `privacy@`, `dpo@`, `legal@`, `billing@`, `policy@`, `reports@`, `community@`, `appeals@`, `press@`, `noreply@`. Legal pages (terms, privacy, copyright, appeals) and several role-routed flows reference them. If any are aspirational, that's a launch risk for the legal pages and an inbound-mail blackhole. Verify with Postmark/forwarding setup. | |
| 67 | - | ||
| 68 | - | ## Deferred with rationale (no action; documented) | |
| 69 | - | ||
| 70 | - | - [ ] `build_runner.rs:151` serial-target loop. LOW; builds run rarely; refactor touches denominator + error aggregation + log order. Post-launch. | |
| 71 | - | - [ ] `scheduler/mod.rs:92-279` advisory-lock per-tier granularity. Multi-replica concern; defer until multi-replica is real. | |
| 72 | - | - [ ] Drop unused `completion_effects` table (migration cleanup, schema-only). | |
| 73 | - | - [ ] Templatize founder + standard annual prices in `tiers.md` and `pricing.md` (e.g. `$86/yr`, `$130/yr`, `$194/yr`, `$324/yr`; standard `$173/$259/$389/$648`). docengine substitutions don't support arithmetic; would require adding derived `tiers.founding.basic_annual` etc keys in `shared/docengine/src/assumptions.rs`. Not blocking. | |
| 74 | - | - [ ] `_head_assets.html` apple-touch-icon + manifest link wiring. `static/manifest.json` exists but the `<link rel="manifest">` was reverted; bring back if/when desired. | |
| 75 | - | - [ ] Migrate footer's `What's new` and `Shortcuts` `<a href="#" onclick="...">` to `data-*` attributes following the `data-copy-link` pattern. UX MED, not blocking. | |
| 76 | - | ||
| 77 | - | ## What's done this session (compact summary, full details below) | |
| 78 | - | ||
| 79 | - | - **Ultra Fuzz Run #8** — all 5 axes A-. SERIOUS webhook unbounded spawn closed via new `src/background.rs` (bounded mpsc + semaphore-bounded concurrent execution). `spawn_email!` macro migrated; 17 callers + 5 manual webhook spawns + 5 same-disease per-request email spawns now route through bg queue. Run #8 5 new MEDs all closed (cart `min_price_cents`, cart-all chain-break, item-wizard `pricing_model`, inline-JS templates, cart free-claim N+1). Previously-deferred Payments H2 `claim_free_project` race closed. | |
| 80 | - | - **7-wave backlog sweep** — 24 of 26 carried items across auth/security/scanning/db/storage/UX/perf/payments. New schema migration `133_items_duration_seconds_nonnegative.sql`. New `commit_rescan` helper extends chronic-disease seal to admin paths. Two LOW items deferred above. | |
| 81 | - | - **4 cross-cutting sweeps** — `info@makenot.work` email pin (8 files), localhost/TODO/emoji/secret scans all clean. | |
| 82 | - | - **§1.1 public-surface code work** — OG + Twitter card meta in `base.html` (per-page overridable blocks), `static/manifest.json` created with brand colours, `error.html` drops broken back button + adds contact link, `Contact` link added to footer (mailto:info@), new `routes/pages/public/sitemap.rs` (with in-memory 10-min cache + LIKE-wildcard escape from the security review). | |
| 83 | - | - **Doc-fuzz** — `content-scanning.md` restructured (Malware checks + Authenticity checks sections, added URLhaus/MetaDefender/signing layers), `policy.html` See-also block linking 6 legal pages, `tiers.md` prose prices templatized via `{{ tiers.standard.* | int }}`. | |
| 84 | - | - **Exorcise** — 9 AI-tell removals across compare.md, content-scanning.md, appeals.md, faq.md. | |
| 85 | - | - **Nitpick** — 2 polish edits (dead `let _ = scan_status` removed, unused tuple-name destructure tidied). | |
| 86 | - | - **Security review** — 2 MEDs fixed inline: sitemap.xml in-memory cache to absorb crawler/attacker hammering; LIKE-wildcard escape on `is_s3_key_live` to prevent `_` in s3_keys from false-positive matching. | |
| 87 | - | ||
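The LIKE-wildcard escape mentioned in the security-review item above can be sketched as below. This is an illustration only: the helper name is hypothetical and the actual escape used for `is_s3_key_live` is not shown in this diff. It relies on backslash being the default `LIKE` escape character in PostgreSQL.

```rust
/// Escape the characters that are special inside a SQL `LIKE` pattern
/// (`%`, `_`, and the escape character `\` itself) so user-influenced
/// text is matched literally.
pub fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        if matches!(c, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(c);
    }
    out
}
```

The escaped value would still be passed as a bound parameter; escaping only stops `_` and `%` inside a key from acting as wildcards.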
| 88 | - | --- | |
| 89 | - | ||
| 90 | - | ## Ultra Fuzz 2026-05-31 (Run #8 — final re-grade) | |
| 91 | - | ||
| 92 | - | ### Above-MED items to address before launch (or defer with rationale) | |
| 93 | - | ||
| 94 | - | ### New MED-tier findings (all closed 2026-05-31) | |
| 95 | - | ||
| 96 | - | All 5 MEDs landed. `cargo test --lib` 1654 / 0. | |
| 97 | - | ||
| 98 | - | ### Verified closed this run | |
| 99 | - | ||
| 100 | - | ### Storage A- standing — remaining MED/LOW (Phase 4 polish or defer) | |
| 101 | - | Carried from Storage code-fuzz 2026-05-31 — see below. All still MED, none A- blockers. | |
| 102 | - | ||
| 103 | - | --- | |
| 104 | - | ||
| 105 | - | ## Audit backlog sweep 2026-05-31 (post-Run #8, 7 waves) | |
| 106 | - | ||
| 107 | - | Sorted by file locality and difficulty. Tests: 1655 / 0 throughout. | |
| 108 | - | ||
| 109 | - | ### Wave 1 — auth/security cluster (8 tiny) | |
| 110 | - | ### Wave 2 — scanning (3) | |
| 111 | - | ### Wave 3 — DB layer polish (4) | |
| 112 | - | ### Wave 4 — storage handlers + admin rescan seal + downloads (5) | |
| 113 | - | ### Wave 5 — UX polish (2) | |
| 114 | - | ### Wave 6 — Performance (3 of 5; 2 deferred) | |
| 115 | - | - [ ] **DEFERRED** `build_runner.rs:151` serial-target loop. LOW; refactor touches denominator + error agg + log order. Post-launch. | |
| 116 | - | - [ ] **DEFERRED** `scheduler/mod.rs:92-279` advisory-lock granularity. Multi-replica concern; defer until multi-replica is real. | |
| 117 | - | ||
| 118 | - | ### Wave 7 — Payments LOW (2) | |
| 119 | - | --- | |
| 120 | - | ||
| 121 | - | ## Storage code-fuzz 2026-05-31 (post-Run #7) | |
| 122 | - | ||
| 123 | - | Targeted Storage-axis fuzz to verify A- before triggering full Run #8. | |
| 124 | - | ||
| 125 | - | ### Above-MED fixes that landed | |
| 126 | - | ### Remaining MED/LOW (below A- bar; defer or Phase 4 polish) | |
| 127 | - | - [ ] Storage MED — `update_project_image_url` / `update_item_cover` ignore `rows_affected()`. Same shape as H1 but only follow-on side-effect is `bump_cache_generation`, so blast radius is small. | |
| 128 | - | - [ ] Storage MED — `downloads.rs:120` `((duration as u64) * 2).max(3600)` with no DB CHECK on `duration_seconds`. Add `CHECK (duration_seconds >= 0)` migration + cap in code (`duration.max(0).saturating_mul(2).clamp(3600, 86400)`). | |
| 129 | - | - [ ] Storage MED — Admin rescan (`routes/admin/uploads.rs:347, 390`) bypasses `commit_upload` seal via direct `db::scan_jobs::enqueue`. Demote to `pub(crate)` and expose `commit_rescan(target, ...)`. | |
| 130 | - | - [ ] Storage MED — `enqueue_s3_orphan` single-policy doc overstates discipline; either tighten doc or migrate remaining direct `delete_object` cleanup sites. | |
| 131 | - | - [ ] Storage MED — `is_s3_key_live` doesn't enumerate project image URLs (no current bug; surface fragile). | |
| 132 | - | - [ ] Storage LOW — `scanning/worker.rs:251` inline UPDATE bypasses `db::scanning::update_media_file_scan_status` helper. | |
| 133 | - | - [ ] Storage LOW — wizard `save.rs:95` updates only `cover_image_url` (not s3_key/size). | |
| 134 | - | - [ ] Storage LOW — `pending_uploads::remove_pending_upload` deletes by s3_key alone (signature broader than needed). | |
| 135 | - | ||
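The clamp proposed in the `downloads.rs:120` item above, written out as a function. The expression is the one quoted in the todo; the function name and the `i64` input type are assumptions for the sketch.

```rust
/// Presigned-URL lifetime in seconds: twice the media duration,
/// but never less than one hour or more than one day.
pub fn presigned_expiry_secs(duration_seconds: i64) -> u64 {
    duration_seconds
        .max(0) // a negative duration must not wrap when cast to unsigned
        .saturating_mul(2) // no overflow on absurdly large values
        .clamp(3_600, 86_400) as u64 // safe cast: value is within 3600..=86400
}
```

The original `(duration as u64) * 2` turns a negative input into a huge unsigned number before multiplying, which is why the todo pairs the code fix with a `CHECK (duration_seconds >= 0)` migration.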
| 136 | - | --- | |
| 137 | - | ||
| 138 | - | ## Ultra Fuzz 2026-05-31 (Runs #6, #7 + S1) | |
| 139 | - | ||
| 140 | - | ### Structural / chronic-disease fixes that landed | |
| 141 | - | ### Bug-level fixes that landed | |
| 142 | - | ### Deferred (with rationale) | |
| 143 | - | - [ ] Drop unused `completion_effects` table — schema-only cleanup; harmless empty table. | |
| 144 | - | ||
| 145 | - | ### Notes on remaining MED/LOW (per Run #7 axis reports) | |
| 146 | - | - Storage MED — admin rescan handlers (`routes/admin/uploads.rs:347, 390`) still call `enqueue_scan_for` indirectly via lower-level primitives; functional today but bypasses the chronic-disease seal. | |
| 147 | - | - Storage MED — `update_item_cover` / `update_project_image_url` don't check `rows_affected()`; an ownership-filter mismatch returns Ok(0 rows) silently. | |
| 148 | - | - Storage MED — worker inline media UPDATE at `scanning/worker.rs:251` should use the new `db::scanning::update_media_file_scan_status` helper. | |
| 149 | - | - Storage LOW — internal CLI confirm drops returned `FileScanStatus` (no `pending_review` surfacing). | |
| 150 | - | - Storage LOW — `main.rs:334` comment references now-private `enqueue_scan_for`. | |
| 151 | - | - UX MED — `parse_dollars_to_cents` rejects `"$5"` and `"1,000"` literally; could strip `$`/`,` for clipboard-paste UX. | |
| 152 | - | - UX MED — project wizard skips `validate_tier_price` ($1–$10k); API path enforces it. | |
| 153 | - | - UX LOW — `BundleItemIds.filter_map` silently drops malformed UUIDs. | |
| 154 | - | - Payments M1 — `compute_splits` should `.max(0)` per-member for defense vs legacy negative `split_percent` rows. | |
| 155 | - | - Payments NIT — extract `require_stripe_ready` helper; six near-identical 5-line blocks across checkout files. | |
| 156 | - | ||
| 157 | - | --- | |
| 158 | - | ||
| 159 | - | ## Ultra Fuzz 2026-05-30 (Run #5) | |
| 162 | - | ||
| 163 | - | Full report: `docs/audit_review.md`. 3 CRITICAL, 14 HIGH/SERIOUS. Two-axis regressions (Payments B, Storage B-) are coverage expansion into previously-unaudited paths plus one chronic recurrence; Security improved to A-; all 27 Run #4 plan items verified closed. | |
| 164 | - | ||
| 165 | - | ### Phase 1 — CRITICAL (fix today) | |
| 166 | - | ||
| 167 | - | - [ ] **Storage CRIT — `uploads.rs` file-type gate ordering** — `routes/storage/uploads.rs:204-237`. Move the match-arm rejection of `Download`/`Insertion`/`MediaImage`/`MediaVideo` BEFORE `enqueue_scan_for` and `update_item_scan_status`. Then make `enqueue_scan_for` + `update_*_scan_status` `pub(crate)` and expose a `commit_upload(file_type, item_id, s3_key)` higher-level op used by all three handlers (uploads / versions / images). Closes Phase 5 chronic invariant-in-prose finding. | |
| 168 | - | - [ ] **UX CRIT — Field-aware validation reaches the UI** — `error.rs:216-264` + `templates/error.html`. Either add `fields: Vec<(String, String)>` to `ErrorTemplate` + per-input markup in templates, OR delete the `validation_fields*` API and migrate callers to `validation(summary)`. Audit `validation_fields` callsites and pick a path. | |
| 169 | - | - [ ] **Perf CRIT — `build_runner.rs` partial-failure denominator** — `build_runner.rs:175-180`. Track `failed_count`; report `succeeded/(succeeded+failed)`. Add a test with 3 targets / 2 failures asserting "1/3". | |
| 170 | - | ||
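The `build_runner.rs` denominator fix above amounts to counting failures alongside successes. A minimal sketch, with a hypothetical `BuildTally` type that is not taken from the codebase:

```rust
/// Tracks per-target outcomes so the summary denominator includes failures.
#[derive(Debug, Default)]
pub struct BuildTally {
    succeeded: usize,
    failed: usize,
}

impl BuildTally {
    /// Record the outcome of one build target.
    pub fn record(&mut self, ok: bool) {
        if ok {
            self.succeeded += 1;
        } else {
            self.failed += 1;
        }
    }

    /// "succeeded/(succeeded+failed)", e.g. "1/3" for one success and two failures.
    pub fn summary(&self) -> String {
        format!("{}/{}", self.succeeded, self.succeeded + self.failed)
    }
}
```

Recording three targets with two failures yields `"1/3"`, which is the assertion the todo asks for.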
| 171 | - | ### Phase 2 — SERIOUS / HIGH (fix this weekend) | |
| 172 | - | ||
| 173 | - | - [ ] **Payments SERIOUS — NULL `item_id` refund decode bomb** — `db/transactions.rs:699-716`. Return `Vec<(TransactionId, Option<ItemId>)>`; skip `decrement_sales_count`/`revoke_keys_by_transaction` when None. Fixture test against a project-level transaction. | |
| 174 | - | - [ ] **Payments SERIOUS — `compute_splits` over-credit on members > 100%** — `routes/stripe/webhook/checkout_helpers.rs:240-269`. Reject `total_split_pct > 100` at the project_members write site (DB CHECK + validation). Defensively scale or clamp each split. Add test at 60%+60%. | |
| 175 | - | - [ ] **Payments SERIOUS — Tip `project_id` not validated vs recipient** — `routes/stripe/checkout/tips.rs:104-106`. After form accept, assert `project.user_id == recipient_id`; 400 otherwise. | |
| 176 | - | - [ ] **Payments SERIOUS — Cart bypasses item `listed` gate** — `db/cart.rs:94-123` + `get_cart_items` + `get_cart_items_for_seller`. Add `AND i.listed = true` to all three. Add per-seller checkout path check. Regression test: toggle unlisted item into cart → rejection. | |
| 177 | - | - [ ] **Payments SERIOUS — Unknown subscription status retry storm** — `routes/stripe/webhook/subscriptions.rs:117-121`. Replace `?` with a match: known statuses dispatch; unknown statuses `tracing::warn!` and return 200 OK so Stripe stops retrying. | |
| 178 | - | - [ ] **Payments SERIOUS — `is_full_refund` zero-amount** — `payments/webhooks.rs:294-308`. Predicate becomes `amount > 0 && amount_refunded >= amount`. Invert the test at lines 517-525. | |
| 179 | - | - [ ] **Storage HIGH — `versions.rs` enqueue-before-idempotency** — `routes/storage/versions.rs:159-174`. Move `version.s3_key == req.s3_key` idempotency check before `enqueue_scan_for`. Apply Phase 1 `commit_upload` helper. | |
| 180 | - | - [ ] **Storage HIGH — `project_image_confirm` probe-failure + no rollback** — `routes/storage/images.rs:179-208`. On `Err`/`Ok(None)` from `s3.object_size`, fall back to recorded size. Move `enqueue_deletions` AFTER `update_project_image_url` success, or wrap in a tx. | |
| 181 | - | - [ ] **Storage HIGH — `media_confirm` non-atomic three-write** — `routes/storage/media.rs:236-293`. Wrap `try_increment_storage` → `remove_pending_upload` → `media_files::create` in a transaction. Refund storage credit on any failure. | |
| 182 | - | - [ ] **UX HIGH — Negative/zero prices via `PriceCents::from_db`** — `routes/pages/dashboard/wizards/item/save.rs:183-185, 214-227`. Use `PriceCents::new(price_cents)?` unconditionally; drop `> 0` guard. Add `min <= suggested` check on PWYW. | |
| 183 | - | - [ ] **UX HIGH — f64 price parsing accepts NaN/saturates** — same file + `routes/api/items/bulk.rs:136-139` + `routes/pages/dashboard/wizards/project.rs:264-298`. Parse as decimal cents (`rust_decimal::Decimal::from_str_exact`); reject NaN/Inf/out-of-range before cast. | |
| 184 | - | - [ ] **UX HIGH — Username live-check fails open on DB error** — `routes/auth.rs:356-361`. Propagate error or treat as "unavailable, try again". | |
| 185 | - | - [ ] **Perf HIGH — Cart checkout 80 sequential roundtrips** — `routes/stripe/checkout/cart.rs:68-248`. Bulk-load `has_purchased_item` with `WHERE item_id = ANY($1)`. Batch `get_item_by_id`. Claim free items in one tx with batched inserts. Target ≤ 5 roundtrips for any cart size. | |
| 186 | - | - [ ] **Perf HIGH — `record_view` unbounded spawn per request** — `db/page_views.rs:18-32`. Replace per-request spawn with `mpsc` channel + single background drainer flushing every 250ms via bulk UPSERT. | |
| 187 | - | - [ ] **Perf HIGH — `check_sales_count_drift` full-table aggregate** — `scheduler/integrity.rs:53-73`. Add `WHERE i.sales_count > 0 OR EXISTS(SELECT 1 FROM transactions WHERE item_id = i.id LIMIT 1)` short-term; long-term trigger-maintained counts. | |
| 188 | - | ||
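Two of the Phase 2 payments items above reduce to small pure predicates. The sketch below shows the `is_full_refund` predicate exactly as stated in the todo, plus one possible write-site validation for split percentages. The `validate_split_total` name, the `SplitError` type, and the integer types are assumptions; the real checks would live in the project_members write path and a DB CHECK.

```rust
/// A refund is "full" only if something was charged and at least that much
/// was refunded. A zero-amount charge is never a full refund.
pub fn is_full_refund(amount: i64, amount_refunded: i64) -> bool {
    amount > 0 && amount_refunded >= amount
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum SplitError {
    NegativePercent,
    TotalExceeds100,
}

/// Validate member split percentages before they are written.
/// Returns the total on success.
pub fn validate_split_total(percents: &[i32]) -> Result<i32, SplitError> {
    let mut total: i32 = 0;
    for &p in percents {
        if p < 0 {
            return Err(SplitError::NegativePercent);
        }
        total = total.saturating_add(p);
    }
    if total > 100 {
        Err(SplitError::TotalExceeds100)
    } else {
        Ok(total)
    }
}
```

The 60% + 60% case named in the todo is rejected here with `TotalExceeds100`; clamping or scaling inside `compute_splits` would remain as a second line of defence for legacy rows.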
| 189 | - | ### Phase 3 — MED (fix before Run #6 if cheap) | |
| 190 | - | ||
| 191 | - | - [ ] Storage: advisory-lock leak in `check_sandbox_cap` (`db/mod.rs:92-128`) → `pg_advisory_xact_lock` or RAII guard. | |
| 192 | - | - [ ] Storage: `is_s3_key_live` missing tables (`db/pending_s3_deletions.rs:67-82`). | |
| 193 | - | - [ ] Storage: `delete_version` owner SELECT outside tx + post-commit S3 enqueue (`db/versions.rs:267-315`). | |
| 194 | - | - [ ] Security: ClamAV `FailOpen` startup assertion (`scanning/clamav.rs:19` + `scanning/mod.rs:151-164`) — refuse boot if scan configured but no AV layer live. | |
| 195 | - | - [ ] Security: `helpers.rs:44-50` `DefaultHasher` → stable hasher (sha2 first 8 bytes or `xxh3` constant seed). | |
| 196 | - | - [ ] Security: OAuth `state` size cap (`routes/oauth.rs:379-386`) — reject `> 1024`; cap `code_challenge` at 44 chars. | |
| 197 | - | - [ ] Security: `extract_client_ip` non-Cloudflare fallback warning (`helpers.rs:33-40`). | |
| 198 | - | - [ ] UX: pagination offset overflow (`routes/pages/public/discover.rs:85-87`, `routes/admin/users.rs:37-39`). | |
| 199 | - | - [ ] UX: forms silently render without `_csrf` when handler forgets to populate token — make `csrf_token` non-optional in form-bearing templates. | |
| 200 | - | - [ ] UX: `validate_username` byte-length vs `chars().count()` (`routes/auth.rs:322`). | |
| 201 | - | - [ ] Perf: scheduler advisory-lock connection pinned across S3 (`scheduler/mod.rs:92-279`) → dedicated `max_connections(1)` pool. | |
| 202 | - | - [ ] Perf: cleanup S3 deletes serialized inside scheduler tick (`scheduler/cleanup.rs:77-100`) → `for_each_concurrent(8, ...)`. | |
| 203 | - | ||
| 204 | - | ### Phase 4 — Polish (after Run #6 confirms ≥ A-) | |
| 205 | - | ||
| 206 | - | - [ ] Payments: `has_active_subscription_to_item` period-end clause mirroring (`db/subscriptions.rs:464-470`). | |
| 207 | - | - [ ] Payments: `get_active_creator_tier` + `sync_user_creator_tier` period-end defense (`db/creator_tiers.rs:91-103, 181-194`). | |
| 208 | - | - [ ] Payments: `release_use_count` race messaging (`db/promo_codes.rs:184-200`). | |
| 209 | - | - [ ] Payments: License key `activation_count` recount on revoke (`db/license_keys.rs:343-382`). | |
| 210 | - | - [ ] Payments: Subscription minimum-charge check (`payments/checkout.rs:283-317`). | |
| 211 | - | - [ ] Payments: Webhook v1/v2 unmark-on-failure parity (`routes/stripe/webhook/mod.rs:48-86`). | |
| 212 | - | - [ ] Storage: `media_files.list_folders` scan filter (`db/media_files.rs:73-82`). | |
| 213 | - | - [ ] Storage: `pending_uploads.record_pending_upload` silent user-mismatch (`db/pending_uploads.rs:23-33`). | |
| 214 | - | - [ ] Storage: `append_log_bounded` non-atomic size cap (`build_runner.rs:516-534`). | |
| 215 | - | - [ ] Storage: `downloads.rs:119-122` presigned-URL expiry — cap `duration_seconds` + DB CHECK ≥ 0. | |
| 216 | - | - [ ] Security: `validate_token_consuming` for OAuth POST (`routes/oauth.rs:206`). | |
| 217 | - | - [ ] Security: `parse_repo_path` rejects lone-dot entries (`git_ssh.rs:162`). | |
| 218 | - | - [ ] Security: ClamAV INSTREAM 16K cap → fail-closed on truncation (`scanning/clamav.rs:101-108`). | |
| 219 | - | - [ ] Security: TOTP seeds at rest behind an application-level key. Currently unencrypted in the DB; `tech/security.md:42-53` already discloses this and commits to a fix. A database-only compromise yields working second factors today. | |
| 220 | - | - [ ] AI disclosure: render the tier badge on `pages/item.html` + project page (`> [!UI] ai-tier-badges` in `about/generative-ai.md` is unfilled). Show the `ai_disclosure` text for Assisted items above the buy button so fans see it before purchase. Same badge on item cards in Discover results / search hits. | |
| 221 | - | - [ ] AI disclosure: pick a shape for the Discover filter — current buckets are "All / Handmade / Assisted / Generated"; `about/generative-ai.md` § "How Fans Use This" promises "Handmade only / Human-led / Everything" (Human-led = Handmade ∪ Assisted). Either rewrite the policy to match buckets, or add the combined filter. | |
| 222 | - | - [ ] AI disclosure: community report endpoint for misclassified items. The policy commits to fan flagging ("Fans and fellow creators can flag items they believe are misclassified.") but there's no `/report` or `/flag` route. | |
| 223 | - | - [ ] AI disclosure: drop the `checked` default on the publish wizard's tier radios so the creator has to pick deliberately, OR rephrase the policy's "no unlabeled option" to acknowledge default-handmade. Minor; signal-of-intent only. | |
| 224 | - | - [ ] UX: validation error messages stop reflecting user input (`wizards/item/mod.rs:176-179`). | |
| 225 | - | - [ ] UX: CSRF body extraction stops using `from_utf8_lossy` (`csrf.rs:528-543`). | |
| 226 | - | - [ ] Perf: scan-pipeline 400 MiB worst-case capacity note (`constants.rs:156-157`). | |
| 227 | - | - [ ] Perf: announcement fan-out persistence + resume (`scheduler/announcements.rs:59-89, 147-177`). | |
| 228 | - | - [ ] Perf: build log per-line DB roundtrip (`build_runner.rs:516-534`). | |
| 229 | - | ||
| 230 | - | ### Phase 5 — Chronic | |
| 231 | - | ||
| 232 | - | - [ ] **Invariant-in-prose, FOURTH consecutive run.** Phase 1 #1 (constructive `commit_upload` helper sealing the lower-level scan/credit/status ops) is the only acceptable resolution. After it lands, audit `compute_splits` (Payments) and `ErrorTemplate` (UX) for the same shape and apply the same treatment. | |
| 233 | - | ||
| 234 | - | --- | |
| 235 | - | ||
| 236 | - | ## Ultra Fuzz 2026-05-26 (Run #4) | |
| 237 | - | ||
| 238 | - | Full report: `docs/audit_review.md`. Plan target: lift every axis back to A- or higher (Payments A · Storage A · UX A- · Security A- · Performance A-). | |
| 239 | - | ||
| 240 | - | ### Phase 1 — clear HIGH/CRITICAL caps (must do before launch) | |
| 241 | - | ||
| 242 | - | ### Phase 2 — close axis-dragging SERIOUS items | |
| 243 | - | ||
| 244 | - | ### Phase 3 — resilience & infra hardening | |
| 245 | - | ||
| 246 | - | ### Phase 4 — polish | |
| 247 | - | ||
| 248 | - | ### Phase 5 — chronic | |
| 249 | - | ||
| 250 | - | - [~] **Invariant-in-prose / policy-not-in-types — third consecutive run (CHRONIC)** — scan_status-ordering half closed 2026-05-26 (see Phase 1 entry for `images.rs::item_image_confirm`). The constructive-impossibility shape from the chronic-remediation rubric: `commit_*_upload` is the only handler-reachable path that writes both row + scan_status; the lower-level scan_status writes were renamed `set_*_scan_status_standalone` and documented as worker- and admin-override-only. Compiler-driven migration found one additional handler with the same bug (CLI internal upload) — that's the test the rubric wants: structural change exposes drift, not human review. Remaining: `/stripe/*` CSRF policy patchwork — same disease, different organ. Track as Landing 2 below. | |
| 251 | - | ||
| 252 | - | Follow-ups: | |
| 253 | - | - [ ] **Manual-posture runtime assertion (dev builds).** Today `*_csrf_manual` requires no compile-time proof that the handler called `validate_token_consuming`. Only the tip handler is Manual, and `_validated` is bound only as documentation. In dev/test builds, set a flag in `validate_token_consuming` and debug-assert it after the handler runs; mismatched routes panic loudly in CI without affecting prod. Not blocking — only matters if Manual grows beyond one route. | |
| 254 | - | - [ ] **Phase 1 entries still open:** `cancel_pending_item_checkout` Skip reason is `"Phase 1 todo: tighten to post_csrf"` (grep "Phase 1 todo" to find). `/login` and `creator-tier` template tightening tracked separately above. | |
| 255 | - | ||
| 256 | - | ### Notes & non-actions | |
| 257 | - | ||
| 258 | - | - Status-notification fan-out cooldown across overlapping tasks (`monitor.rs:213-237`) — single-replica today; harmless. Reconsider when adding a second instance. | |
| 259 | - | - `record_storage_fill_stats` JOIN (`metrics.rs:181-218`) — 5min cadence is acceptable at 100k users; revisit at 1M. | |
| 260 | - | - `metadefender` could run concurrently with MalwareBazaar in suspicion path (`scanning/mod.rs:377-398`) — micro-optimization, deferred. | |
| 261 | - | - `populate_known_sync_apps` startup-only (`rate_limit.rs:65-85`) — paired with the deletion-path item above; together they're a single fix. | |
| 262 | - | ||
| 263 | - | --- | |
| 264 | - | ||
| 265 | - | ## Ultra Fuzz 2026-05-26 (Run #3) | |
| 266 | - | ||
| 267 | - | Full report: `docs/audit_review.md`. Plan target: lift every axis to A- or higher (Payments A · Storage A- · UX A · Security A- · Performance A-). | |
| 268 | - | ||
| 269 | - | ### Notes & non-actions | |
| 270 | - | ||
| 271 | - | - Backup-code fast-path malformed-hash trap (`db/totp.rs:155-189`) — log + alert + fall through to legacy path; small, file as polish. | |
| 272 | - | - `session_cache` TTL window vs admin revoke (`auth.rs:154-191`) — documented as intentional; consider exposing a broadcast invalidate op if operator demand emerges. | |
| 273 | - | - `monitor.rs` `pg_stat_activity` cadence already covered by Phase 2 split. | |
| 274 | - | ||
| 275 | - | --- | |
| 276 | - | ||
| 277 | - | ## Ultra Fuzz 2026-05-25 (Run #2) | |
| 278 | - | ||
| 279 | - | Full report: `docs/audit_review.md`. | |
| 280 | - | ||
| 281 | - | ### Outbox follow-ups — convert remaining webhook handlers | |
| 282 | - | ||
| 283 | - | All five remaining handlers converted to outbox 2026-05-25; migration 125 added `fan_plus_subscription_id` and `creator_subscription_id` parents so each subscription type has its own idempotency anchor. | |
| 284 | - | ||
| 285 | - | ### Current phase — serious / high | |
| 286 | - | ||
| 287 | - | - [ ] **Login template field-aware errors** — deferred 2026-05-26. Re-scoped: error-construction infra (`AppError::validation_fields`) is in place in `join_wizard::step_account_create`, but neither signup nor login renders per-field highlights yet. Real work is a new HTMX partial with OOB swaps per input + per-field error containers on both templates. Login itself has only one safe per-field message by design (creds are intentionally generic to avoid enumeration); the value is mostly on the signup side. | |
| 288 | - | - [~] **Scanning peak memory**: `scanning/mod.rs:174` already uses `std::sync::Arc::<[u8]>::from(data)`, which copies the bytes once into a single `Arc` allocation (an `Arc<[u8]>` keeps its reference counts next to the data, so it cannot adopt an existing buffer); SHA-256 then streams via `Sha256::update` over the same Arc-shared buffer, so no further copies are made. No change needed. | |
| 289 | - | - [~] **`check_sales_count_drift` full GROUP BY** — the SQL already filters via `HAVING i.sales_count != COUNT(t.id)` (the real bound). The trailing `LIMIT 50` is a per-tick cap on how many drifts to surface, not a cosmetic post-group filter. No action. | |
| 290 | - | ||
| 291 | - | ### Current phase — medium / minor | |
| 292 | - | ||
| 293 | - | - [~] **`pg_stat_activity` baseline load** — `monitor.rs:290-294` doc explicitly justifies the 30 s cadence for operator-dashboard refresh; no change. | |
| 294 | - | ||
| 295 | - | ### Deferred — architectural | |
| 296 | - | ||
| 297 | - | - [ ] **Cloudflare-only origin: migrate custom domains to CF for SaaS, then firewall 80/443.** Re-scoped 2026-05-26. The original sketch (firewall the origin to CF IP ranges) conflicts with the shipped custom-domain feature (`api/domains.rs` + Caddy `on_demand_tls`), which expects creators' A-records to hit the origin directly. The two threats the firewall was meant to close are already mitigated at layer 7 — `CloudflareIpKeyExtractor` peer-IP fallback (landed 2026-05-26) closes the CF-Connecting-IP spoofing surface; Caddy `client_auth require_and_verify` closes the WAF-bypass surface for `makenot.work`. The proper sequencing is now (1) upgrade to CF Business for CF-for-SaaS, (2) reconfigure CF dashboard with a fallback origin, (3) update `api/domains.rs` onboarding to CNAME instead of A-record, (4) migrate the 1 live custom-domain creator, (5) drop `on_demand_tls` from `Caddyfile`, (6) apply the firewall ACL. Full sequence + ACL sketch + gotchas live in `_meta/docs/incident_response.md` § "Pending Hardening: Cloudflare-only origin firewall". Blocked on the CF plan upgrade + 1 customer email, neither of which happens in-session. | |
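The peer-IP fallback mentioned above can be sketched roughly as follows. This is not the actual `CloudflareIpKeyExtractor`; it only illustrates the idea under stated assumptions: `CF-Connecting-IP` is trusted only when the TCP peer is itself a Cloudflare address, and the peer address is used otherwise. The function names are hypothetical, and the two ranges are an illustrative subset of Cloudflare's published IPv4 list.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Illustrative subset of Cloudflare IPv4 ranges as (network, prefix length).
/// Assumption: a real deployment loads the full, current published list.
const CF_RANGES: &[(Ipv4Addr, u8)] = &[
    (Ipv4Addr::new(173, 245, 48, 0), 20),
    (Ipv4Addr::new(103, 21, 244, 0), 22),
];

/// True when `ip` falls inside `network/prefix` (prefix must be 0..=32).
fn in_cidr(ip: Ipv4Addr, network: Ipv4Addr, prefix: u8) -> bool {
    let mask = if prefix == 0 { 0 } else { u32::MAX << (32 - u32::from(prefix)) };
    (u32::from(ip) & mask) == (u32::from(network) & mask)
}

fn peer_is_cloudflare(peer: IpAddr) -> bool {
    match peer {
        IpAddr::V4(v4) => CF_RANGES.iter().any(|&(net, prefix)| in_cidr(v4, net, prefix)),
        // IPv6 ranges are omitted in this sketch.
        IpAddr::V6(_) => false,
    }
}

/// Picks the rate-limit key: the header value only if the peer is Cloudflare
/// and the header parses as an IP address; the peer address in every other case.
fn rate_limit_key(peer: IpAddr, cf_connecting_ip: Option<&str>) -> IpAddr {
    if peer_is_cloudflare(peer) {
        if let Some(client) = cf_connecting_ip.and_then(|h| h.trim().parse::<IpAddr>().ok()) {
            return client;
        }
    }
    peer
}
```

With this shape, a direct-to-origin request that forges `CF-Connecting-IP` is keyed by its own peer address, so the spoofed header buys nothing.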
| 298 | - | ||
| 299 | - | --- | |
| 300 | - | ||
| 301 | - | ## Ultra Fuzz 2026-05-24 (Run #1) | |
| 302 | - | ||
| 303 | - | Full report: `docs/audit_review.md`. | |
| 304 | - | ||
| 305 | - | ### Current phase — medium | |
| 306 | - | ||
| 307 | - | - [~] **Status notification parallel fan-out** — kept sequential with 100 ms shaper; that pacing is intentional (SMTP rate-limit shape). No change. | |
| 308 | - | ||
| 309 | - | ### Deferred — architectural | |
| 310 | - | ||
| 311 | - | All four Run #1 deferred items closed by Run #2 sweeps; pointers below. | |
| 312 | - | ||
| 313 | - | --- | |
| 314 | - | ||
| 315 | - | ## Creator applications restructure (replaces waitlist) | |
| 316 | - | ||
| 317 | - | Discussed and scoped 2026-06-03; no implementation yet. Rename and generalize the existing waitlist into a creator-applications system that lives inside the join wizard, replaces the standalone `/admin/waitlist` surface, and gives fans a settings-page path to apply after the fact. The trigger to start: when the current waitlist UX no longer serves filling the founder cohort well, or before signup opens beyond hand-picked invitations, whichever comes first. | |
| 318 | - | ||
| 319 | - | ### Model | |
| 320 | - | ||
| 321 | - | Three branches in the wizard, decided up front by the signing-up account: | |
| 322 | - | ||
| 323 | - | - **Free trial** — short pitch (1–2 sentences: what you make, why MNW). Account exists, `can_create_projects = false`, application status `pending`. Operator approval flips it. | |
| 324 | - | - **Benefits account** — longer disclosure (community / mission alignment, the binding-mission-statement framing from the program docs). Same `pending` state, different `application_type` so the admin queue can sort them. | |
| 325 | - | - **Just pay** — skip the application entirely, route to Stripe checkout. **No approval required** — paying is the signal. On subscription activation, `can_create_projects = true` immediately. No `creator_applications` row written. | |
| 326 | - | ||
| 327 | - | Founder rate (50% off, locked for life when window closes) is available on **any** branch during the cohort window — free-trial seats get $0, benefits seats may be subsidized, paid seats get the standard founder discount. | |
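One way to picture the three-branch model above is as a small type sketch. All type and function names here are hypothetical; only the `free_trial` / `benefits_account` values and the "no row for just pay" rule come from the plan itself.

```rust
/// The choice made up front in the wizard.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EntryBranch {
    FreeTrial,
    BenefitsAccount,
    JustPay,
}

/// Only the two reviewed branches have an application type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ApplicationType {
    FreeTrial,
    BenefitsAccount,
}

impl ApplicationType {
    /// Value stored in the `application_type` column.
    fn as_db_str(self) -> &'static str {
        match self {
            ApplicationType::FreeTrial => "free_trial",
            ApplicationType::BenefitsAccount => "benefits_account",
        }
    }
}

/// What happens right after the branch is chosen.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SignupOutcome {
    /// A `creator_applications` row in `pending` status; operator approval flips it.
    PendingApplication(ApplicationType),
    /// No application row; access is granted when the subscription activates.
    StripeCheckout,
}

fn outcome_for(branch: EntryBranch) -> SignupOutcome {
    match branch {
        EntryBranch::FreeTrial => SignupOutcome::PendingApplication(ApplicationType::FreeTrial),
        EntryBranch::BenefitsAccount => {
            SignupOutcome::PendingApplication(ApplicationType::BenefitsAccount)
        }
        EntryBranch::JustPay => SignupOutcome::StripeCheckout,
    }
}
```

Keeping `JustPay` out of `ApplicationType` makes the "paid signups never write an application row" rule a property of the types rather than a convention.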
| 328 | - | ||
| 329 | - | ### Schema migration | |
| 330 | - | ||
| 331 | - | - [ ] Replace `creator_waitlist` with `creator_applications`. Either rename the table + add columns, or create a sibling and migrate rows. Add `application_type` enum column (`free_trial` | `benefits_account`). Normalize `status` values to `pending` | `approved` | `declined` | `spam`. | |
| 332 | - | - [ ] Backfill existing waitlist rows as `application_type = 'free_trial'` (preserve `pitch`, `created_at`, decision metadata, `selection_method`, `invited_by_user_id`). | |
| 333 | - | - [ ] Drop the `db::waitlist` module + its consumers once nothing references it. The `grant_creator_access` helper is the right primitive to keep — move it under `db::creator_applications`. | |
| 334 | - | ||
| 335 | - | ### Wizard | |
| 336 | - | ||
| 337 | - | - [ ] Insert a new "Choose your entry" step in `join_wizard.rs` flow after profile, before pitch. Three radio options + short descriptions. The chosen branch threads through to the next step. | |
| 338 | - | - [ ] Rebuild `wizards/steps/join/pitch.html` to branch on `application_type` — different prompt text, different length limits (free-trial short, benefits longer). | |
| 339 | - | - [ ] Route the paid branch around the application step entirely: profile → Stripe checkout → complete. On webhook activation, no `creator_applications` row is written; `can_create_projects` is granted on `creator_subscriptions.status = 'active'`. | |
| 340 | - | - [ ] Rewrite `wizards/steps/join/complete.html` so the "Apply for creator access" framing is gone (the question was already answered upstream). Free-trial / benefits accounts see "Application under review"; paid accounts see "Welcome — create your first project." | |
| 341 | - | ||
| 342 | - | ### Dashboard / settings | |
| 343 | - | ||
| 344 | - | - [ ] New `/settings/creator-access` page for fan-only accounts to submit an application after the fact. Same three branches, same pitch requirements. Lives in the dashboard tab rail, not a marketing page. | |
| 345 | - | - [ ] Strip the five existing "Apply for Creator Access" CTAs (`partials/tabs/user_projects.html` x2, `partials/tabs/user_creator.html` heading, `pages/creators.html` step list, `wizards/steps/join/complete.html` card). Replace dashboard surfaces with a small "Apply for creator access" link that routes to `/settings/creator-access`. Marketing page (`creators.html`) drops the "apply from your dashboard" framing in favor of "start your free trial during signup." | |
| 346 | - | ||
| 347 | - | ### Pending UX | |
| 348 | - | ||
| 349 | - | - [ ] Accounts in `pending` status can browse, buy items as a fan, manage profile and settings — they just can't reach creator dashboards. Existing `can_create_projects` guards already block project creation; new behavior is to render an "Application under review" panel (with submitted pitch + submission date) instead of returning 404 or redirecting away. | |
| 350 | - | - [ ] Email notification on approve / decline, distinct templates per `application_type`. Decline template names the reason; approval template points at the dashboard. | |
| 351 | - | ||
| 352 | - | ### Admin | |
| 353 | - | ||
| 354 | - | - [ ] Rename `routes/admin/waitlist.rs` → `routes/admin/applications.rs`. Generalize the approve / decline / spam handlers to read `application_type`. The `grant_creator_access` call on approve stays as-is. | |
| 355 | - | - [ ] Rename `dashboards/admin-waitlist.html` → `dashboards/admin-applications.html`. Add an `application_type` column and a type filter (free_trial / benefits_account / all). The existing stats block (pending / approved / spam / total_creators counts) stays; queries adjust to read the new table. | |
| 356 | - | - [ ] Update admin navigation (`admin_active_page: "waitlist"` → `"applications"`) and any cross-links in the admin shell. | |
| 357 | - | - [ ] Sitemap entry update, breadcrumb update. | |
| 358 | - | ||
| 359 | - | ### Tests + acceptance | |
| 360 | - | ||
| 361 | - | - [ ] Pin: a `pending` account cannot create projects (the existing `can_create_projects` guard already enforces this; verify the rendered panel works). | |
| 362 | - | - [ ] Pin: a "just pay" signup lands at `can_create_projects = true` AND an active Stripe subscription AND **no** `creator_applications` row. | |
| 363 | - | - [ ] Pin: admin queue lists pending applications sorted by `application_type` then `created_at`. | |
| 364 | - | - [ ] Pin: every removed "Apply for Creator Access" string is gone from `templates/` (grep test in `tests/regression/`). | |
| 365 | - | - [ ] Pin: founder rate applies regardless of branch during the cohort window (existing founder-pricing logic; new test asserts the multiplier doesn't depend on `application_type`). | |
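The grep-style pin in the list above could be backed by a small helper along these lines. This is a sketch only: `files_containing` is a hypothetical name, and a real test under `tests/regression/` would point it at the `templates/` directory and assert that the result is empty.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns every file under `root` whose contents include `needle` (case-sensitive).
fn files_containing(root: &Path, needle: &str) -> io::Result<Vec<PathBuf>> {
    let mut hits = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    // Iterative directory walk, so no external crate is needed.
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else {
                // Lossy decoding keeps the scan going on non-UTF-8 files.
                let bytes = fs::read(&path)?;
                if String::from_utf8_lossy(&bytes).contains(needle) {
                    hits.push(path);
                }
            }
        }
    }
    hits.sort();
    Ok(hits)
}
```

The match has to stay case-sensitive: the removed CTA reads "Apply for Creator Access", while the replacement link planned above reads "Apply for creator access" and must not trip the test.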
| 366 | - | ||
| 367 | - | ### Out of scope (this restructure) | |
| 368 | - | ||
| 369 | - | - Partnership / sponsorship / residency / fellow-led-project applications — those continue via email, no form surface built. If we ever build them, the same `creator_applications` table can host them as additional `application_type` variants. | |
| 370 | - | ||
| 371 | - | ## PoM contract guard (landed 2026-05-25) | |
| 372 | - | ||
| 373 | - | Schema-drift guard test wired against `shared/pom-contract/`: `src/routes/pages/public/health/mod.rs::tests::pom_hetzner_health_expectations_resolve`. `health_json` body builder extracted as pure `health_json_body(overall, db_ok)` for the test. Catches the v0.5.16-class drift where a field is removed from `/api/health` without updating PoM's expectations. See `MNW/CLAUDE.md` § PoM Health Contract. |
| @@ -1,31 +0,0 @@ | |||
| 1 | - | # theme-common TODO | |
| 2 | - | ||
| 3 | - | ## Active | |
| 4 | - | ||
| 5 | - | (none — crate is stable, consumed by audiofiles + goingson) | |
| 6 | - | ||
| 7 | - | ## Future — Unified theme library + per-user MNW theming | |
| 8 | - | ||
| 9 | - | **Scope:** One canonical theme library serving every app under Make Creative (audiofiles, goingson, balanced_breakfast, MNW server-rendered UI), plus a Fan+ perk that lets MNW users override the platform default CSS per-account. | |
| 10 | - | ||
| 11 | - | **Why:** | |
| 12 | - | - Today every app maintains its own theme folder: `Apps/audiofiles/crates/audiofiles-browser/themes/` (28 themes), `Apps/goingson/src-tauri/frontend/themes/helix/` (9 themes), `MNW/shared/themes/` (28 themes). Drift is inevitable; theme counts are already wrong in 5+ docs. | |
| 13 | - | - A unified library shipped from `MNW/shared/themes/` (or a successor crate here in `theme-common/`) would centralize the source of truth and let documentation point at one directory. | |
| 14 | - | - Theming the MNW default CSS per user is a natural Fan+ benefit: pay tier gets a theme picker + custom CSS slot stored against the account, applied via `<link rel="stylesheet">` injected into every authenticated page render. | |
| 15 | - | ||
| 16 | - | **Pieces to design:** | |
| 17 | - | - Single canonical theme TOML schema (audit existing helix-format vs MNW theme TOMLs for shape divergence). | |
| 18 | - | - A loader contract every app already supports (theme-common already does this for native apps; MNW server-side rendering needs an equivalent for CSS variable injection into Askama templates). | |
| 19 | - | - Per-user storage: new column on the users table (or a separate `user_themes` table) holding either a theme ID + custom CSS overrides, or a full custom theme blob (with size cap and validation). | |
| 20 | - | - Fan+ gating: theme picker visible to all signed-in users (apply a built-in theme); custom CSS slot gated behind Fan+ status. | |
| 21 | - | - CSS sanitization for the custom-CSS slot: accept only declarations, no `@import`, no `url()` to off-origin, no `expression()` (defunct but defensive). Probably easier to enforce a CSS property allowlist than to sanitize freeform CSS. | |
| 22 | - | - Migration path: drop hard theme counts from app READMEs in favor of "see the themes directory" (already done for the launch). | |
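The allowlist approach to the custom-CSS slot listed above might look roughly like this. It is a sketch under stated assumptions, not a vetted sanitizer: the property list is invented for illustration, the function name is hypothetical, and it rejects every `url()` rather than only off-origin ones, which is stricter than the item requires.

```rust
/// Hypothetical allowlist; the real set of themable properties is a design decision.
const ALLOWED_PROPERTIES: &[&str] = &[
    "color",
    "background-color",
    "font-family",
    "font-size",
    "border-radius",
];

/// Fragments rejected outright: at-rules, URLs, legacy expressions,
/// rule braces, escape sequences, and comments.
const FORBIDDEN_FRAGMENTS: &[&str] = &["@import", "url(", "expression(", "{", "}", "\\", "/*"];

/// Parses `prop: value; prop: value` input and keeps only allowlisted declarations.
/// Any forbidden fragment or unknown property rejects the whole input.
fn sanitize_declarations(input: &str) -> Result<Vec<(String, String)>, String> {
    let lowered = input.to_ascii_lowercase();
    if let Some(bad) = FORBIDDEN_FRAGMENTS.iter().find(|f| lowered.contains(**f)) {
        return Err(format!("forbidden fragment: {}", bad));
    }
    let mut declarations = Vec::new();
    for decl in input.split(';').map(str::trim).filter(|d| !d.is_empty()) {
        let (prop, value) = decl
            .split_once(':')
            .ok_or_else(|| format!("not a declaration: {}", decl))?;
        let prop = prop.trim().to_ascii_lowercase();
        if !ALLOWED_PROPERTIES.iter().any(|allowed| *allowed == prop.as_str()) {
            return Err(format!("property not allowed: {}", prop));
        }
        declarations.push((prop, value.trim().to_string()));
    }
    Ok(declarations)
}
```

Rejecting the whole input on the first violation keeps the behavior easy to explain to users and avoids shipping a partially applied theme.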
| 23 | - | ||
| 24 | - | **Not Monday work.** Surfaced here so it's tracked. The Monday docs drop hard counts and point at directories — that posture is already correct for whenever this lands. | |
| 25 | - | ||
| 26 | - | **Key paths:** | |
| 27 | - | - `MNW/shared/theme-common/` (this crate — likely host for the canonical loader) | |
| 28 | - | - `MNW/shared/themes/` (current cross-app theme store) | |
| 29 | - | - `Apps/audiofiles/crates/audiofiles-browser/themes/` (consumer; would migrate) | |
| 30 | - | - `Apps/goingson/src-tauri/frontend/themes/helix/` (consumer; would migrate) | |
| 31 | - | - `MNW/server/templates/` (would gain user-CSS injection hook) |