max / makenotwork
1 file changed,
+500 insertions,
-0 deletions
| @@ -0,0 +1,633 @@ | |||
| 1 | + | # Scan Pipeline Audit and Redesign | |
| 2 | + | ||
| 3 | + | Status: Draft, 2026-05-24. Author: Max + Claude. Not yet implemented. | |
| 4 | + | ||
| 5 | + | This document is the design predicate for a full redesign of MNW's file-scanning | |
| 6 | + | pipeline. It is the result of discovering that every upload on the platform since | |
| 7 | + | 2026-05-10 has been silently held at `held_for_review` because the | |
| 8 | + | MalwareBazaar layer fails closed on a missing API key — and there was no | |
| 9 | + | admin-visible signal that anything was wrong. | |
| 10 | + | ||
| 11 | + | The shipped pipeline is structurally fragile in ways that are not unique to that | |
| 12 | + | one bug. This document re-derives what the pipeline should look like, picks the | |
| 13 | + | layer set, fixes the fail-closed-by-default policy, moves scanning off the | |
| 14 | + | upload critical path, and specifies an admin surface that would have caught the | |
| 15 | + | MalwareBazaar regression on day one. | |
| 16 | + | ||
| 17 | + | No code changes accompany this document. Implementation is sequenced separately | |
| 18 | + | once the design is reviewed. | |
| 19 | + | ||
| 20 | + | --- | |
| 21 | + | ||
| 22 | + | ## 1. Goals and non-goals | |
| 23 | + | ||
| 24 | + | **Goals** | |
| 25 | + | ||
| 26 | + | - A scan pipeline a one-person platform can defend in public and point to as a | |
| 27 | + | trust differentiator. | |
| 28 | + | - Multi-signal detection that survives the loss of any single layer. | |
| 29 | + | - No silent platform-wide outage when a third-party scan endpoint changes | |
| 30 | + | behavior. | |
| 31 | + | - Honest, transparent communication to creators and downloaders about what is | |
| 32 | + | scanned and what is found. | |
| 33 | + | - Zero per-call vendor lock-in to a hyperscaler-owned threat-intel platform. | |
| 34 | + | ||
| 35 | + | **Non-goals** | |
| 36 | + | ||
| 37 | + | - Best-in-class detection of nation-state malware. That bar belongs to enterprise | |
| 38 | + | SOCs with seven-figure budgets. We aim for "covers ~90% of real-world malware | |
| 39 | + | on a creator-uploaded distribution platform" — explicitly bounded. | |
| 40 | + | - Real-time dynamic-analysis sandboxing for every upload. Cost-prohibitive at | |
| 41 | + | this stage; reserved for manual review of flagged samples. | |
| 42 | + | - Detection of supply-chain compromise inside dependencies of an uploaded | |
| 43 | + | binary. Out of scope for a file scanner; addressed separately by | |
| 44 | + | reproducible-build verification and creator-attested provenance, future work. | |
| 45 | + | ||
| 46 | + | ## 2. Threat model | |
| 47 | + | ||
| 48 | + | Adversaries, in priority order of damage to the platform: | |
| 49 | + | ||
| 50 | + | 1. **Compromised creator account pushing a poisoned update.** | |
| 51 | + | A long-standing creator's account is taken over. Attacker uploads a new | |
| 52 | + | version of an existing app with malware bundled in. Existing customers | |
| 53 | + | auto-update or download a "trusted" creator's release. Highest reputational | |
| 54 | + | damage because creator trust is the platform's load-bearing asset. | |
| 55 | + | 2. **Malicious-actor signup uploading malware as a free or cheap "tool".** | |
| 56 | + | Attacker creates a fresh account, lists a fake app, hopes a few downloads | |
| 57 | + | happen before takedown. Lower individual blast radius but easier to attempt | |
| 58 | + | at volume. | |
| 59 | + | 3. **Legitimate creator uploads benign software that triggers a false positive.** | |
| 60 | + | Crypto wallets, system utilities, low-level audio tools, and AppImages all | |
| 61 | + | routinely false-positive on signature AVs. Creator-facing UX must handle | |
| 62 | + | this gracefully — visible status, clear appeal path, no silent quarantine. | |
| 63 | + | 4. **Account-takeover update path** (subset of (1)): even if the new binary is | |
| 64 | + | signed with the creator's existing Apple Dev ID or Authenticode cert, an | |
| 65 | + | attacker with cert access can re-sign. Defense relies on out-of-band | |
| 66 | + | signals: new-device login, IP reputation, version-velocity anomalies, | |
| 67 | + | creator email confirmation on new releases. | |
| 68 | + | 5. **Novel zero-day** that no engine in our stack recognizes. | |
| 69 | + | Unavoidable in the general case. Mitigation: hash-reputation comparison | |
| 70 | + | over time (re-scan), public scan-result transparency so other downloaders | |
| 71 | + | can flag, and an admin review queue that surfaces anything the auto-stack | |
| 72 | + | can't classify. | |
| 73 | + | 6. **Embedded malicious URLs in otherwise benign documentation, license files, | |
| 74 | + | or app text.** Lower priority but cheap to defend via URL reputation | |
| 75 | + | lookups on extracted strings. | |
| 76 | + | ||
| 77 | + | The dominant cases are (1) and (2). (3) is the dominant *user-experience* | |
| 78 | + | problem we'll inflict on ourselves if we get the fail-closed policy wrong. | |
| 79 | + | ||
| 80 | + | ## 3. Current pipeline | |
| 81 | + | ||
| 82 | + | ### 3.1 Layers as implemented | |
| 83 | + | ||
| 84 | + | `MNW/server/src/scanning/` runs six layers per upload (`scanning/mod.rs`): | |
| 85 | + | ||
| 86 | + | | # | Layer | Implementation | Verdicts (behavior when unconfigured) | |
| 87 | + | |---|-------|----------------|--------------------------| | |
| 88 | + | | 1 | `content_type` | Magic-byte sniffing of file header | Pass / Fail | | |
| 89 | + | | 2 | `structural` | Format-specific parser (PE, ELF, Mach-O, etc.) | Pass / Skip | | |
| 90 | + | | 3 | `archive` | ZIP / tar walk for nested malware | Pass / Skip | | |
| 91 | + | | 4 | `yara` | `yara-x` rule engine | Skip if no rules loaded | | |
| 92 | + | | 5 | `clamav` | `INSTREAM` command over the `clamd` socket | Skip if no socket configured | |
| 93 | + | | 6 | `malwarebazaar` | abuse.ch hash lookup HTTP API | **Error if API shape unexpected** | | |
| 94 | + | ||
| 95 | + | Final disposition (`scanning/mod.rs::ScanPipeline::scan`): | |
| 96 | + | ||
| 97 | + | - Any layer `Fail` → `Quarantined` | |
| 98 | + | - Any layer `Error` → `HeldForReview` (fail closed) | |
| 99 | + | - Otherwise → `Clean` | |
| 100 | + | ||
| 101 | + | ### 3.2 Gaps observed | |
| 102 | + | ||
| 103 | + | | Gap | Impact | | |
| 104 | + | |-----|--------| | |
| 105 | + | | ClamAV daemon not installed on prod | Layer always `Skip`; baseline AV signal absent | | |
| 106 | + | | YARA rules directory empty on prod | Layer always `Skip`; no custom signatures | | |
| 107 | + | | MalwareBazaar response shape changed (now requires `Auth-Key` header) | Layer returns `Error` on every call; pipeline returns `HeldForReview` on every upload | | |
| 108 | + | | No code-signing verification (Apple notarization, Authenticode, AppImage GPG) | Largest available *positive* trust signal entirely unused | | |
| 109 | + | | Synchronous scan on upload request handler | Slow third-party API stalls the upload thread; one stuck third party blocks every concurrent upload | | |
| 110 | + | | Fail-closed-by-default for `Error` verdicts on optional layers | Optional best-effort layers can take down the whole pipeline (this is exactly what happened) | | |
| 111 | + | | No admin surface for layer-health monitoring | Two-week silent regression with no alert; only noticed when downloads broke | | |
| 112 | + | | No admin queue UI beyond a single "held items" list | No bulk re-scan, no per-layer detail, no history, no audit log | | |
| 113 | + | | No rescan capability | Held files can't be re-evaluated after a layer is fixed; only path is `UPDATE versions SET scan_status='clean'` | | |
| 114 | + | | Scan results stored per-`s3_key`, not per-`version_id` | Detail joins go through `s3_key`, which breaks if a file is referenced by multiple versions or moved | |
| 115 | + | ||
| 116 | + | ### 3.3 Gap that triggered this audit | |
| 117 | + | ||
| 118 | + | Every upload since 2026-05-10 sits at `held_for_review`. Each `file_scan_results.scan_layers` row shows the same final entry: | |
| 119 | + | ||
| 120 | + | ``` | |
| 121 | + | {"layer":"malwarebazaar","verdict":"error","detail":"Unexpected query_status: unknown"} | |
| 122 | + | ``` | |
| 123 | + | ||
| 124 | + | The MalwareBazaar `get_info` endpoint changed: unauthenticated requests no | |
| 125 | + | longer return a `query_status` field. Our parser defaults the missing field to | |
| 126 | + | the literal string `"unknown"`, which falls through to the catch-all match arm and returns | |
| 127 | + | `Error`. The fail-closed policy then converts that into `HeldForReview` for | |
| 128 | + | every upload. | |
| 129 | + | ||
| 130 | + | No alert fired. The only signal was a user trying to download a GO build and | |
| 131 | + | getting a generic "Failed to get download URL" toast. | |
| 132 | + | ||
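The failure mode described in 3.3 can be sketched in a few lines. This is a hypothetical reconstruction, not the code in `scanning/`: the `classify` function and the `Verdict` shape are invented for illustration, and the `"ok"` / `"hash_not_found"` status strings are assumptions about the abuse.ch API that should be checked against its current documentation.

```rust
/// Verdict a single scan layer can report (names follow this document).
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Fail,
    Error(String),
}

/// Hypothetical classification of a hash-lookup response.
/// `query_status` is `None` when the response carries no such field.
fn classify(query_status: Option<&str>) -> Verdict {
    // A missing field is defaulted to "unknown", as described in 3.3.
    match query_status.unwrap_or("unknown") {
        // Assumed status strings; verify against the real API.
        "hash_not_found" => Verdict::Pass, // hash is not a known sample
        "ok" => Verdict::Fail,             // hash matches a known sample
        // Catch-all: anything unrecognised, including the defaulted
        // "unknown", becomes an Error verdict.
        other => Verdict::Error(format!("Unexpected query_status: {other}")),
    }
}
```

Under these assumptions, `classify(None)` yields the same `detail` string recorded in `file_scan_results.scan_layers`, which is why one missing field was enough to hold every upload.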
| 133 | + | ## 4. Target architecture | |
| 134 | + | ||
| 135 | + | Three structural shifts: | |
| 136 | + | ||
| 137 | + | 1. **Async scan, sync upload.** Upload returns immediately with `Pending` | |
| 138 | + | status. Scan runs in a background worker. Status flips when scan completes. | |
| 139 | + | Downloads are served only for `Clean`; every other status blocks them. | |
| 140 | + | 2. **Explicit per-layer fail policy.** Each layer declares at registration | |
| 141 | + | whether `Error` is fail-open (Skip-equivalent) or fail-closed | |
| 142 | + | (`HeldForReview`). No global has-error switch. | |
| 143 | + | 3. **Multi-signal detection with positive trust signals.** Code-signing and | |
| 144 | + | notarization checks contribute *evidence of trust*, not just absence of | |
| 145 | + | threats. A properly notarized macOS binary from a verified Dev ID team is | |
| 146 | + | strong positive evidence; an unsigned `.exe` is weaker baseline. Today | |
| 147 | + | neither signal exists. | |
| 148 | + | ||
| 149 | + | ### 4.1 Layer set (post-audit) | |
| 150 | + | ||
| 151 | + | | # | Layer | Source | Fail-open or fail-closed on layer error | | |
| 152 | + | |---|-------|--------|------------------------------------------| | |
| 153 | + | | 1 | `content_type` | Magic-byte sniffing, in-process | fail-closed (cheap, deterministic) | | |
| 154 | + | | 2 | `structural` | Format parsers, in-process | fail-closed (cheap, deterministic) | | |
| 155 | + | | 3 | `archive` | Nested-archive walk, in-process | fail-closed (cheap, deterministic) | | |
| 156 | + | | 4 | `yara` | yara-x + Florian Roth `signature-base` ruleset | fail-closed (in-process, deterministic) | | |
| 157 | + | | 5 | `clamav` | `clamd` daemon + `freshclam` cron | fail-open (network-dependent local service) | | |
| 158 | + | | 6 | `signing_macos` | `codesign --verify --deep --strict` + `spctl --assess --type exec` + notarization staple check | fail-open on macOS-only files; positive evidence if pass | | |
| 159 | + | | 7 | `signing_windows` | `signtool verify /pa` + cert chain inspection | fail-open on Windows-only files; positive evidence if pass | | |
| 160 | + | | 8 | `signing_linux` | AppImage GPG signature + zsync URL presence; deb/rpm signatures | fail-open; positive evidence if pass | | |
| 161 | + | | 9 | `abuse_malwarebazaar` | abuse.ch hash lookup with `Auth-Key` header | fail-open (third-party network) | | |
| 162 | + | | 10 | `abuse_urlhaus` | URL reputation on strings extracted from binary | fail-open (third-party network) | | |
| 163 | + | | 11 | `metadefender_cloud` | OPSWAT free tier (40/day), **second-opinion only** on YARA/ClamAV flags | fail-open (rate-limited, optional) | | |
| 164 | + | | 12 | `hybrid_analysis` | CrowdStrike Falcon Sandbox free key (30/month), **admin-triggered only** | manual, not on the auto-path | | |
| 165 | + | ||
| 166 | + | **Policy:** layers 1-4 are deterministic in-process work and fail closed | |
| 167 | + | because an internal bug producing `Error` is a code defect, not an outage. | |
| 168 | + | Layers 5-11 are network or local-service dependencies and fail open because | |
| 169 | + | external regressions must never take down the platform; they degrade signal | |
| 170 | + | quality, not availability. Layer 12 is manual and never runs on the auto-path. | |
| 171 | + | ||
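A minimal sketch of how the per-layer policy could drive the final disposition. The type and function names (`OnError`, `LayerResult`, `dispose`) are illustrative assumptions rather than existing code, and the admin-policy triggers for `HeldForReview` (size cap, file type, and so on) are deliberately left out.

```rust
/// Verdict reported by one layer (names follow this document).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict { Pass, Skip, Fail, Error }

/// What an `Error` verdict from this layer means for the upload.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OnError { FailOpen, FailClosed }

#[derive(Debug, PartialEq)]
enum Disposition { Clean, HeldForReview, Quarantined }

/// One layer's outcome together with the policy it declared at registration.
#[allow(dead_code)]
struct LayerResult {
    layer: &'static str,
    on_error: OnError,
    verdict: Verdict,
}

fn dispose(results: &[LayerResult]) -> Disposition {
    // Any Fail quarantines, regardless of which layer produced it.
    if results.iter().any(|r| r.verdict == Verdict::Fail) {
        return Disposition::Quarantined;
    }
    // An Error holds the file only when the layer is fail-closed;
    // a fail-open layer's Error is treated like a Skip.
    let held = results
        .iter()
        .any(|r| r.verdict == Verdict::Error && r.on_error == OnError::FailClosed);
    if held { Disposition::HeldForReview } else { Disposition::Clean }
}
```

With this shape, an `Error` from `abuse_malwarebazaar` (fail-open) leaves the disposition at `Clean`, while an `Error` from `yara` (fail-closed) still holds the file.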
| 172 | + | ### 4.2 Status machine | |
| 173 | + | ||
| 174 | + | ``` | |
| 175 | + | ┌──────────────────┐ | |
| 176 | + | upload accepted ───▶ Pending │ | |
| 177 | + | └────────┬─────────┘ | |
| 178 | + | │ scan worker picks up | |
| 179 | + | ┌────────▼─────────┐ | |
| 180 | + | │ Scanning │ | |
| 181 | + | └────────┬─────────┘ | |
| 182 | + | ┌──────────────┼──────────────┐ | |
| 183 | + | ▼ ▼ ▼ | |
| 184 | + | ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ | |
| 185 | + | │ Clean │ │HeldForReview│ │ Quarantined │ | |
| 186 | + | └─────────────┘ └──────┬──────┘ └─────────────┘ | |
| 187 | + | │ admin Promote | |
| 188 | + | ▼ | |
| 189 | + | ┌─────────────┐ | |
| 190 | + | │ Clean │ | |
| 191 | + | └─────────────┘ | |
| 192 | + | │ admin Quarantine | |
| 193 | + | ▼ | |
| 194 | + | ┌─────────────┐ | |
| 195 | + | │ Quarantined │ | |
| 196 | + | └─────────────┘ | |
| 197 | + | ``` | |
| 198 | + | ||
| 199 | + | Transitions: | |
| 200 | + | ||
| 201 | + | | From | To | Trigger | | |
| 202 | + | |------|----|---------| | |
| 203 | + | | (no row) | Pending | Upload confirmed (S3 object created, row inserted) | | |
| 204 | + | | Pending | Scanning | Worker dequeues | | |
| 205 | + | | Scanning | Clean | All deterministic layers pass, no `Fail` in any layer | | |
| 206 | + | | Scanning | Quarantined | Any layer returns `Fail` | | |
| 207 | + | | Scanning | HeldForReview | Any fail-closed layer returns `Error`, or admin policy triggers (size cap, file type, new creator account, etc.) | |
| 208 | + | | HeldForReview | Clean | Admin promotes; audit-logged | | |
| 209 | + | | HeldForReview | Quarantined | Admin quarantines; audit-logged with note | | |
| 210 | + | | Clean or HeldForReview | Scanning | Re-scan: admin-triggered (5.2) or automatic (4.3) | |
| 211 | + | | Quarantined | (no transition without DB-level intervention) | Quarantine is sticky by design | | |
| 212 | + | ||
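The transition table can be enforced in code with a single guard. The sketch below is an assumption about shape only (`Status` and `can_transition` are illustrative names); it also admits `HeldForReview` to `Scanning`, which the Re-scan action on held rows in 5.2 implies.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status { Pending, Scanning, Clean, HeldForReview, Quarantined }

/// Returns true only for transitions the status machine allows.
fn can_transition(from: Status, to: Status) -> bool {
    use Status::*;
    matches!(
        (from, to),
        (Pending, Scanning)              // worker dequeues
            | (Scanning, Clean)          // all layers pass
            | (Scanning, Quarantined)    // any layer fails
            | (Scanning, HeldForReview)  // fail-closed error or admin policy
            | (HeldForReview, Clean)     // admin promotes
            | (HeldForReview, Quarantined) // admin quarantines
            | (Clean, Scanning)          // re-scan
            | (HeldForReview, Scanning)  // re-scan of a held row (implied by 5.2)
    )
    // Quarantined has no outgoing transition: it is sticky by design.
}
```

Rejecting every pair not listed keeps quarantine sticky without a special case.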
| 213 | + | ### 4.3 Re-scan cadence | |
| 214 | + | ||
| 215 | + | - **Trigger re-scan automatically** when: | |
| 216 | + | - YARA ruleset is updated (operator-controlled). | |
| 217 | + | - ClamAV `freshclam` rolls a new sig DB version. | |
| 218 | + | - An admin explicitly clicks Re-scan. | |
| 219 | + | - **Background sweep**: every 30 days, re-run hash-lookup layers (abuse.ch, | |
| 220 | + | optionally MetaDefender) across the full `Clean` corpus. Detects hashes that | |
| 221 | + | have *become* known-bad over time. Quarantine on `Fail`, log to audit. | |
| 222 | + | ||
| 223 | + | ### 4.4 Async architecture | |
| 224 | + | ||
| 225 | + | Move scan off the upload critical path: | |
| 226 | + | ||
| 227 | + | ``` | |
| 228 | + | POST /api/versions/{id}/upload/confirm | |
| 229 | + | → create version row (scan_status=Pending) | |
| 230 | + | → enqueue scan_job | |
| 231 | + | → return 200 with status="pending" | |
| 232 | + | → client polls or subscribes via SSE | |
| 233 | + | ||
| 234 | + | scan_worker (tokio task in same process) | |
| 235 | + | → SELECT ... FROM scan_jobs WHERE status='queued' FOR UPDATE SKIP LOCKED | |
| 236 | + | → run pipeline | |
| 237 | + | → INSERT INTO file_scan_results (per-layer JSON) | |
| 238 | + | → UPDATE versions.scan_status | |
| 239 | + | → publish status-changed event (SSE channel for admin + creator) | |
| 240 | + | ``` | |
| 241 | + | ||
| 242 | + | Job table: | |
| 243 | + | ||
| 244 | + | ```sql | |
| 245 | + | CREATE TABLE scan_jobs ( | |
| 246 | + | id UUID PRIMARY KEY DEFAULT gen_random_uuid(), | |
| 247 | + | target_kind TEXT NOT NULL CHECK (target_kind IN ('version', 'item_cover', 'item_attachment')), | |
| 248 | + | target_id UUID NOT NULL, | |
| 249 | + | s3_key TEXT NOT NULL, | |
| 250 | + | status TEXT NOT NULL CHECK (status IN ('queued', 'running', 'done', 'failed')), | |
| 251 | + | attempts INT NOT NULL DEFAULT 0, | |
| 252 | + | enqueued_at TIMESTAMPTZ NOT NULL DEFAULT now(), | |
| 253 | + | started_at TIMESTAMPTZ, | |
| 254 | + | completed_at TIMESTAMPTZ, | |
| 255 | + | last_error TEXT | |
| 256 | + | ); | |
| 257 | + | CREATE INDEX scan_jobs_status_enqueued ON scan_jobs (status, enqueued_at); | |
| 258 | + | ``` | |
| 259 | + | ||
| 260 | + | Worker pool: N workers (configurable, default 2), each pulling with | |
| 261 | + | `SELECT ... FOR UPDATE SKIP LOCKED LIMIT 1`. Job retry policy: 3 attempts on | |
| 262 | + | transient failure, then `status='failed'`, surface in admin dashboard. | |
| 263 | + | ||
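The retry rule above reduces to a small pure function. This is a sketch under assumptions: `Outcome` and `next_status` are hypothetical names, `attempts` is taken to count the attempt that just finished, and only the status values from the `scan_jobs` CHECK constraint are used.

```rust
/// Mirrors the `status` values allowed by the `scan_jobs` table.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum JobStatus { Queued, Running, Done, Failed }

/// Result of one attempt at running the pipeline for a job.
enum Outcome { Success, TransientFailure }

/// Retry budget stated in this document.
const MAX_ATTEMPTS: u32 = 3;

/// Status to write back after an attempt.
/// `attempts` includes the attempt that just finished.
fn next_status(attempts: u32, outcome: Outcome) -> JobStatus {
    match outcome {
        Outcome::Success => JobStatus::Done,
        // Budget remaining: put the job back on the queue.
        Outcome::TransientFailure if attempts < MAX_ATTEMPTS => JobStatus::Queued,
        // Budget exhausted: mark failed so the admin dashboard surfaces it.
        Outcome::TransientFailure => JobStatus::Failed,
    }
}
```

Keeping the decision separate from the database write makes the retry rule testable without a queue.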
| 264 | + | ### 4.5 Creator-visible UX | |
| 265 | + | ||
| 266 | + | In the creator dashboard, each version row gains a scan-status badge: | |
| 267 | + | ||
| 268 | + | - **Pending** — neutral, "Scanning…" | |
| 269 | + | - **Clean** — positive, "Cleared" | |
| 270 | + | - **HeldForReview** — warning, "Awaiting review (typically under 24h)" | |
| 271 | + | - **Quarantined** — negative, "Quarantined — [contact support] to appeal" | |
| 272 | + | ||
| 273 | + | Held / quarantined rows expand to show per-layer detail: which layer flagged, | |
| 274 | + | what it said, what the creator can do. Honest, transparent, brand-aligned. | |
| 275 | + | ||
| 276 | + | Public-facing download buttons: | |
| 277 | + | ||
| 278 | + | - Hidden entirely while `Pending` or `Scanning`. | |
| 279 | + | - Visible on `Clean`. | |
| 280 | + | - Hidden on `HeldForReview` / `Quarantined` (creator sees a note in their | |
| 281 | + | dashboard; public sees nothing — graceful degradation). | |
| 282 | + | ||
| 283 | + | ## 5. /admin/uploads dashboard | |
| 284 | + | ||
| 285 | + | The audit surface. Existing route at `routes/admin/uploads.rs` is the seed; we | |
| 286 | + | extend it. | |
| 287 | + | ||
| 288 | + | ### 5.1 Page layout | |
| 289 | + | ||
| 290 | + | ``` | |
| 291 | + | ┌─────────────────────────────────────────────────────────────────────┐ | |
| 292 | + | │ Pipeline Health (last 24h / 7d toggle) │ | |
| 293 | + | │ ┌─────────────────┬──────────────┬──────────────┬─────────────────┐ │ | |
| 294 | + | │ │ content_type │ 100% 1ms │ 100% 1ms │ ✓ last: now │ │ | |
| 295 | + | │ │ structural │ 100% 2ms │ 100% 2ms │ ✓ last: now │ │ | |
| 296 | + | │ │ yara │ 99% 45ms │ 99% 43ms │ ✓ last: 12s ago │ │ | |
| 297 | + | │ │ clamav │ 0% — │ 0% — │ ✗ not running │ │ | |
| 298 | + | │ │ malwarebazaar │ 0% — │ 0% — │ ✗ 14 days ago │ │ | |
| 299 | + | │ │ signing_macos │ 98% 320ms │ 97% 350ms │ ✓ last: 1m ago │ │ | |
| 300 | + | │ └─────────────────┴──────────────┴──────────────┴─────────────────┘ │ | |
| 301 | + | └─────────────────────────────────────────────────────────────────────┘ | |
| 302 | + | ||
| 303 | + | ┌─────────────────────────────────────────────────────────────────────┐ | |
| 304 | + | │ Active Queue (Pending + Scanning) [auto-refresh on] │ | |
| 305 | + | │ │ | |
| 306 | + | │ 3 files scanning, 0 stuck │ | |
| 307 | + | │ • GoingsOn_0.4.0_aarch64.dmg — scanning (4s) │ | |
| 308 | + | │ • SamplePack.zip — pending (queued 1s ago) │ | |
| 309 | + | └─────────────────────────────────────────────────────────────────────┘ | |
| 310 | + | ||
| 311 | + | ┌─────────────────────────────────────────────────────────────────────┐ | |
| 312 | + | │ Held for Review 9 items │ | |
| 313 | + | │ │ | |
| 314 | + | │ [+] GoingsOn_0.3.1_x64-setup.exe │ | |
| 315 | + | │ creator: max • item: GoingsOn Desktop • 14 days held │ | |
| 316 | + | │ layers: ✓ct ✓struct -arch -yara -clam ⚠mb │ | |
| 317 | + | │ [Promote] [Quarantine] [Re-scan] │ | |
| 318 | + | │ │ | |
| 319 | + | │ [+] GoingsOn_0.3.1_amd64.AppImage ... │ | |
| 320 | + | └─────────────────────────────────────────────────────────────────────┘ | |
| 321 | + | ||
| 322 | + | ┌─────────────────────────────────────────────────────────────────────┐ | |
| 323 | + | │ Recent History (last 30d) [▶ expand] [filter ▾] │ | |
| 324 | + | └─────────────────────────────────────────────────────────────────────┘ | |
| 325 | + | ``` | |
| 326 | + | ||
| 327 | + | ### 5.2 Section spec | |
| 328 | + | ||
| 329 | + | **Pipeline Health (top panel)** | |
| 330 | + | - Per-layer rolling stats over the last 24h and 7d. | |
| 331 | + | - Columns: layer name, success rate, error rate, p50 latency, p95 latency, | |
| 332 | + | health badge, last successful response timestamp. | |
| 333 | + | - Health badge logic: `✓` if last successful response < 1h ago AND error rate < | |
| 334 | + | 10%; `✗` if no successful response in 24h or error rate > 50%; `⚠` for | |
| 335 | + | anything in between. | |
| 336 | + | - Click any row → drill-down to last 100 layer invocations with full | |
| 337 | + | per-call detail. Useful for diagnosing intermittent regressions. | |
| 338 | + | ||
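The health badge rule for the Pipeline Health panel can be written as one function. The names here (`Badge`, `health_badge`) are illustrative assumptions, and the precedence is a design choice rather than something this document fixes: the "down" conditions are checked first, so a layer with a recent success but an error rate above 50% still shows `✗`.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Badge {
    Ok,       // ✓
    Degraded, // ⚠
    Down,     // ✗
}

/// `since_last_success` is `None` if the layer has no recorded success.
/// `error_rate` is a fraction in 0.0..=1.0 over the selected window.
fn health_badge(since_last_success: Option<Duration>, error_rate: f64) -> Badge {
    const HOUR: Duration = Duration::from_secs(60 * 60);
    const DAY: Duration = Duration::from_secs(24 * 60 * 60);

    // No recorded success is treated as "infinitely long ago".
    let age = since_last_success.unwrap_or(Duration::MAX);

    if age > DAY || error_rate > 0.50 {
        Badge::Down
    } else if age < HOUR && error_rate < 0.10 {
        Badge::Ok
    } else {
        Badge::Degraded
    }
}
```

Applied to the incident, a MalwareBazaar layer whose last success was 14 days ago evaluates to `Down` regardless of its error rate.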
| 339 | + | **Active Queue (Pending + Scanning)** | |
| 340 | + | - Auto-refreshes via HTMX SSE. | |
| 341 | + | - Shows count of files Pending vs Scanning. | |
| 342 | + | - "Stuck" detection: anything in `Scanning` for > 5 minutes is flagged red. | |
| 343 | + | - One-line entry per file: filename, current state, elapsed time. No | |
| 344 | + | per-layer detail at this stage (scan not done yet). | |
| 345 | + | ||
| 346 | + | **Held for Review** | |
| 347 | + | - Default expanded — these need decisions. | |
| 348 | + | - One row per held version. Per row: | |
| 349 | + | - Creator handle + item title + version filename + size + age-of-hold. | |
| 350 | + | - Layer chips: small colored squares, one per layer, showing verdict | |
| 351 | + | (`pass`, `skip`, `fail`, `error`, `pending`). | |
| 352 | + | - Click any chip → in-place expand to the layer's `detail` JSON. | |
| 353 | + | - Three actions: **Promote** (with optional note), **Quarantine** (note | |
| 354 | + | required), **Re-scan** (re-runs pipeline now). | |
| 355 | + | - Bulk operations: | |
| 356 | + | - Select N rows → **Bulk Re-scan** (no note required; common after a layer | |
| 357 | + | fix lands). | |
| 358 | + | - Select N rows → **Bulk Promote** (single shared note required, hard | |
| 359 | + | audit-logged). | |
| 360 | + | - No bulk Quarantine — every quarantine is an individual decision. | |
| 361 | + | - Sort: default by held-at ascending (oldest first); switchable. | |
| 362 | + | - Filter: by app, by creator, by which layer flagged, by file type. | |
| 363 | + | ||
| 364 | + | **Recent History (collapsible)** | |
| 365 | + | - Default collapsed. When expanded: dense grid of last 30d, all statuses. | |
| 366 | + | - Columns: creator, item, version, status, scanned-at, layer-summary chip | |
| 367 | + | strip, action. | |
| 368 | + | - Filter: by status, app, creator, date range. | |
| 369 | + | - Pagination at 100 rows; further history at `/admin/uploads/archive`. | |
| 370 | + | - Each row click → same expandable per-layer detail panel. | |
| 371 | + | ||
| 372 | + | **Audit Trail** | |
| 373 | + | - New table: | |
| 374 | + | ```sql | |
| 375 | + | CREATE TABLE scan_admin_actions ( | |
| 376 | + | id UUID PRIMARY KEY DEFAULT gen_random_uuid(), | |
| 377 | + | version_id UUID, | |
| 378 | + | item_id UUID, | |
| 379 | + | admin_id UUID NOT NULL, | |
| 380 | + | action TEXT NOT NULL CHECK (action IN ('promote', 'quarantine', 'rescan', 'bulk_promote', 'bulk_rescan')), | |
| 381 | + | prev_status TEXT, | |
| 382 | + | new_status TEXT, | |
| 383 | + | note TEXT, | |
| 384 | + | created_at TIMESTAMPTZ NOT NULL DEFAULT now() | |
| 385 | + | ); | |
| 386 | + | ``` | |
| 387 | + | - Inline tooltip on each row: "Last action: promoted by max, 2 days ago". | |
| 388 | + | - Full log at `/admin/uploads/audit` with filters. | |
| 389 | + | ||
| 390 | + | ### 5.3 Access control | |
| 391 | + | ||
| 392 | + | - Gated on existing `AdminUser` extractor (PLATFORM_ADMIN_ID single-user | |
| 393 | + | model). No per-forum-style role splitting. | |
| 394 | + | - All POST routes CSRF-protected (existing middleware). | |
| 395 | + | - All admin actions audit-logged (table above). | |
| 396 | + | ||
| 397 | + | ### 5.4 Live updates | |
| 398 | + | ||
| 399 | + | - HTMX SSE channel `/admin/uploads/events` pushing: | |
| 400 | + | - `scan-started`, `scan-completed`, `scan-stuck` events. | |
| 401 | + | - Active Queue + Pipeline Health update without page reload. | |
| 402 | + | - History grid stays static; reloads only on filter change. | |
| 403 | + | ||
| 404 | + | ## 6. Monitoring and alerting | |
| 405 | + | ||
| 406 | + | Add PoM checks for the scan pipeline: | |
| 407 | + | ||
| 408 | + | | Check | Threshold | Action on fire | | |
| 409 | + | |-------|-----------|----------------| | |
| 410 | + | | Per-layer error rate (1h window) | > 10% | Notify admin (email + dashboard banner) | | |
| 411 | + | | Per-layer success count (24h) | == 0 | Page admin: layer fully down | | |
| 412 | + | | Queue depth | > 50 pending | Notify: workers falling behind | | |
| 413 | + | | Stuck-scan count (Scanning > 5min) | > 5 | Notify: stuck workers | | |
| 414 | + | | Held-for-review count | > 100 | Notify: review backlog growing | | |
| 415 | + | ||
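The thresholds in the table translate directly into a check function. This sketch is not the PoM module: `Snapshot`, `Alert`, and `evaluate` are hypothetical names, and for brevity the snapshot covers a single layer, whereas the real check would presumably run per layer.

```rust
/// Point-in-time metrics for one layer plus the shared queue counters.
struct Snapshot {
    layer_error_rate_1h: f64, // fraction, 0.0..=1.0
    layer_successes_24h: u64,
    queue_depth: u64,         // jobs waiting for a worker
    stuck_scans: u64,         // versions in Scanning for more than 5 minutes
    held_for_review: u64,
}

#[derive(Debug, PartialEq)]
enum Alert {
    LayerErrorRate, // notify admin
    LayerDown,      // page admin
    QueueBacklog,
    StuckWorkers,
    ReviewBacklog,
}

/// Returns every alert whose threshold from the table is exceeded.
fn evaluate(s: &Snapshot) -> Vec<Alert> {
    let mut alerts = Vec::new();
    if s.layer_error_rate_1h > 0.10 { alerts.push(Alert::LayerErrorRate); }
    if s.layer_successes_24h == 0 { alerts.push(Alert::LayerDown); }
    if s.queue_depth > 50 { alerts.push(Alert::QueueBacklog); }
    if s.stuck_scans > 5 { alerts.push(Alert::StuckWorkers); }
    if s.held_for_review > 100 { alerts.push(Alert::ReviewBacklog); }
    alerts
}
```

A snapshot resembling the incident (100% error rate, zero successes, 9 held items) would raise `LayerErrorRate` and `LayerDown` but not `ReviewBacklog`, which suggests the per-layer checks are the ones that would have fired first.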
| 416 | + | PoM module: `pom/src/checks/scan_pipeline.rs`, queries the same `/admin/uploads` | |
| 417 | + | data endpoints (or hits the DB directly via the existing PoM SSH pattern). | |
| 418 | + | ||
| 419 | + | ## 7. Public transparency | |
| 420 | + | ||
| 421 | + | Public-facing page at `/about/scanning` (DocEngine markdown). Contents: | |
| 422 | + | ||
| 423 | + | - What we scan with: each layer named, linked to its docs. | |
| 424 | + | - What status each can produce. | |
| 425 | + | - What "Clean" actually means and what it doesn't. | |
| 426 | + | - Aggregate stats (auto-substituted via assumptions/derived values): | |
| 427 | + | - "X% of uploads cleared automatically in <2min, last 30 days." | |
| 428 | + | - "Y files manually reviewed, last 30 days." | |
| 429 | + | - "Z files quarantined, last 30 days." | |
| 430 | + | - Creator appeals process for false positives. | |
| 431 | + | ||
| 432 | + | Per-version public scan-result panel on the public item page: | |
| 433 | + | ||
| 434 | + | - "This version was scanned on [date]. Cleared by N layers." | |
| 435 | + | - No per-layer detail surfaced publicly (avoids handing attackers a layer-by- | |
| 436 | + | layer evasion roadmap), but the existence of multi-layer scanning is visible. | |
| 437 | + | ||
| 438 | + | Brand alignment: this is the kind of unforced transparency that almost no | |
| 439 | + | distribution platform does. Differentiation surface, not just an | |
| 440 | + | implementation detail. | |
| 441 | + | ||
| 442 | + | ## 8. Sequencing | |
| 443 | + | ||
| 444 | + | Implementation order, sized for sequential landing: | |
| 445 | + | ||
| 446 | + | ### Phase 1 — Architectural floor | |
| 447 | + | ||
| 448 | + | 1. `scan_jobs` table + worker pool + status machine. | |
| 449 | + | 2. Move scan off the upload request handler. | |
| 450 | + | 3. Add `Pending`, `Scanning` to `FileScanStatus` enum. | |
| 451 | + | 4. Per-layer fail policy declared at registration; remove global has-error | |
| 452 | + | switch. | |
| 453 | + | 5. Tests: pipeline runs async, status flips correctly, fail-open vs | |
| 454 | + | fail-closed honored per layer. | |
| 455 | + | ||
| 456 | + | ### Phase 2 — Admin surface | |
| 457 | + | ||
| 458 | + | 6. Extend `/admin/uploads` to the three-section layout + health panel. | |
| 459 | + | 7. `scan_admin_actions` table + audit logging on every admin action. | |
| 460 | + | 8. HTMX SSE for live queue + health updates. | |
| 461 | + | 9. Per-layer detail expansion, bulk re-scan, bulk promote. | |
| 462 | + | ||
| 463 | + | ### Phase 3 — Layer set | |
| 464 | + | ||
| 465 | + | 10. Fix MalwareBazaar — register `Auth-Key`, add header, parse new response | |
| 466 | + | shape, tests against captured fixtures. | |
| 467 | + | 11. Install ClamAV daemon on prod + `freshclam` cron in deploy script. Wire | |
| 468 | + | `CLAMAV_SOCKET` env. | |
| 469 | + | 12. Pull Florian Roth `signature-base` YARA rules into prod; wire | |
| 470 | + | `YARA_RULES_DIR`. Add ruleset-version field to scan results so we can | |
| 471 | + | correlate. | |
| 472 | + | 13. URLhaus layer (string extraction + URL reputation). | |
| 473 | + | 14. Signing-trust layers (macOS, Windows, AppImage). These need helper | |
| 474 | + | binaries on prod (`codesign`, `spctl`, `signtool`, `gpg`) — vendor or | |
| 475 | + | install. Plan vendoring carefully: `codesign` is macOS-only, so the | |
| 476 | + | Hetzner-Linux server can't run it. **Open question**: cross-platform | |
| 477 | + | Mach-O signature verification. Candidates: `apple-codesign` Rust crate | |
| 478 | + | (Gregory Szorc's `rcodesign`), which can verify Apple signatures on | |
| 479 | + | Linux. Verify before committing. | |
| 480 | + | 15. MetaDefender Cloud free tier — second-opinion layer, triggered only when | |
| 481 | + | YARA or ClamAV flag a suspicion. | |
| 482 | + | ||
| 483 | + | ### Phase 4 — Operations | |
| 484 | + | ||
| 485 | + | 16. PoM scan-pipeline checks + alerting. | |
| 486 | + | 17. Re-scan sweeps (admin-triggered + monthly background). | |
| 487 | + | 18. Public `/about/scanning` page + per-version public scan panel. | |
| 488 | + | 19. Held-file backlog: re-scan the existing 9 held versions under the new | |
| 489 | + | pipeline. If they come out Clean (expected; they're our own builds), | |
| 490 | + | they flip automatically. If anything flags, we investigate manually. | |
| 491 | + | ||
| 492 | + | ### Phase 5 — Reserved for future | |
| 493 | + | ||
| 494 | + | - Hybrid Analysis sandbox detonation on admin-flagged samples. | |
| 495 | + | - Hash-reputation pass: same SHA shipped Clean by trusted creator = | |
| 496 | + | fast-pass on a re-upload by the same creator. Cross-creator fast-pass is | |
| 497 | + | not safe (a malicious creator could pre-clear a hash) and is out of scope. | |
| 498 | + | - Creator-attested provenance: SLSA-style supply-chain attestations, | |
| 499 | + | reproducible-build verification. Long-term, separate document. | |
| 500 | + |
Lines truncated