# Scan Pipeline Audit and Redesign

Status: Draft, 2026-05-24. Author: Max + Claude. Not yet implemented.

This document is the design predicate for a full redesign of MNW's file-scanning
pipeline. It is the result of discovering that every upload on the platform since
2026-05-10 has been silently held at `held_for_review` because the
MalwareBazaar layer fails closed on a missing API key, and there was no
admin-visible signal that anything was wrong.

The shipped pipeline is structurally fragile in ways that are not unique to that
one bug. This document re-derives what the pipeline should look like, picks the
layer set, fixes the fail-closed-by-default policy, moves scanning off the
upload critical path, and specifies an admin surface that would have caught the
MalwareBazaar regression on day one.

No code changes accompany this document. Implementation is sequenced separately
once the design is reviewed.

---

## 1. Goals and non-goals

**Goals**

- A scan pipeline a one-person platform can defend in public and point to as a
  trust differentiator.
- Multi-signal detection that survives the loss of any single layer.
- No silent platform-wide outage when a third-party scan endpoint changes
  behavior.
- Honest, transparent communication to creators and downloaders about what is
  scanned and what is found.
- Zero per-call vendor lock-in to a hyperscaler-owned threat-intel platform.

**Non-goals**

- Best-in-class detection of nation-state malware. That bar belongs to enterprise
  SOCs with seven-figure budgets. We aim for "covers ~90% of real-world malware
  on a creator-uploaded distribution platform", explicitly bounded.
- Real-time dynamic-analysis sandboxing for every upload. Cost-prohibitive at
  this stage; reserved for manual review of flagged samples.
- Detection of supply-chain compromise inside dependencies of an uploaded
  binary. Out of scope for a file scanner; addressed separately by
  reproducible-build verification and creator-attested provenance, future work.

## 2. Threat model

Adversaries and failure scenarios, in priority order of damage to the platform:

1. **Compromised creator account pushing a poisoned update.**
   A long-standing creator's account is taken over. The attacker uploads a new
   version of an existing app with malware bundled in. Existing customers
   auto-update or download a "trusted" creator's release. Highest reputational
   damage because creator trust is the platform's load-bearing asset.
2. **Malicious-actor signup uploading malware as a free or cheap "tool".**
   The attacker creates a fresh account, lists a fake app, and hopes a few
   downloads happen before takedown. Lower individual blast radius but easier
   to attempt at volume.
3. **Legitimate creator uploads benign software that triggers a false positive.**
   Crypto wallets, system utilities, low-level audio tools, and AppImages all
   routinely false-positive on signature AVs. Creator-facing UX must handle
   this gracefully: visible status, clear appeal path, no silent quarantine.
4. **Account-takeover update path** (subset of (1)): a valid signature from the
   creator's existing Apple Dev ID or Authenticode cert is not proof of
   legitimacy, because an attacker with cert access can re-sign. Defense
   relies on out-of-band signals: new-device login, IP reputation,
   version-velocity anomalies, creator email confirmation on new releases.
5. **Novel zero-day** that no engine in our stack recognizes.
   Unavoidable in the general case. Mitigation: hash-reputation comparison
   over time (re-scan), public scan-result transparency so other downloaders
   can flag, and an admin review queue that surfaces anything the auto-stack
   can't classify.
6. **Embedded malicious URLs in otherwise benign documentation, license files,
   or app text.** Lower priority but cheap to defend via URL reputation
   lookups on extracted strings.

The dominant cases are (1) and (2). (3) is the dominant *user-experience*
problem we will inflict on ourselves if we get the fail-closed policy wrong.

## 3. Current pipeline

### 3.1 Layers as implemented

`MNW/server/src/scanning/` runs six layers per upload (`scanning/mod.rs`):

| # | Layer | Implementation | Verdicts / behavior when unconfigured |
|---|-------|----------------|----------------------------------------|
| 1 | `content_type` | Magic-byte sniffing of file header | Pass / Fail |
| 2 | `structural` | Format-specific parser (PE, ELF, Mach-O, etc.) | Pass / Skip |
| 3 | `archive` | ZIP / tar walk for nested malware | Pass / Skip |
| 4 | `yara` | `yara-x` rule engine | Skip if no rules loaded |
| 5 | `clamav` | `clamd` socket over `INSTREAM` | Skip if no socket configured |
| 6 | `malwarebazaar` | abuse.ch hash lookup HTTP API | **Error if API shape unexpected** |

Final disposition (`scanning/mod.rs::ScanPipeline::scan`):

- Any layer `Fail` → `Quarantined`
- Any layer `Error` → `HeldForReview` (fail closed)
- Otherwise → `Clean`

### 3.2 Gaps observed

| Gap | Impact |
|-----|--------|
| ClamAV daemon not installed on prod | Layer always `Skip`; baseline AV signal absent |
| YARA rules directory empty on prod | Layer always `Skip`; no custom signatures |
| MalwareBazaar response shape changed (now requires `Auth-Key` header) | Layer returns `Error` on every call; pipeline returns `HeldForReview` on every upload |
| No code-signing verification (Apple notarization, Authenticode, AppImage GPG) | Largest available *positive* trust signal entirely unused |
| Synchronous scan on upload request handler | Slow third-party API stalls the upload thread; one stuck third party blocks every concurrent upload |
| Fail-closed-by-default for `Error` verdicts on optional layers | Optional best-effort layers can take down the whole pipeline (this is exactly what happened) |
| No admin surface for layer-health monitoring | Two-week silent regression with no alert; only noticed when downloads broke |
| No admin queue UI beyond a single "held items" list | No bulk re-scan, no per-layer detail, no history, no audit log |
| No rescan capability | Held files can't be re-evaluated after a layer is fixed; only path is `UPDATE versions SET scan_status='clean'` |
| Scan results stored per-`s3_key` not per-`version_id` | Detail joins go through `s3_key`, breaks if a file is referenced by multiple versions or moved |

### 3.3 Gap that triggered this audit

Every upload since 2026-05-10 sits at `held_for_review`. Each `file_scan_results.scan_layers` row shows the same final entry:

```
{"layer":"malwarebazaar","verdict":"error","detail":"Unexpected query_status: unknown"}
```

The MalwareBazaar `get_info` endpoint changed: unauthenticated requests no
longer return a `query_status` field. Our parser defaults the missing field to
the literal string `"unknown"`, which matches no expected status and therefore
returns `Error`. The fail-closed policy then converts that into
`HeldForReview` for every upload.
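
The failure chain above can be sketched in a few lines of Rust. This is a hypothetical reconstruction for illustration, not the code in `scanning/mod.rs`: the function names, the `Option<&str>` input, and the `"ok"` / `"hash_not_found"` status strings are assumptions. It shows how a defaulted `"unknown"` status reaches the catch-all arm, and how the global policy then holds the upload.

```rust
/// Verdict reported by a single layer (names follow the verdicts used in this document).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Skip,
    Fail,
    Error,
}

/// Final disposition produced by the shipped pipeline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Disposition {
    Clean,
    HeldForReview,
    Quarantined,
}

/// Hypothetical status mapping. `query_status` is `None` when the
/// response omits the field, as unauthenticated responses now do.
pub fn malwarebazaar_verdict(query_status: Option<&str>) -> Verdict {
    // The missing field is defaulted to the literal "unknown" ...
    match query_status.unwrap_or("unknown") {
        "ok" => Verdict::Fail,             // assumed: hash is a known sample
        "hash_not_found" => Verdict::Pass, // assumed: hash is not in the corpus
        _ => Verdict::Error,               // ... and "unknown" lands here
    }
}

/// Shipped global policy: any Fail quarantines, any Error holds.
pub fn legacy_disposition(verdicts: &[Verdict]) -> Disposition {
    if verdicts.contains(&Verdict::Fail) {
        Disposition::Quarantined
    } else if verdicts.contains(&Verdict::Error) {
        Disposition::HeldForReview
    } else {
        Disposition::Clean
    }
}
```

Under these assumptions, a run with two passing layers, three skipped layers, and `malwarebazaar_verdict(None)` evaluates to `HeldForReview`, which is what every upload since 2026-05-10 received.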

No alert fired. The only signal was a user trying to download a GO build and
getting a generic "Failed to get download URL" toast.

## 4. Target architecture

Three structural shifts:

1. **Async scan, sync upload.** Upload returns immediately with `Pending`
   status. Scan runs in a background worker. Status flips when scan completes.
   Downloads gate on scan status: only `Clean` versions are downloadable.
2. **Explicit per-layer fail policy.** Each layer declares at registration
   whether `Error` is fail-open (Skip-equivalent) or fail-closed
   (`HeldForReview`). No global has-error switch.
3. **Multi-signal detection with positive trust signals.** Code-signing and
   notarization checks contribute *evidence of trust*, not just absence of
   threats. A properly notarized macOS binary from a verified Dev ID team is
   strong positive evidence; an unsigned `.exe` is a weaker baseline. Today
   neither signal exists.

### 4.1 Layer set (post-audit)

| # | Layer | Source | Fail-open or fail-closed on layer error |
|---|-------|--------|------------------------------------------|
| 1 | `content_type` | Magic-byte sniffing, in-process | fail-closed (cheap, deterministic) |
| 2 | `structural` | Format parsers, in-process | fail-closed (cheap, deterministic) |
| 3 | `archive` | Nested-archive walk, in-process | fail-closed (cheap, deterministic) |
| 4 | `yara` | yara-x + Florian Roth `signature-base` ruleset | fail-closed (in-process, deterministic) |
| 5 | `clamav` | `clamd` daemon + `freshclam` cron | fail-open (local-service dependency) |
| 6 | `signing_macos` | `codesign --verify --deep --strict` + `spctl --assess --type exec` + notarization staple check | fail-open; runs on macOS files only; positive evidence if pass |
| 7 | `signing_windows` | `signtool verify /pa` + cert chain inspection | fail-open; runs on Windows files only; positive evidence if pass |
| 8 | `signing_linux` | AppImage GPG signature + zsync URL presence; deb/rpm signatures | fail-open; positive evidence if pass |
| 9 | `abuse_malwarebazaar` | abuse.ch hash lookup with `Auth-Key` header | fail-open (third-party network) |
| 10 | `abuse_urlhaus` | URL reputation on strings extracted from binary | fail-open (third-party network) |
| 11 | `metadefender_cloud` | OPSWAT free tier (40/day), **second-opinion only** on YARA/ClamAV flags | fail-open (rate-limited, optional) |
| 12 | `hybrid_analysis` | CrowdStrike Falcon Sandbox free key (30/month), **admin-triggered only** | manual, not on the auto-path |

**Policy:** layers 1-4 are deterministic in-process work and fail closed
because an internal bug producing `Error` is a code defect, not an outage.
Layers 5 and up are network or local-service dependencies and fail open
because external regressions must never take down the platform: they degrade
signal quality, not availability.
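
The policy above can be expressed as a small disposition function. This is a minimal sketch under stated assumptions: `OnError`, `LayerResult`, and `disposition` are hypothetical names chosen for illustration, and the admin-policy triggers (size cap, new-account rules) are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Skip,
    Fail,
    Error,
}

/// Declared once per layer at registration (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnError {
    FailOpen,   // an Error is treated like Skip
    FailClosed, // an Error holds the file for review
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Disposition {
    Clean,
    HeldForReview,
    Quarantined,
}

pub struct LayerResult {
    pub layer: &'static str,
    pub on_error: OnError,
    pub verdict: Verdict,
}

/// A Fail from any layer quarantines. An Error only holds the file
/// when the layer that produced it is declared fail-closed.
pub fn disposition(results: &[LayerResult]) -> Disposition {
    if results.iter().any(|r| r.verdict == Verdict::Fail) {
        return Disposition::Quarantined;
    }
    let blocking_error = results
        .iter()
        .any(|r| r.verdict == Verdict::Error && r.on_error == OnError::FailClosed);
    if blocking_error {
        Disposition::HeldForReview
    } else {
        Disposition::Clean
    }
}
```

With this shape, an `Error` from `abuse_malwarebazaar` (fail-open) leaves an otherwise passing file `Clean`, while an `Error` from `yara` (fail-closed) still holds it for review.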

### 4.2 Status machine

```
                   ┌──────────────────┐
upload accepted ───▶     Pending      │
                   └────────┬─────────┘
                            │ scan worker picks up
                   ┌────────▼─────────┐
                   │     Scanning     │
                   └────────┬─────────┘
            ┌───────────────┼───────────────┐
            ▼               ▼               ▼
     ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
     │    Clean    │ │HeldForReview│ │ Quarantined │
     └─────────────┘ └──────┬──────┘ └─────────────┘
                            │ admin Promote
                            ▼
                     ┌─────────────┐
                     │    Clean    │
                     └─────────────┘
                            │ admin Quarantine
                            ▼
                     ┌─────────────┐
                     │ Quarantined │
                     └─────────────┘
```

Transitions:

| From | To | Trigger |
|------|----|---------|
| (no row) | Pending | Upload confirmed (S3 object created, row inserted) |
| Pending | Scanning | Worker dequeues |
| Scanning | Clean | All deterministic layers pass, no `Fail` in any layer |
| Scanning | Quarantined | Any layer returns `Fail` |
| Scanning | HeldForReview | Any fail-closed layer returns `Error`, or admin policy triggers (size cap, file type, creator new-account, etc.) |
| HeldForReview | Clean | Admin promotes; audit-logged |
| HeldForReview | Quarantined | Admin quarantines; audit-logged with note |
| HeldForReview | Scanning | Admin-triggered re-scan (the Re-scan action in section 5.2) |
| Clean | Scanning | Admin-triggered re-scan |
| Quarantined | (no transition without DB-level intervention) | Quarantine is sticky by design |
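
One way to keep the status machine honest in code is a single function that whitelists the legal edges. The sketch below assumes the `FileScanStatus` enum gains `Pending` and `Scanning` as planned in Phase 1; `can_transition` is a hypothetical helper, and the `HeldForReview → Scanning` edge is an assumption taken from the Re-scan action on held rows in the admin dashboard.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileScanStatus {
    Pending,
    Scanning,
    Clean,
    HeldForReview,
    Quarantined,
}

/// Returns true only for edges listed in the transition table.
/// Anything not whitelisted (including every edge out of Quarantined)
/// is rejected, which keeps quarantine sticky.
pub fn can_transition(from: FileScanStatus, to: FileScanStatus) -> bool {
    use FileScanStatus::*;
    matches!(
        (from, to),
        (Pending, Scanning)
            | (Scanning, Clean)
            | (Scanning, Quarantined)
            | (Scanning, HeldForReview)
            | (HeldForReview, Clean)
            | (HeldForReview, Quarantined)
            | (HeldForReview, Scanning) // assumed: admin Re-scan on a held row
            | (Clean, Scanning)
    )
}
```

Calling this check at every status write would make an accidental `Quarantined → Clean` update fail loudly instead of silently succeeding.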

### 4.3 Re-scan cadence

- **Trigger a re-scan** when:
  - YARA ruleset is updated (operator-controlled).
  - ClamAV `freshclam` rolls a new sig DB version.
  - An admin explicitly clicks Re-scan.
- **Background sweep**: every 30 days, re-run hash-lookup layers (abuse.ch,
  optionally MetaDefender) across the full `Clean` corpus. Detects files whose
  hashes have *become* known-bad over time. Quarantine on `Fail`, log to audit.

### 4.4 Async architecture

Move scan off the upload critical path:

```
POST /api/versions/{id}/upload/confirm
  → create version row (scan_status=Pending)
  → enqueue scan_job (status='queued')
  → return 200 with status="pending"
  → client polls or subscribes via SSE

scan_worker (tokio task in same process)
  → SELECT ... FROM scan_jobs WHERE status='queued' FOR UPDATE SKIP LOCKED
  → run pipeline
  → INSERT INTO file_scan_results (per-layer JSON)
  → UPDATE versions.scan_status
  → publish status-changed event (SSE channel for admin + creator)
```

Job table:

```sql
-- One row per file awaiting or undergoing a scan.
CREATE TABLE scan_jobs (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    target_kind   TEXT NOT NULL CHECK (target_kind IN ('version', 'item_cover', 'item_attachment')),
    target_id     UUID NOT NULL,
    s3_key        TEXT NOT NULL,
    status        TEXT NOT NULL CHECK (status IN ('queued', 'running', 'done', 'failed')),
    attempts      INT NOT NULL DEFAULT 0,  -- bounded by the retry policy below
    enqueued_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    started_at    TIMESTAMPTZ,
    completed_at  TIMESTAMPTZ,
    last_error    TEXT
);
-- Supports the workers' "oldest queued job first" lookup.
CREATE INDEX scan_jobs_status_enqueued ON scan_jobs (status, enqueued_at);
```

Worker pool: N workers (configurable, default 2), each pulling with
`SELECT ... FOR UPDATE SKIP LOCKED LIMIT 1`. Job retry policy: 3 attempts on
transient failure, then `status='failed'`, surfaced in the admin dashboard.
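
The retry policy can be modeled independently of the database. The following is a minimal in-memory sketch, assuming an attempt is counted when a worker claims the job; `Job`, `claim`, `finish`, and `MAX_ATTEMPTS` are hypothetical names, and a real worker would perform the same updates on the `scan_jobs` row inside the claiming transaction.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Queued,
    Running,
    Done,
    Failed,
}

/// Retry ceiling from the policy above: 3 attempts, then `failed`.
pub const MAX_ATTEMPTS: u32 = 3;

#[derive(Debug)]
pub struct Job {
    pub status: JobStatus,
    pub attempts: u32,
    pub last_error: Option<String>,
}

/// A worker claims a queued job. Returns false if the job is not claimable.
pub fn claim(job: &mut Job) -> bool {
    if job.status != JobStatus::Queued {
        return false;
    }
    job.status = JobStatus::Running;
    job.attempts += 1; // assumption: attempts are counted at claim time
    true
}

/// Record the outcome of one run. A transient failure re-queues the job
/// until the attempt ceiling is reached, after which it is marked failed.
pub fn finish(job: &mut Job, outcome: Result<(), String>) {
    match outcome {
        Ok(()) => {
            job.status = JobStatus::Done;
            job.last_error = None;
        }
        Err(message) => {
            job.last_error = Some(message);
            job.status = if job.attempts >= MAX_ATTEMPTS {
                JobStatus::Failed
            } else {
                JobStatus::Queued
            };
        }
    }
}
```

Counting attempts at claim time means a worker that crashes mid-scan still consumes an attempt, so a file that reliably kills the worker cannot loop forever.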

### 4.5 Creator-visible UX

In the creator dashboard, each version row gains a scan-status badge:

- **Pending** / **Scanning**: neutral, "Scanning…"
- **Clean**: positive, "Cleared"
- **HeldForReview**: warning, "Awaiting review (typically under 24h)"
- **Quarantined**: negative, "Quarantined, [contact support] to appeal"

Held / quarantined rows expand to show per-layer detail: which layer flagged,
what it said, what the creator can do. Honest, transparent, brand-aligned.

Public-facing download buttons:

- Hidden entirely while `Pending` or `Scanning`.
- Visible on `Clean`.
- Hidden on `HeldForReview` / `Quarantined` (creator sees a note in their
  dashboard; public sees nothing, which is graceful degradation).
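
The badge and download rules above reduce to two small pure functions. This sketch uses hypothetical helper names (`badge_tone`, `download_visible`) and assumes `Scanning` shares the neutral badge with `Pending`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileScanStatus {
    Pending,
    Scanning,
    Clean,
    HeldForReview,
    Quarantined,
}

/// Badge tone shown on the creator dashboard.
pub fn badge_tone(status: FileScanStatus) -> &'static str {
    match status {
        // Assumption: both in-flight states render the same neutral badge.
        FileScanStatus::Pending | FileScanStatus::Scanning => "neutral",
        FileScanStatus::Clean => "positive",
        FileScanStatus::HeldForReview => "warning",
        FileScanStatus::Quarantined => "negative",
    }
}

/// The public download button is shown for `Clean` only.
/// Every other status, including any added later, hides it by default.
pub fn download_visible(status: FileScanStatus) -> bool {
    matches!(status, FileScanStatus::Clean)
}
```

Writing the gate as an allow-list (`Clean` only) rather than a deny-list means a newly introduced status cannot accidentally expose a download.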

## 5. /admin/uploads dashboard

The audit surface. Existing route at `routes/admin/uploads.rs` is the seed; we
extend it.

### 5.1 Page layout

```
┌─────────────────────────────────────────────────────────────────────┐
│ Pipeline Health (last 24h / 7d toggle)                              │
│ ┌─────────────────┬──────────────┬──────────────┬─────────────────┐ │
│ │ content_type    │ 100%    1ms  │ 100%    1ms  │ ✓ last: now     │ │
│ │ structural      │ 100%    2ms  │ 100%    2ms  │ ✓ last: now     │ │
│ │ yara            │  99%   45ms  │  99%   43ms  │ ✓ last: 12s ago │ │
│ │ clamav          │   0%      -  │   0%      -  │ ✗ not running   │ │
│ │ malwarebazaar   │   0%      -  │   0%      -  │ ✗ 14 days ago   │ │
│ │ signing_macos   │  98%  320ms  │  97%  350ms  │ ✓ last: 1m ago  │ │
│ └─────────────────┴──────────────┴──────────────┴─────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Active Queue (Pending + Scanning)                 [auto-refresh on] │
│                                                                     │
│ 3 files scanning, 0 stuck                                           │
│ • GoingsOn_0.4.0_aarch64.dmg - scanning (4s)                        │
│ • SamplePack.zip - pending (queued 1s ago)                          │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Held for Review                                             9 items │
│                                                                     │
│ [+] GoingsOn_0.3.1_x64-setup.exe                                    │
│     creator: max • item: GoingsOn Desktop • 14 days held            │
│     layers: ✓ct ✓struct -arch -yara -clam ⚠mb                       │
│     [Promote] [Quarantine] [Re-scan]                                │
│                                                                     │
│ [+] GoingsOn_0.3.1_amd64.AppImage ...                               │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Recent History (last 30d)                     [▶ expand] [filter ▾] │
└─────────────────────────────────────────────────────────────────────┘
```

### 5.2 Section spec

**Pipeline Health (top panel)**
- Per-layer rolling stats over the last 24h and 7d.
- Columns: layer name, success rate, error rate, p50 latency, p95 latency,
  health badge, last successful response timestamp.
- Health badge logic: `ok` if last successful response < 1h ago AND error rate <
  10%; `degraded` if either condition is missed; `down` if no successful
  response in 24h or error rate > 50%.
- Click any row → drill-down to last 100 layer invocations with full
  per-call detail. Useful for diagnosing intermittent regressions.
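
The badge rule can be written as one pure function, which also makes the thresholds easy to unit-test. This is a sketch under assumptions: the state names `Ok` / `Degraded` / `Down` and the function signature are illustrative, and "no successful response ever" is treated the same as "none in 24h".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Ok,
    Degraded,
    Down,
}

const HOUR_SECS: u64 = 60 * 60;
const DAY_SECS: u64 = 24 * HOUR_SECS;

/// `secs_since_success` is `None` if the layer has never succeeded.
/// `error_rate` is a fraction in the range 0.0..=1.0 over the selected window.
pub fn health_badge(secs_since_success: Option<u64>, error_rate: f64) -> Health {
    let age = secs_since_success.unwrap_or(u64::MAX);
    if age >= DAY_SECS || error_rate > 0.50 {
        // No successful response in 24h, or more than half of calls failing.
        Health::Down
    } else if age < HOUR_SECS && error_rate < 0.10 {
        Health::Ok
    } else {
        // Either the freshness or the error-rate condition is missed.
        Health::Degraded
    }
}
```

Checking the `Down` condition first matters: a layer with a recent success but a 60% error rate should read as down, not merely degraded.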

**Active Queue (Pending + Scanning)**
- Auto-refreshes via HTMX SSE.
- Shows count of files Pending vs Scanning.
- "Stuck" detection: anything in `Scanning` for > 5 minutes is flagged red.
- One-line entry per file: filename, current state, elapsed time. No
  per-layer detail at this stage (scan not done yet).

**Held for Review**
- Default expanded, because these need decisions.
- One row per held version. Per row:
  - Creator handle + item title + version filename + size + age-of-hold.
  - Layer chips: small colored squares, one per layer, showing verdict
    (`pass`, `skip`, `fail`, `error`, `pending`).
  - Click any chip → in-place expand to the layer's `detail` JSON.
  - Three actions: **Promote** (with optional note), **Quarantine** (note
    required), **Re-scan** (re-runs pipeline now).
- Bulk operations:
  - Select N rows → **Bulk Re-scan** (no note required; common after a layer
    fix lands).
  - Select N rows → **Bulk Promote** (single shared note required, hard
    audit-logged).
  - No bulk Quarantine: every quarantine is an individual decision.
- Sort: default by held-at ascending (oldest first); switchable.
- Filter: by app, by creator, by which layer flagged, by file type.

**Recent History (collapsible)**
- Default collapsed. When expanded: dense grid of last 30d, all statuses.
- Columns: creator, item, version, status, scanned-at, layer-summary chip
  strip, action.
- Filter: by status, app, creator, date range.
- Pagination at 100 rows; further history at `/admin/uploads/archive`.
- Each row click → same expandable per-layer detail panel.

**Audit Trail**
- New table:
  ```sql
  CREATE TABLE scan_admin_actions (
      id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      version_id   UUID,
      item_id      UUID,
      admin_id     UUID NOT NULL,
      action       TEXT NOT NULL CHECK (action IN ('promote', 'quarantine', 'rescan', 'bulk_promote', 'bulk_rescan')),
      prev_status  TEXT,
      new_status   TEXT,
      note         TEXT,
      created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
  );
  ```
- Inline tooltip on each row: "Last action: promoted by max, 2 days ago".
- Full log at `/admin/uploads/audit` with filters.

### 5.3 Access control

- Gated on existing `AdminUser` extractor (PLATFORM_ADMIN_ID single-user
  model). No per-forum-style role splitting.
- All POST routes CSRF-protected (existing middleware).
- All admin actions audit-logged (table above).

### 5.4 Live updates

- HTMX SSE channel `/admin/uploads/events` pushing:
  - `scan-started`, `scan-completed`, `scan-stuck` events.
- Active Queue + Pipeline Health update without page reload.
- History grid stays static; reloads only on filter change.

## 6. Monitoring and alerting

Add PoM checks for the scan pipeline:

| Check | Threshold | Action on fire |
|-------|-----------|----------------|
| Per-layer error rate (1h window) | > 10% | Notify admin (email + dashboard banner) |
| Per-layer success count (24h) | == 0 | Page admin: layer fully down |
| Queue depth | > 50 pending | Notify: workers falling behind |
| Stuck-scan count (Scanning > 5min) | > 5 | Notify: stuck workers |
| Held-for-review count | > 100 | Notify: review backlog growing |

PoM module: `pom/src/checks/scan_pipeline.rs`, queries the same `/admin/uploads`
data endpoints (or hits the DB directly via the existing PoM SSH pattern).
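
The threshold table translates directly into check functions over a metrics snapshot. The sketch below is illustrative only: `LayerStats`, `QueueStats`, `Alert`, and both functions are hypothetical names, not the existing PoM API, and collecting the metrics is out of scope here.

```rust
/// Rolling stats for one layer (hypothetical shape).
pub struct LayerStats {
    pub error_rate_1h: f64, // fraction, 0.0..=1.0
    pub successes_24h: u64,
}

/// Queue-level counters (hypothetical shape).
pub struct QueueStats {
    pub pending: u64,
    pub stuck_scanning: u64, // jobs in Scanning for more than 5 minutes
    pub held_for_review: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Alert {
    LayerErrorRate, // notify: error rate above 10%
    LayerDown,      // page: zero successes in 24h
    QueueBacklog,   // notify: more than 50 pending
    StuckWorkers,   // notify: more than 5 stuck scans
    ReviewBacklog,  // notify: more than 100 held
}

pub fn layer_alerts(s: &LayerStats) -> Vec<Alert> {
    let mut alerts = Vec::new();
    if s.error_rate_1h > 0.10 {
        alerts.push(Alert::LayerErrorRate);
    }
    if s.successes_24h == 0 {
        alerts.push(Alert::LayerDown);
    }
    alerts
}

pub fn queue_alerts(q: &QueueStats) -> Vec<Alert> {
    let mut alerts = Vec::new();
    if q.pending > 50 {
        alerts.push(Alert::QueueBacklog);
    }
    if q.stuck_scanning > 5 {
        alerts.push(Alert::StuckWorkers);
    }
    if q.held_for_review > 100 {
        alerts.push(Alert::ReviewBacklog);
    }
    alerts
}
```

Replaying the MalwareBazaar regression through these checks (100% error rate, zero successes) fires both layer alerts, while the nine held files stay below the review-backlog threshold. The per-layer checks are therefore the ones that would have caught it.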

## 7. Public transparency

Public-facing page at `/about/scanning` (DocEngine markdown). Contents:

- What we scan with: each layer named, linked to its docs.
- What status each can produce.
- What "Clean" actually means and what it doesn't.
- Aggregate stats (auto-substituted via assumptions/derived values):
  - "X% of uploads cleared automatically in <2min, last 30 days."
  - "Y files manually reviewed, last 30 days."
  - "Z files quarantined, last 30 days."
- Creator appeals process for false positives.

Per-version public scan-result panel on the public item page:

- "This version was scanned on [date]. Cleared by N layers."
- No per-layer detail surfaced publicly (avoids handing attackers a
  layer-by-layer evasion roadmap), but the existence of multi-layer scanning
  is visible.

Brand alignment: this is the kind of unforced transparency that almost no
distribution platform does. Differentiation surface, not just an
implementation detail.

## 8. Sequencing

Implementation order, sized for sequential landing:

### Phase 1: Architectural floor

1. `scan_jobs` table + worker pool + status machine.
2. Move scan off the upload request handler.
3. Add `Pending`, `Scanning` to `FileScanStatus` enum.
4. Per-layer fail policy declared at registration; remove global has-error
   switch.
5. Tests: pipeline runs async, status flips correctly, fail-open vs
   fail-closed honored per layer.

### Phase 2: Admin surface

6. Extend `/admin/uploads` to the three-section layout + health panel.
7. `scan_admin_actions` table + audit logging on every admin action.
8. HTMX SSE for live queue + health updates.
9. Per-layer detail expansion, bulk re-scan, bulk promote.

### Phase 3: Layer set

10. Fix MalwareBazaar: register `Auth-Key`, add header, parse new response
    shape, tests against captured fixtures.
11. Install ClamAV daemon on prod + `freshclam` cron in deploy script. Wire
    `CLAMAV_SOCKET` env.
12. Pull Florian Roth `signature-base` YARA rules into prod; wire
    `YARA_RULES_DIR`. Add ruleset-version field to scan results so we can
    correlate.
13. URLhaus layer (string extraction + URL reputation).
14. Signing-trust layers (macOS, Windows, AppImage). These need helper
    binaries on prod (`codesign`, `spctl`, `signtool`, `gpg`), vendored or
    installed. Plan vendoring carefully: `codesign` and `spctl` are
    macOS-only and `signtool` is Windows-only, so the Hetzner Linux server
    can't run them. **Open question**: cross-platform Mach-O signature
    verification. Candidate: the `apple-codesign` Rust crate (Gregory Szorc's
    `rcodesign`), which can verify Apple signatures on Linux. Verify before
    committing.
15. MetaDefender Cloud free tier: second-opinion layer, triggered only when
    YARA or ClamAV flag a suspicion.

### Phase 4: Operations

16. PoM scan-pipeline checks + alerting.
17. Re-scan sweeps (admin-triggered + monthly background).
18. Public `/about/scanning` page + per-version public scan panel.
19. Held-file backlog: re-scan the existing 9 held versions under the new
    pipeline. If they come out Clean (expected; they're our own builds),
    they flip automatically. If anything flags, we investigate manually.

### Phase 5: Reserved for future

- Hybrid Analysis sandbox detonation on admin-flagged samples.
- Hash-reputation pass: same SHA shipped Clean by trusted creator =
  fast-pass on a re-upload by the same creator. Cross-creator fast-pass is
  not safe (a malicious creator could pre-clear a hash) and is out of scope.
- Creator-attested provenance: SLSA-style supply-chain attestations,
  reproducible-build verification. Long-term, separate document.

## 9. What we explicitly chose not to do

- **VirusTotal / Google Threat Intelligence**. Free tier ToS forbids
  commercial workflow use. The paid tier ($20–50K/yr floor) comes from a
  GCP-locked vendor with hostile migration behavior (prepaid credits voided
  in the GTI migration). Misaligned with platform brand. Future revisit only
  if upload volume passes ~1K/day **and** the current stack misses a real
  incident.
- **Synchronous-scan retention**. Even with all layers fixed, holding the
  upload thread on third-party calls is structurally fragile.
- **Per-layer-error fail-closed by default**. The single decision that
  caused this audit. Reversed.
- **Global YARA-ruleset auto-update from upstream**. Roth's `signature-base`
  is curated but YARA rules can have false positives; we pin a ruleset
  version and bump deliberately, with re-scan of recent uploads on bump.

## 10. Open questions

- **macOS signature verification from Linux**. `rcodesign` looks viable but
  is unverified. If it is not, the fallback is a separate scan worker on a
  macOS host, or accepting signing status only when the creator's upload
  tool self-reports it, cross-checked against the embedded signature blob.
- **Where does `scan_jobs` live?** Same Postgres or a dedicated queue
  (Redis, RabbitMQ)? Default: Postgres + `SKIP LOCKED`, no new infra. Revisit
  if queue depth + worker latency demand it. Probably never.
- **Bulk-promote audit threshold.** Should bulk-promote require dual-control
  (a second admin's approval) above N rows? Today single-operator, so the
  question is partly theoretical, but it shapes the schema.
- **Public scan-result panel detail level.** Per-layer breakdown helps
  honest creators see what we evaluated, but also helps attackers
  fingerprint our pipeline. Default: aggregate verdict only, with the layer
  list named but not per-file verdicts. Decide on first iteration.

## 11. Cost summary

| Item | Cost |
|------|------|
| abuse.ch (MalwareBazaar / URLhaus / ThreatFox) | $0 with free Auth-Key |
| ClamAV + `freshclam` | $0 |
| YARA + Roth `signature-base` ruleset | $0 |
| Apple notarization staple verify (`rcodesign`) | $0 |
| Authenticode signature verify | $0 |
| AppImage GPG signature verify | $0 |
| MetaDefender Cloud free tier (40/day) | $0 |
| Hybrid Analysis free key (30/month) | $0 |
| PoM monitoring | $0 (existing infra) |
| **Total recurring cost** | **$0** |

If upload volume + incident pressure justify it later: MetaDefender paid
(~$5–15K/yr estimated, commercial-use-licensed, no GCP lock-in) before any
consideration of VT/GTI.

---

## Appendix A: File and module touchpoints

| Concern | File(s) |
|---------|---------|
| Pipeline orchestration | `MNW/server/src/scanning/mod.rs` |
| Per-layer impls | `MNW/server/src/scanning/{yara,clamav,hash_lookup,...}.rs` |
| New: signing layers | `MNW/server/src/scanning/signing/{macos,windows,linux}.rs` |
| New: URLhaus layer | `MNW/server/src/scanning/urlhaus.rs` |
| Status enum | `MNW/server/src/db/enums.rs` (`FileScanStatus`) |
| Versions / scan-status columns | `MNW/server/src/db/scanning.rs`, `migrations/004_file_scan_status.sql` |
| New: `scan_jobs` worker | `MNW/server/src/scanning/worker.rs` |
| New: `scan_admin_actions` audit log | `MNW/server/src/db/scan_admin_actions.rs` |
| Admin dashboard route | `MNW/server/src/routes/admin/uploads.rs` |
| Admin dashboard templates | `MNW/server/templates/pages/admin/uploads*.html` |
| Download gate | `MNW/server/src/routes/storage/downloads.rs` |
| PoM checks | `MNW/pom/src/checks/scan_pipeline.rs` |
| Public transparency page | `MNW/server/site-docs/public/about/scanning.md` |

## Appendix B: Operator rollout procedure

One-time prod setup for Phases 3a / 3d. Each step is independent; do them in
any order. After each, the corresponding Pipeline Health card on the admin
dashboard flips from down to ok within one upload cycle.

### abuse.ch Auth-Key (Phase 3a)

1. Register at <https://auth.abuse.ch>. Free, single email confirmation.
2. Add to `/opt/makenotwork/.env`:
   ```
   ABUSE_CH_AUTH_KEY=<the-issued-key>
   ```
3. `systemctl restart makenotwork`.
4. Watch the `malwarebazaar` and `urlhaus` cards flip after the next upload.

### ClamAV daemon (Phase 3d)

Run as root on the Hetzner prod host (one-time):
```
/opt/makenotwork/deploy/setup-clamav.sh
echo 'CLAMAV_SOCKET=/var/run/clamav/clamd.ctl' >> /opt/makenotwork/.env
systemctl restart makenotwork
```
The script installs `clamav-daemon` + `clamav-freshclam`, waits for the
initial signature DB pull (up to 5 minutes), and verifies clamd is reachable
over its Unix socket. Signatures auto-update via `freshclam.service`.

### YARA rules (Phase 3d)

```
/opt/makenotwork/deploy/setup-yara-rules.sh
echo 'YARA_RULES_DIR=/opt/makenotwork/yara-rules' >> /opt/makenotwork/.env
systemctl restart makenotwork
```
Pulls Florian Roth's `signature-base` (CC-BY-NC 4.0) shallow clone, flattens
the `.yar` files into `/opt/makenotwork/yara-rules`, installs a weekly cron
to refresh upstream. Stamps the active commit SHA at
`/opt/makenotwork/yara-rules/RULESET_VERSION` for audit-trail correlation.

A `systemctl restart makenotwork` is required to recompile rules after every
ruleset bump: the cron only refreshes the files on disk, not the in-memory
compiled rules.

---

## Appendix C: Migration plan for current held backlog

9 versions in `held_for_review` since 2026-05-10. Plan:

1. Implement Phase 1 + Phase 3.10 (MalwareBazaar fix with Auth-Key).
2. Trigger `Bulk Re-scan` on all 9 from the admin dashboard once the new
   pipeline is running.
3. Expected outcome: 9 → Clean. They're our own builds and would have passed
   the original pipeline if MB hadn't errored.
4. If any flag under the new (richer) pipeline, investigate as a normal
   held-review case.

No `UPDATE versions SET scan_status='clean'` shortcut. The whole point of the
redesign is that the system promotes a file when it has reason to, not
because an admin reached around it.