# Scan Pipeline Audit and Redesign

Status: Draft, 2026-05-24. Author: Max + Claude. Not yet implemented.

This document is the design predicate for a full redesign of MNW's file-scanning
pipeline. It is the result of discovering that every upload on the platform since
2026-05-10 has been silently held at `held_for_review` because the
MalwareBazaar layer fails closed on a missing API key — and there was no
admin-visible signal that anything was wrong.

The shipped pipeline is structurally fragile in ways that are not unique to that
one bug. This document re-derives what the pipeline should look like, picks the
layer set, fixes the fail-closed-by-default policy, moves scanning off the
upload critical path, and specifies an admin surface that would have caught the
MalwareBazaar regression on day one.

No code changes accompany this document. Implementation is sequenced separately
once the design is reviewed.

---

## 1. Goals and non-goals

**Goals**

- A scan pipeline a one-person platform can defend in public and point to as a
  trust differentiator.
- Multi-signal detection that survives the loss of any single layer.
- No silent platform-wide outage when a third-party scan endpoint changes
  behavior.
- Honest, transparent communication to creators and downloaders about what is
  scanned and what is found.
- Zero per-call vendor lock-in to a hyperscaler-owned threat-intel platform.

**Non-goals**

- Best-in-class detection of nation-state malware. That bar belongs to enterprise
  SOCs with seven-figure budgets. We aim for "covers ~90% of real-world malware
  on a creator-uploaded distribution platform" — explicitly bounded.
- Real-time dynamic-analysis sandboxing for every upload. Cost-prohibitive at
  this stage; reserved for manual review of flagged samples.
- Detection of supply-chain compromise inside dependencies of an uploaded
  binary. Out of scope for a file scanner; addressed separately by
  reproducible-build verification and creator-attested provenance, future work.

## 2. Threat model

Adversaries, in priority order of damage to the platform:

1. **Compromised creator account pushing a poisoned update.**
   A long-standing creator's account is taken over. Attacker uploads a new
   version of an existing app with malware bundled in. Existing customers
   auto-update or download a "trusted" creator's release. Highest reputational
   damage because creator trust is the platform's load-bearing asset.
2. **Malicious-actor signup uploading malware as a free or cheap "tool".**
   Attacker creates a fresh account, lists a fake app, hopes a few downloads
   happen before takedown. Lower individual blast radius but easier to attempt
   at volume.
3. **Legitimate creator uploads benign software that triggers a false positive.**
   Crypto wallets, system utilities, low-level audio tools, and AppImages all
   routinely false-positive on signature AVs. Creator-facing UX must handle
   this gracefully — visible status, clear appeal path, no silent quarantine.
4. **Account-takeover update path** (subset of (1)): even if the new binary is
   signed with the creator's existing Apple Dev ID or Authenticode cert, an
   attacker with cert access can re-sign. Defense relies on out-of-band
   signals: new-device login, IP reputation, version-velocity anomalies,
   creator email confirmation on new releases.
5. **Novel zero-day** that no engine in our stack recognizes.
   Unavoidable in the general case. Mitigation: hash-reputation comparison
   over time (re-scan), public scan-result transparency so other downloaders
   can flag, and an admin review queue that surfaces anything the auto-stack
   can't classify.
6. **Embedded malicious URLs in otherwise benign documentation, license files,
   or app text.** Lower priority but cheap to defend via URL reputation
   lookups on extracted strings.

The dominant cases are (1) and (2). (3) is the dominant *user-experience*
problem we'll inflict on ourselves if we get fail-closed policy wrong.

## 3. Current pipeline

### 3.1 Layers as implemented

`MNW/server/src/scanning/` runs six layers per upload (`scanning/mod.rs`):

| # | Layer | Mechanism | Verdict behavior |
|---|-------|-----------|------------------|
| 1 | `content_type` | Magic-byte sniffing of file header | Pass / Fail |
| 2 | `structural` | Format-specific parser (PE, ELF, Mach-O, etc.) | Pass / Skip |
| 3 | `archive` | ZIP / tar walk for nested malware | Pass / Skip |
| 4 | `yara` | `yara-x` rule engine | Skip if no rules loaded |
| 5 | `clamav` | `clamd` socket over `INSTREAM` | Skip if no socket configured |
| 6 | `malwarebazaar` | abuse.ch hash lookup HTTP API | **Error if API shape unexpected** |

Final disposition (`scanning/mod.rs::ScanPipeline::scan`):

- Any layer `Fail` → `Quarantined`
- Any layer `Error` → `HeldForReview` (fail closed)
- Otherwise → `Clean`

### 3.2 Gaps observed

| Gap | Impact |
|-----|--------|
| ClamAV daemon not installed on prod | Layer always `Skip`; baseline AV signal absent |
| YARA rules directory empty on prod | Layer always `Skip`; no custom signatures |
| MalwareBazaar response shape changed (now requires `Auth-Key` header) | Layer returns `Error` on every call; pipeline returns `HeldForReview` on every upload |
| No code-signing verification (Apple notarization, Authenticode, AppImage GPG) | Largest available *positive* trust signal entirely unused |
| Synchronous scan on upload request handler | Slow third-party API stalls the upload thread; one stuck third party blocks every concurrent upload |
| Fail-closed-by-default for `Error` verdicts on optional layers | Optional best-effort layers can take down the whole pipeline (this is exactly what happened) |
| No admin surface for layer-health monitoring | Two-week silent regression with no alert; only noticed when downloads broke |
| No admin queue UI beyond a single "held items" list | No bulk re-scan, no per-layer detail, no history, no audit log |
| No rescan capability | Held files can't be re-evaluated after a layer is fixed; only path is `UPDATE versions SET scan_status='clean'` |
| Scan results stored per-`s3_key` not per-`version_id` | Detail joins go through `s3_key`, which breaks if a file is referenced by multiple versions or moved |

### 3.3 Gap that triggered this audit

Every upload since 2026-05-10 sits at `held_for_review`. Each `file_scan_results.scan_layers` row shows the same final entry:

```
{"layer":"malwarebazaar","verdict":"error","detail":"Unexpected query_status: unknown"}
```

The MalwareBazaar `get_info` endpoint changed: unauthenticated requests no
longer return a `query_status` field. Our parser defaults the missing field to
the literal string `"unknown"`, which matches none of the expected match arms
and so returns `Error`. The fail-closed policy then converts that into
`HeldForReview` for every upload.
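
The failure mode above can be sketched in a few lines. This is an illustration under assumptions, not the shipped parser: the function names are hypothetical, and the mapping of `"ok"` and `"hash_not_found"` to verdicts is assumed for the example. The first function reproduces the described behavior (missing field becomes `"unknown"` and surfaces as `Error`); the second shows one possible alternative that keeps "field absent" distinct so the stored detail names the real cause.

```rust
/// Verdict a single scan layer can return (names follow section 3.1).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Verdict {
    Pass,
    Fail,
    Skip,
    Error,
}

/// Behavior as described in 3.3: a missing `query_status` is defaulted to
/// the string "unknown", which matches no known arm and becomes `Error`.
/// The "ok" / "hash_not_found" mapping is an assumption for illustration.
pub fn classify_as_described(query_status: Option<&str>) -> Verdict {
    match query_status.unwrap_or("unknown") {
        "ok" => Verdict::Fail,             // hash is known to the corpus
        "hash_not_found" => Verdict::Pass, // hash is not in the corpus
        _ => Verdict::Error,
    }
}

/// Hypothetical alternative: report an absent field separately from an
/// unrecognized value, so the per-layer detail points at the response shape.
pub fn classify_explicit(query_status: Option<&str>) -> Result<Verdict, String> {
    match query_status {
        Some("ok") => Ok(Verdict::Fail),
        Some("hash_not_found") => Ok(Verdict::Pass),
        Some(other) => Err(format!("unexpected query_status: {other}")),
        None => Err("query_status missing: response shape changed or request unauthenticated".to_string()),
    }
}
```

With the explicit variant, the stored detail would have read "query_status missing" instead of "Unexpected query_status: unknown", which is a more direct pointer at an authentication or schema change.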

No alert fired. The only signal was a user trying to download a GO build and
getting a generic "Failed to get download URL" toast.

## 4. Target architecture

Three structural shifts:

1. **Async scan, sync upload.** Upload returns immediately with `Pending`
   status. Scan runs in a background worker. Status flips when scan completes.
   Downloads stay gated until the scan reaches a final status and are served
   only for `Clean` (see 4.5).
2. **Explicit per-layer fail policy.** Each layer declares at registration
   whether `Error` is fail-open (Skip-equivalent) or fail-closed
   (`HeldForReview`). No global has-error switch.
3. **Multi-signal detection with positive trust signals.** Code-signing and
   notarization checks contribute *evidence of trust*, not just absence of
   threats. A properly notarized macOS binary from a verified Dev ID team is
   strong positive evidence; an unsigned `.exe` is a weaker baseline. Today
   neither signal exists.

### 4.1 Layer set (post-audit)

| # | Layer | Implementation | Fail policy on `Error` |
|---|-------|----------------|------------------------|
| 1 | `content_type` | Magic-byte sniffing, in-process | fail-closed (cheap, deterministic) |
| 2 | `structural` | Format parsers, in-process | fail-closed (cheap, deterministic) |
| 3 | `archive` | Nested-archive walk, in-process | fail-closed (cheap, deterministic) |
| 4 | `yara` | yara-x + Florian Roth `signature-base` ruleset | fail-closed (in-process, deterministic) |
| 5 | `clamav` | `clamd` daemon + `freshclam` cron | fail-open (network-dependent local service) |
| 6 | `signing_macos` | `codesign --verify --deep --strict` + `spctl --assess --type exec` + notarization staple check | fail-open on macOS-only files; positive evidence if pass |
| 7 | `signing_windows` | `signtool verify /pa` + cert chain inspection | fail-open on Windows-only files; positive evidence if pass |
| 8 | `signing_linux` | AppImage GPG signature + zsync URL presence; deb/rpm signatures | fail-open; positive evidence if pass |
| 9 | `abuse_malwarebazaar` | abuse.ch hash lookup with `Auth-Key` header | fail-open (third-party network) |
| 10 | `abuse_urlhaus` | URL reputation on strings extracted from binary | fail-open (third-party network) |
| 11 | `metadefender_cloud` | OPSWAT free tier (40/day), **second-opinion only** on YARA/ClamAV flags | fail-open (rate-limited, optional) |
| 12 | `hybrid_analysis` | CrowdStrike Falcon Sandbox free key (30/month), **admin-triggered only** | manual, not on the auto-path |

**Policy:** layers 1-4 are deterministic in-process work and fail closed
because an internal bug producing `Error` is a code defect, not an outage.
Layers 5-11 are network or local-service dependencies and fail open because
external regressions must never take down the platform — they degrade signal
quality, not availability. Layer 12 runs only when an admin triggers it.
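
A minimal sketch of how a per-layer policy could drive the final disposition, assuming each layer result carries the policy it was registered with. The type and function names (`FailPolicy`, `LayerResult`, `disposition`) are hypothetical and not taken from `scanning/mod.rs`; the admin-policy holds from 4.2 (size cap, new account, and so on) are left out.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Verdict {
    Pass,
    Fail,
    Skip,
    Error,
}

/// How a layer's `Error` verdict is treated (declared at registration).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum FailPolicy {
    FailOpen,   // Error is treated like Skip
    FailClosed, // Error holds the file for review
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Disposition {
    Clean,
    HeldForReview,
    Quarantined,
}

pub struct LayerResult {
    pub layer: &'static str,
    pub policy: FailPolicy,
    pub verdict: Verdict,
}

/// Any `Fail` quarantines. An `Error` holds the file only when the layer
/// that produced it is fail-closed. Everything else is `Clean`.
pub fn disposition(results: &[LayerResult]) -> Disposition {
    if results.iter().any(|r| r.verdict == Verdict::Fail) {
        return Disposition::Quarantined;
    }
    let blocking_error = results
        .iter()
        .any(|r| r.verdict == Verdict::Error && r.policy == FailPolicy::FailClosed);
    if blocking_error {
        Disposition::HeldForReview
    } else {
        Disposition::Clean
    }
}
```

Under this rule the 2026-05-10 regression (an `Error` from the fail-open `abuse_malwarebazaar` layer) would have produced `Clean` with a degraded-signal record rather than a platform-wide hold.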

### 4.2 Status machine

```
                    ┌──────────────────┐
 upload accepted ───▶     Pending      │
                    └────────┬─────────┘
                             │ scan worker picks up
                    ┌────────▼─────────┐
                    │     Scanning     │
                    └────────┬─────────┘
             ┌───────────────┼───────────────┐
             ▼               ▼               ▼
      ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
      │    Clean    │ │HeldForReview│ │ Quarantined │
      └─────────────┘ └──────┬──────┘ └─────────────┘
                             │ admin Promote
                             ▼
                      ┌─────────────┐
                      │    Clean    │
                      └─────────────┘
                             │ admin Quarantine
                             ▼
                      ┌─────────────┐
                      │ Quarantined │
                      └─────────────┘
```

Transitions:

| From | To | Trigger |
|------|----|---------|
| (no row) | Pending | Upload confirmed (S3 object created, row inserted) |
| Pending | Scanning | Worker dequeues |
| Scanning | Clean | All deterministic layers pass, no `Fail` in any layer |
| Scanning | Quarantined | Any layer returns `Fail` |
| Scanning | HeldForReview | Any fail-closed layer returns `Error`, or admin policy triggers (size cap, file type, creator new-account, etc.) |
| HeldForReview | Clean | Admin promotes; audit-logged |
| HeldForReview | Quarantined | Admin quarantines; audit-logged with note |
| Clean | Scanning | Admin-triggered re-scan |
| Quarantined | (no transition without DB-level intervention) | Quarantine is sticky by design |
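
The transition table can be encoded as a single guard function, sketched below. `FileScanStatus` is the enum named elsewhere in this document; `can_transition` is a hypothetical helper. The sketch mirrors the table exactly as written, so it has no `HeldForReview` → `Scanning` arm; if the Re-scan action on held rows is meant to be a status transition, that arm would need to be added.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum FileScanStatus {
    Pending,
    Scanning,
    Clean,
    HeldForReview,
    Quarantined,
}

/// Returns true only for transitions listed in the table above.
/// `Quarantined` has no outgoing arm: quarantine is sticky by design.
pub fn can_transition(from: FileScanStatus, to: FileScanStatus) -> bool {
    use FileScanStatus::*;
    matches!(
        (from, to),
        (Pending, Scanning)
            | (Scanning, Clean)
            | (Scanning, Quarantined)
            | (Scanning, HeldForReview)
            | (HeldForReview, Clean)
            | (HeldForReview, Quarantined)
            | (Clean, Scanning)
    )
}
```

Calling a guard like this before every `UPDATE versions.scan_status` would turn an illegal transition into an explicit error instead of a silent state change.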

### 4.3 Re-scan cadence

- **Trigger re-scan automatically** when:
  - YARA ruleset is updated (operator-controlled).
  - ClamAV `freshclam` rolls a new sig DB version.
  - An admin explicitly clicks Re-scan.
- **Background sweep**: every 30 days, re-run hash-lookup layers (abuse.ch,
  optionally MetaDefender) across the full `Clean` corpus. Detects hashes that
  have *become* known-bad over time. Quarantine on `Fail`, log to audit.

### 4.4 Async architecture

Move scan off the upload critical path:

```
POST /api/versions/{id}/upload/confirm
  → create version row (scan_status=Pending)
  → enqueue scan_job
  → return 200 with status="pending"
  → client polls or subscribes via SSE

scan_worker (tokio task in same process)
  → SELECT FOR UPDATE SKIP LOCKED jobs WHERE status='queued'
  → run pipeline
  → INSERT INTO file_scan_results (per-layer JSON)
  → UPDATE versions.scan_status
  → publish status-changed event (SSE channel for admin + creator)
```

Job table:

```sql
CREATE TABLE scan_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    target_kind TEXT NOT NULL CHECK (target_kind IN ('version', 'item_cover', 'item_attachment')),
    target_id UUID NOT NULL,
    s3_key TEXT NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('queued', 'running', 'done', 'failed')),
    attempts INT NOT NULL DEFAULT 0,
    enqueued_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    last_error TEXT
);
CREATE INDEX scan_jobs_status_enqueued ON scan_jobs (status, enqueued_at);
```

Worker pool: N workers (configurable, default 2), each pulling with
`SELECT ... FOR UPDATE SKIP LOCKED LIMIT 1`. Job retry policy: 3 attempts on
transient failure, then `status='failed'`, surface in admin dashboard.
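
One way to read the retry policy is as a pure function from "attempts so far plus the outcome of this run" to the next job state. The sketch below assumes that interpretation: the third transient failure marks the job `failed`, earlier ones put it back to `queued`. The names `Outcome`, `MAX_ATTEMPTS`, and `after_attempt` are hypothetical; the `JobStatus` variants mirror the `status` CHECK constraint on `scan_jobs`.

```rust
/// Mirrors the `status` CHECK constraint on `scan_jobs`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum JobStatus {
    Queued,
    Running,
    Done,
    Failed,
}

/// Result of one worker run over a job (hypothetical type).
pub enum Outcome {
    Completed,
    TransientFailure,
}

pub const MAX_ATTEMPTS: u32 = 3;

/// Returns the status and attempt count to persist after one run.
/// A transient failure re-queues the job until the attempt budget is spent.
pub fn after_attempt(attempts_so_far: u32, outcome: Outcome) -> (JobStatus, u32) {
    let attempts = attempts_so_far + 1;
    match outcome {
        Outcome::Completed => (JobStatus::Done, attempts),
        Outcome::TransientFailure if attempts < MAX_ATTEMPTS => (JobStatus::Queued, attempts),
        Outcome::TransientFailure => (JobStatus::Failed, attempts),
    }
}
```

Keeping the decision out of the SQL makes the retry rule easy to unit-test; the worker would then write the returned pair back to `scan_jobs.status` and `scan_jobs.attempts` in the same transaction that claimed the row.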

### 4.5 Creator-visible UX

In the creator dashboard, each version row gains a scan-status badge:

- **Pending** — neutral, "Scanning…"
- **Clean** — positive, "Cleared"
- **HeldForReview** — warning, "Awaiting review (typically under 24h)"
- **Quarantined** — negative, "Quarantined — [contact support] to appeal"

Held / quarantined rows expand to show per-layer detail: which layer flagged,
what it said, what the creator can do. Honest, transparent, brand-aligned.

Public-facing download buttons:

- Hidden entirely while `Pending` or `Scanning`.
- Visible on `Clean`.
- Hidden on `HeldForReview` / `Quarantined` (creator sees a note in their
  dashboard; public sees nothing — graceful degradation).
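
The visibility rules above reduce to a small mapping from status to UI behavior. The following sketch uses hypothetical helper names (`download_visible`, `badge_tone`). Mapping `Scanning` to the same neutral tone as `Pending` is an assumption, since the badge list does not give `Scanning` its own entry.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum FileScanStatus {
    Pending,
    Scanning,
    Clean,
    HeldForReview,
    Quarantined,
}

/// The public download button is shown only for `Clean`.
pub fn download_visible(status: FileScanStatus) -> bool {
    matches!(status, FileScanStatus::Clean)
}

/// Tone of the creator-dashboard badge for each status.
pub fn badge_tone(status: FileScanStatus) -> &'static str {
    match status {
        // Assumption: Scanning shares the neutral "Scanning…" badge with Pending.
        FileScanStatus::Pending | FileScanStatus::Scanning => "neutral",
        FileScanStatus::Clean => "positive",
        FileScanStatus::HeldForReview => "warning",
        FileScanStatus::Quarantined => "negative",
    }
}
```

Because both functions use exhaustive matching, adding a status to the enum later forces a compile-time decision about its badge, and the download gate stays default-deny.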

## 5. /admin/uploads dashboard

The audit surface. Existing route at `routes/admin/uploads.rs` is the seed; we
extend it.

### 5.1 Page layout

```
┌─────────────────────────────────────────────────────────────────────┐
│ Pipeline Health (last 24h / 7d toggle)                              │
│ ┌─────────────────┬──────────────┬──────────────┬─────────────────┐ │
│ │ content_type    │ 100%     1ms │ 100%     1ms │ ✓ last: now     │ │
│ │ structural      │ 100%     2ms │ 100%     2ms │ ✓ last: now     │ │
│ │ yara            │ 99%     45ms │ 99%     43ms │ ✓ last: 12s ago │ │
│ │ clamav          │ 0%         — │ 0%         — │ ✗ not running   │ │
│ │ malwarebazaar   │ 0%         — │ 0%         — │ ✗ 14 days ago   │ │
│ │ signing_macos   │ 98%    320ms │ 97%    350ms │ ✓ last: 1m ago  │ │
│ └─────────────────┴──────────────┴──────────────┴─────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Active Queue (Pending + Scanning)                 [auto-refresh on] │
│                                                                     │
│ 3 files scanning, 0 stuck                                           │
│   • GoingsOn_0.4.0_aarch64.dmg — scanning (4s)                      │
│   • SamplePack.zip — pending (queued 1s ago)                        │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Held for Review                                             9 items │
│                                                                     │
│ [+] GoingsOn_0.3.1_x64-setup.exe                                    │
│     creator: max • item: GoingsOn Desktop • 14 days held            │
│     layers: ✓ct ✓struct -arch -yara -clam ⚠mb                       │
│     [Promote] [Quarantine] [Re-scan]                                │
│                                                                     │
│ [+] GoingsOn_0.3.1_amd64.AppImage ...                               │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Recent History (last 30d)                     [▶ expand] [filter ▾] │
└─────────────────────────────────────────────────────────────────────┘
```

### 5.2 Section spec

**Pipeline Health (top panel)**
- Per-layer rolling stats over the last 24h and 7d.
- Columns: layer name, success rate, error rate, p50 latency, p95 latency,
  health badge, last successful response timestamp.
- Health badge logic: `✓` if last successful response < 1h ago AND error rate <
  10%; `⚠` if either degraded; `✗` if no successful response in 24h or error
  rate > 50%.
- Click any row → drill-down to last 100 layer invocations with full
  per-call detail. Useful for diagnosing intermittent regressions.
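
The health badge rule in the list above can be written as a small pure function. This is a sketch with hypothetical names (`Health`, `health_badge`); it assumes the `✗` condition is checked first and that a layer with no recorded success at all counts as down.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Health {
    Ok,       // ✓
    Degraded, // ⚠
    Down,     // ✗
}

/// `last_success_age` is `None` when the layer has never answered successfully.
/// `error_rate` is a fraction in the range 0.0..=1.0.
pub fn health_badge(last_success_age: Option<Duration>, error_rate: f64) -> Health {
    const HOUR: Duration = Duration::from_secs(60 * 60);
    const DAY: Duration = Duration::from_secs(24 * 60 * 60);

    // ✗: no successful response in 24h, or error rate above 50%.
    let no_success_in_24h = last_success_age.map_or(true, |age| age >= DAY);
    if no_success_in_24h || error_rate > 0.50 {
        return Health::Down;
    }

    // ✓: last success under 1h ago AND error rate under 10%.
    let fresh = last_success_age.map_or(false, |age| age < HOUR);
    if fresh && error_rate < 0.10 {
        Health::Ok
    } else {
        // ⚠: either signal degraded but not down.
        Health::Degraded
    }
}
```

Applied to the incident, a `malwarebazaar` layer whose last success was 14 days ago evaluates to `Down` regardless of its error rate, which is the signal the dashboard was missing.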

**Active Queue (Pending + Scanning)**
- Auto-refreshes via HTMX SSE.
- Shows count of files Pending vs Scanning.
- "Stuck" detection: anything in `Scanning` for > 5 minutes is flagged red.
- One-line entry per file: filename, current state, elapsed time. No
  per-layer detail at this stage (scan not done yet).

**Held for Review**
- Default expanded — these need decisions.
- One row per held version. Per row:
  - Creator handle + item title + version filename + size + age-of-hold.
  - Layer chips: small colored squares, one per layer, showing verdict
    (`pass`, `skip`, `fail`, `error`, `pending`).
  - Click any chip → in-place expand to the layer's `detail` JSON.
  - Three actions: **Promote** (with optional note), **Quarantine** (note
    required), **Re-scan** (re-runs pipeline now).
- Bulk operations:
  - Select N rows → **Bulk Re-scan** (no note required; common after a layer
    fix lands).
  - Select N rows → **Bulk Promote** (single shared note required, hard
    audit-logged).
  - No bulk Quarantine — every quarantine is an individual decision.
- Sort: default by held-at ascending (oldest first); switchable.
- Filter: by app, by creator, by which layer flagged, by file type.

**Recent History (collapsible)**
- Default collapsed. When expanded: dense grid of last 30d, all statuses.
- Columns: creator, item, version, status, scanned-at, layer-summary chip
  strip, action.
- Filter: by status, app, creator, date range.
- Pagination at 100 rows; further history at `/admin/uploads/archive`.
- Each row click → same expandable per-layer detail panel.

**Audit Trail**
- New table:
  ```sql
  CREATE TABLE scan_admin_actions (
      id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      version_id UUID,
      item_id UUID,
      admin_id UUID NOT NULL,
      action TEXT NOT NULL CHECK (action IN ('promote', 'quarantine', 'rescan', 'bulk_promote', 'bulk_rescan')),
      prev_status TEXT,
      new_status TEXT,
      note TEXT,
      created_at TIMESTAMPTZ NOT NULL DEFAULT now()
  );
  ```
- Inline tooltip on each row: "Last action: promoted by max, 2 days ago".
- Full log at `/admin/uploads/audit` with filters.

### 5.3 Access control

- Gated on existing `AdminUser` extractor (PLATFORM_ADMIN_ID single-user
  model). No per-forum-style role splitting.
- All POST routes CSRF-protected (existing middleware).
- All admin actions audit-logged (table above).

### 5.4 Live updates

- HTMX SSE channel `/admin/uploads/events` pushing:
  - `scan-started`, `scan-completed`, `scan-stuck` events.
- Active Queue + Pipeline Health update without page reload.
- History grid stays static; reloads only on filter change.

## 6. Monitoring and alerting

Add PoM checks for the scan pipeline:

| Metric | Threshold | Action |
|--------|-----------|--------|
| Per-layer error rate (1h window) | > 10% | Notify admin (email + dashboard banner) |
| Per-layer success count (24h) | == 0 | Page admin: layer fully down |
| Queue depth | > 50 pending | Notify: workers falling behind |
| Stuck-scan count (Scanning > 5min) | > 5 | Notify: stuck workers |
| Held-for-review count | > 100 | Notify: review backlog growing |

PoM module: `pom/src/checks/scan_pipeline.rs`, queries the same `/admin/uploads`
data endpoints (or hits the DB directly via the existing PoM SSH pattern).
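
As a rough illustration of what such a check could compute, the sketch below evaluates the five thresholds against an in-memory snapshot. All type and function names here are hypothetical, and how PoM actually collects the numbers or delivers alerts is not specified in this document.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Severity {
    Notify,
    Page,
}

/// Rolling stats for one scan layer (hypothetical shape).
pub struct LayerStats {
    pub name: &'static str,
    pub error_rate_1h: f64, // fraction, 0.0..=1.0
    pub successes_24h: u64,
}

/// Queue-level counters (hypothetical shape).
pub struct QueueStats {
    pub pending: u64,
    pub stuck_scans: u64, // in Scanning for more than 5 minutes
    pub held_for_review: u64,
}

/// Applies each threshold independently and returns every alert that fires.
pub fn evaluate(layers: &[LayerStats], queue: &QueueStats) -> Vec<(Severity, String)> {
    let mut alerts = Vec::new();
    for layer in layers {
        if layer.error_rate_1h > 0.10 {
            alerts.push((Severity::Notify, format!("{}: error rate above 10%", layer.name)));
        }
        if layer.successes_24h == 0 {
            alerts.push((Severity::Page, format!("{}: layer fully down", layer.name)));
        }
    }
    if queue.pending > 50 {
        alerts.push((Severity::Notify, "workers falling behind".to_string()));
    }
    if queue.stuck_scans > 5 {
        alerts.push((Severity::Notify, "stuck workers".to_string()));
    }
    if queue.held_for_review > 100 {
        alerts.push((Severity::Notify, "review backlog growing".to_string()));
    }
    alerts
}
```

A layer that errors on every call trips both layer rules at once, so the page-level alert fires within the first 24 hours even though the held-for-review count (9 items in the incident) stays far below its own threshold.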

## 7. Public transparency

Public-facing page at `/about/scanning` (DocEngine markdown). Contents:

- What we scan with: each layer named, linked to its docs.
- What status each can produce.
- What "Clean" actually means and what it doesn't.
- Aggregate stats (auto-substituted via assumptions/derived values):
  - "X% of uploads cleared automatically in <2min, last 30 days."
  - "Y files manually reviewed, last 30 days."
  - "Z files quarantined, last 30 days."
- Creator appeals process for false positives.

Per-version public scan-result panel on the public item page:

- "This version was scanned on [date]. Cleared by N layers."
- No per-layer detail surfaced publicly (avoids handing attackers a
  layer-by-layer evasion roadmap), but the existence of multi-layer scanning
  is visible.

Brand alignment: this is the kind of unforced transparency that almost no
distribution platform does. Differentiation surface, not just an
implementation detail.

## 8. Sequencing

Implementation order, sized for sequential landing:

### Phase 1 — Architectural floor

1. `scan_jobs` table + worker pool + status machine.
2. Move scan off the upload request handler.
3. Add `Pending`, `Scanning` to `FileScanStatus` enum.
4. Per-layer fail policy declared at registration; remove global has-error
   switch.
5. Tests: pipeline runs async, status flips correctly, fail-open vs
   fail-closed honored per layer.

### Phase 2 — Admin surface

6. Extend `/admin/uploads` to the three-section layout + health panel.
7. `scan_admin_actions` table + audit logging on every admin action.
8. HTMX SSE for live queue + health updates.
9. Per-layer detail expansion, bulk re-scan, bulk promote.

### Phase 3 — Layer set

10. Fix MalwareBazaar — register `Auth-Key`, add header, parse new response
    shape, tests against captured fixtures.
11. Install ClamAV daemon on prod + `freshclam` cron in deploy script. Wire
    `CLAMAV_SOCKET` env.
12. Pull Florian Roth `signature-base` YARA rules into prod; wire
    `YARA_RULES_DIR`. Add ruleset-version field to scan results so we can
    correlate.
13. URLhaus layer (string extraction + URL reputation).
14. Signing-trust layers (macOS, Windows, AppImage). These need helper
    binaries on prod (`codesign`, `spctl`, `signtool`, `gpg`) — vendor or
    install. Plan vendoring carefully: `codesign` is macOS-only, so the
    Hetzner-Linux server can't run it. **Open question**: cross-platform
    Mach-O signature verification. Candidates: `apple-codesign` Rust crate
    (Gregory Szorc's `rcodesign`), which can verify Apple signatures on
    Linux. Verify before committing.
15. MetaDefender Cloud free tier — second-opinion layer, triggered only when
    YARA or ClamAV flag a suspicion.

### Phase 4 — Operations

16. PoM scan-pipeline checks + alerting.
17. Re-scan sweeps (admin-triggered + monthly background).
18. Public `/about/scanning` page + per-version public scan panel.
19. Held-file backlog: re-scan the existing 9 held versions under the new
    pipeline. If they come out Clean (expected; they're our own builds),
    they flip automatically. If anything flags, we investigate manually.

### Phase 5 — Reserved for future

- Hybrid Analysis sandbox detonation on admin-flagged samples.
- Hash-reputation pass: same SHA shipped Clean by trusted creator =
  fast-pass on a re-upload by the same creator. Cross-creator fast-pass is
  not safe (a malicious creator could pre-clear a hash) and is out of scope.
- Creator-attested provenance: SLSA-style supply-chain attestations,
  reproducible-build verification. Long-term, separate document.

## 9. What we explicitly chose not to do

- **VirusTotal / Google Threat Intelligence**. Free tier ToS forbids
  commercial workflow use. Paid tier ($20–50K/yr floor) ties us to a
  GCP-locked vendor with hostile migration behavior (prepaid credits voided
  in the GTI migration). Misaligned with platform brand. Future revisit only
  if upload volume passes ~1K/day **and** the current stack misses a real
  incident.
- **Synchronous-scan retention**. Even with all layers fixed, holding the
  upload thread on third-party calls is structurally fragile.
- **Per-layer-error fail-closed by default**. The single decision that
  caused this audit. Reversed.
- **Global YARA-ruleset auto-update from upstream**. Roth's `signature-base`
  is curated but YARA rules can have false positives; we pin a ruleset
  version and bump deliberately, with re-scan of recent uploads on bump.

## 10. Open questions

- **macOS signature verification from Linux**. `rcodesign` looks viable but
  is unverified. If it does not work, the fallback is a separate scan worker
  on a macOS host, or accepting signing-status only when the creator's upload
  tool self-reports it, cross-checked against the embedded signature blob.
- **Where does `scan_jobs` live?** Same Postgres or a dedicated queue
  (Redis, RabbitMQ)? Default: Postgres + `SKIP LOCKED`, no new infra. Revisit
  if queue depth + worker latency demand it. Probably never.
- **Bulk-promote audit threshold.** Should bulk-promote require dual-control
  (a second admin's approval) above N rows? Today single-operator, so the
  question is partly theoretical, but it shapes the schema.
- **Public scan-result panel detail level.** A per-layer breakdown helps
  honest creators see what we evaluated, but it also helps attackers
  fingerprint our pipeline. Default: aggregate verdict only, with the layer
  list named but not per-file verdicts. Decide on first iteration.

## 11. Cost summary

| Component | Recurring cost |
|-----------|----------------|
| abuse.ch (MalwareBazaar / URLhaus / ThreatFox) | $0 with free Auth-Key |
| ClamAV + `freshclam` | $0 |
| YARA + Roth `signature-base` ruleset | $0 |
| Apple notarization staple verify (`rcodesign`) | $0 |
| Authenticode signature verify | $0 |
| AppImage GPG signature verify | $0 |
| MetaDefender Cloud free tier (40/day) | $0 |
| Hybrid Analysis free key (30/month) | $0 |
| PoM monitoring | $0 (existing infra) |
| **Total recurring cost** | **$0** |

If upload volume + incident pressure justify it later: MetaDefender paid
(~$5–15K/yr estimated, commercial-use-licensed, no GCP lock-in) before any
consideration of VT/GTI.

---

## Appendix A: File and module touchpoints

| Concern | Path |
|---------|------|
| Pipeline orchestration | `MNW/server/src/scanning/mod.rs` |
| Per-layer impls | `MNW/server/src/scanning/{yara,clamav,hash_lookup,...}.rs` |
| New: signing layers | `MNW/server/src/scanning/signing/{macos,windows,linux}.rs` |
| New: URLhaus layer | `MNW/server/src/scanning/urlhaus.rs` |
| Status enum | `MNW/server/src/db/enums.rs` (`FileScanStatus`) |
| Versions / scan-status columns | `MNW/server/src/db/scanning.rs`, `migrations/004_file_scan_status.sql` |
| New: `scan_jobs` worker | `MNW/server/src/scanning/worker.rs` |
| New: `scan_admin_actions` audit log | `MNW/server/src/db/scan_admin_actions.rs` |
| Admin dashboard route | `MNW/server/src/routes/admin/uploads.rs` |
| Admin dashboard templates | `MNW/server/templates/pages/admin/uploads*.html` |
| Download gate | `MNW/server/src/routes/storage/downloads.rs` |
| PoM checks | `MNW/pom/src/checks/scan_pipeline.rs` |
| Public transparency page | `MNW/server/site-docs/public/about/scanning.md` |

## Appendix B: Operator rollout procedure

One-time prod setup for Phase 3, steps 10-12. Each step is independent; do
them in any order. After each, the corresponding Pipeline Health card on the
admin dashboard flips from down to ok within one upload cycle.

### abuse.ch Auth-Key (Phase 3, step 10)

1. Register at <https://auth.abuse.ch>. Free, single email confirmation.
2. Add to `/opt/makenotwork/.env`:
   ```
   ABUSE_CH_AUTH_KEY=<the-issued-key>
   ```
3. `systemctl restart makenotwork`.
4. Watch the `malwarebazaar` and `urlhaus` cards flip after the next upload.

### ClamAV daemon (Phase 3, step 11)

Run as root on the Hetzner prod host (one-time):
```
/opt/makenotwork/deploy/setup-clamav.sh
echo 'CLAMAV_SOCKET=/var/run/clamav/clamd.ctl' >> /opt/makenotwork/.env
systemctl restart makenotwork
```
The script installs `clamav-daemon` + `clamav-freshclam`, waits for the
initial signature DB pull (up to 5 minutes), and verifies clamd is reachable
over its Unix socket. Signatures auto-update via `freshclam.service`.

### YARA rules (Phase 3, step 12)

```
/opt/makenotwork/deploy/setup-yara-rules.sh
echo 'YARA_RULES_DIR=/opt/makenotwork/yara-rules' >> /opt/makenotwork/.env
systemctl restart makenotwork
```
The script pulls a shallow clone of Florian Roth's `signature-base`
(CC-BY-NC 4.0), flattens the `.yar` files into `/opt/makenotwork/yara-rules`,
and installs a weekly cron to refresh from upstream. It stamps the active
commit SHA at `/opt/makenotwork/yara-rules/RULESET_VERSION` for audit-trail
correlation.

A `systemctl restart makenotwork` is required to recompile rules after every
ruleset bump — the cron only refreshes the files on disk, not the in-memory
compiled rules.

---

## Appendix C: Migration plan for current held backlog

9 versions in `held_for_review` since 2026-05-10. Plan:

1. Implement Phase 1 + Phase 3, step 10 (MalwareBazaar fix with Auth-Key).
2. Trigger `Bulk Re-scan` on all 9 from the admin dashboard once the new
   pipeline is running.
3. Expected outcome: 9 → Clean. They're our own builds and would have passed
   the original pipeline if MB hadn't errored.
4. If any flag under the new (richer) pipeline, investigate as a normal
   held-review case.

No `UPDATE versions SET scan_status='clean'` shortcut. The whole point of the
redesign is that the system promotes a file when it has reason to, not
because an admin reached around it.