max / makenotwork

server: drop scan permit across S3 download

scan_semaphore exists to bound clamd CPU + RSS, not S3 IO. Holding the permit across download_object serialized network fetches at SCAN_MAX_CONCURRENT and meant any scan backlog stalled queued workers on permit acquisition while the DB pool starved.

Moves the GET out of the guard. Permit now covers only the Pipeline::scan call. Worker count remains the operational cap on concurrent downloads (SCAN_WORKER_COUNT=2, SCAN_MAX_CONCURRENT=4 today, so concurrent download behavior is unchanged at current settings; the fix is structural for the moment the pool is widened).

No layer reads global state, so narrowing the permit can't expose a race. Verdicts are byte-identical.

Plan: _private/docs/mnw/server-docs/plans/scanner-pool-permit.md.
Author: Max J. <87768334+MaxJMath@users.noreply.github.com> · 2026-05-27 14:31 UTC
Commit: 5c2f7b251f21158ace82e606084fe09d4890c4d0
Parent: c4113a7
1 file changed, +7 insertions, -3 deletions
@@ -168,11 +168,15 @@ async fn run_pipeline_and_decide(
         return Ok(FileScanStatus::HeldForReview);
     }
 
-    // Permit + S3 download + scan. Permit drops before the tail ops to avoid
-    // serializing on small post-scan work.
+    // S3 download runs *outside* the scan_semaphore. The permit bounds the
+    // CPU/clamd-heavy scan phase, not network IO — holding it across the GET
+    // serializes downloads at SCAN_MAX_CONCURRENT and lets a scan backlog
+    // starve the DB pool. Worker count is the operational cap on concurrent
+    // downloads.
+    let data = ctx.s3.download_object(&job.s3_key).await?;
+
     let result: ScanResult = {
         let _permit = ctx.scan_semaphore.acquire().await?;
-        let data = ctx.s3.download_object(&job.s3_key).await?;
         Arc::clone(&ctx.pipeline).scan(data, file_type).await
     };
 
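
The scoping change can be illustrated with a small standalone sketch. It is not the server code: it uses plain threads and a hand-rolled counting semaphore instead of the async runtime, sleeps stand in for download_object and Pipeline::scan, and the names Semaphore, Permit, Gauge and peak_concurrency are hypothetical. It only shows the shape of the fix: the download happens before the permit is taken, so concurrent downloads are bounded by worker count while concurrent scans are bounded by the permit count.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical counting semaphore standing in for scan_semaphore.
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

/// RAII guard: the permit is returned when this value is dropped.
struct Permit<'a>(&'a Semaphore);

impl Semaphore {
    fn new(n: usize) -> Self {
        Semaphore { permits: Mutex::new(n), cv: Condvar::new() }
    }

    fn acquire(&self) -> Permit<'_> {
        let mut free = self.permits.lock().unwrap();
        while *free == 0 {
            free = self.cv.wait(free).unwrap();
        }
        *free -= 1;
        Permit(self)
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.0.permits.lock().unwrap() += 1;
        self.0.cv.notify_one();
    }
}

/// Tracks how many threads are inside a phase and the peak observed.
#[derive(Default)]
struct Gauge {
    current: AtomicUsize,
    peak: AtomicUsize,
}

impl Gauge {
    fn enter(&self) {
        let now = self.current.fetch_add(1, SeqCst) + 1;
        self.peak.fetch_max(now, SeqCst);
    }
    fn exit(&self) {
        self.current.fetch_sub(1, SeqCst);
    }
}

/// Runs `workers` threads, each processing `jobs_per_worker` jobs, and
/// returns (peak concurrent downloads, peak concurrent scans).
fn peak_concurrency(workers: usize, scan_permits: usize, jobs_per_worker: usize) -> (usize, usize) {
    let sem = Arc::new(Semaphore::new(scan_permits));
    let downloads = Arc::new(Gauge::default());
    let scans = Arc::new(Gauge::default());

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (sem, downloads, scans) =
                (Arc::clone(&sem), Arc::clone(&downloads), Arc::clone(&scans));
            thread::spawn(move || {
                for _ in 0..jobs_per_worker {
                    // Download phase: no permit held (simulated network IO).
                    downloads.enter();
                    thread::sleep(Duration::from_millis(5));
                    let data = vec![0u8; 16];
                    downloads.exit();

                    // Scan phase: the permit covers only this block.
                    let _scanned_len = {
                        let _permit = sem.acquire();
                        scans.enter();
                        thread::sleep(Duration::from_millis(5));
                        let len = data.len();
                        scans.exit();
                        len
                    };
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    (downloads.peak.load(SeqCst), scans.peak.load(SeqCst))
}
```

Calling peak_concurrency(4, 2, 3), for example, can never report more than 2 concurrent scans, while up to 4 downloads may overlap. With the download moved back inside the permit block, both peaks would be capped by the permit count, which is the serialization the commit removes.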