
max / makenotwork

Lock launch scope for soft → public launch; add Cloudflare maturation phase

Pin the public-launch blockers to manual testing + tester invites only. Verify Cache-Control on S3 uploads is already in place. Move CDN coverage of paid downloads and caddy-ask concurrency cap into a new post-launch "Stack Maturation: Cloudflare Lean-In" phase with triggers and an explicit "not doing" list. Explicitly defer DIY tier, Small Creator On-Ramp, forums, and Code Assessment items out of launch scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author: Max J. <87768334+MaxJMath@users.noreply.github.com> · 2026-05-16 19:46 UTC
Commit: 3b5d245e95fd2f86891bf329a844d917cc8c7762
Parent: d299798
1 file changed, +70 insertions, -6 deletions
@@ -1,23 +1,87 @@
1 1 # Makenotwork TODO
2 2
3 3 ## Status
4 - v0.5.20 deployed 2026-05-16. Audit grade A (Run 26). ~88K LOC, 1,447 lib tests, 0 warnings. Migration 115. Sprints 1-9 complete (see `todo_done.md`). Content seeded: AF 0.4.0 + GO 0.3.1 on discover page. **0.5.20: docengine assumption substitution wired into boot** — site-docs render values from `docs/internal/business/assumptions.toml` at startup; `guide/tiers.md` migrated as the first consumer.
4 + v0.5.22 deployed 2026-05-16. Audit grade A (Run 26). ~88K LOC, 1,504 lib tests, 0 warnings. Migration 115. Sprints 1-9 complete (see `todo_done.md`). Content seeded: AF 0.4.0 + GO 0.3.1 on discover page. **2026-05-16 sprint: docengine assumption substitution** — site-docs values rendered from `docs/internal/business/assumptions.toml` at startup (0.5.20), filter pipeline shipped with extensible `Filter` trait + 8 built-ins (0.5.21), marginal-cost model corrected to include weighted Stripe fee on creator subs (0.5.22). Migrated so far: `guide/tiers.md`, `about/pricing.md`, `about/guarantees.md`, `support/faq.md`, `guide/stripe.md`.
5 5
6 6 Human tasks in `human_todo.md`. Completed items in `todo_done.md`.
7 7
8 8 ---
9 9
10 - ## Next Steps (Soft Launch)
10 + ## Launch (Locked Scope, 2026-05-16)
11 11
12 - Priority order. See `human_todo.md` for the full manual testing feature map.
12 + Public-launch blockers. Everything else in this file is post-launch. Do not promote items into this section without explicit user decision.
13 13
14 - 1. **Manual testing** — walk through `human_todo.md` sign-off table on live server (Stripe checkout, license keys, promo codes, cart, SyncKit sync)
15 - - SyncKit: AF + GO sync tested and working on live server (2026-05-11). BB sync needs synckit.toml (API key pending).
16 - - Remaining: Stripe checkout e2e for all 3 apps, license key flow, promo codes
14 + 1. **Manual testing** — walk through `human_todo.md` sign-off table on live server. Remaining: Stripe checkout e2e for all 3 apps, license key flow, promo codes. (SyncKit AF + GO verified 2026-05-11; BB sync deferred.)
17 15 2. **Invite testers** — generate invite codes, send hand-written emails per `docs/internal/outreach/tiers.md`
18 16
17 + Verified during scoping (2026-05-16):
18 + - ✅ **Cache-Control on S3 uploads** — every presigned-PUT call site passes `CACHE_CONTROL_IMMUTABLE` (`public, max-age=31536000, immutable`). Storage backend wires it through. (`routes/storage/{images,media,versions,uploads}.rs`, `routes/api/internal/uploads.rs`)
19 + - ⚠ **CDN coverage of paid downloads** — paid downloads intentionally bypass CDN (presigned S3 → `fsn1.your-objectstorage.com`); only free content uses `cdn.makenot.work`. Not a launch blocker at soft-launch scale; moved to Cloudflare Maturation phase below.
20 + - ⚠ **caddy-ask rate limit** — per-IP rate limit exists via tower-governor (10 rps, burst 60); handler short-circuits via `domain_cache` before DB. Concurrency cap on cache-miss DB path missing but low marginal value pre-launch. Moved to Cloudflare Maturation phase.
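
For reference, the missing concurrency cap on the cache-miss DB path could look roughly like the sketch below. It is std-only so it stands alone; the TODO itself names `tokio::sync::Semaphore`, which would queue waiters, whereas this stand-in sheds load when the cap is reached. `ConcurrencyCap`, `Permit`, and the limit passed to `new` are hypothetical names, not taken from the codebase.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical cap on concurrent cache-miss DB lookups.
pub struct ConcurrencyCap {
    in_flight: AtomicUsize,
    max: usize,
}

/// RAII permit: the slot is released when the permit is dropped.
pub struct Permit {
    cap: Arc<ConcurrencyCap>,
}

impl ConcurrencyCap {
    pub fn new(max: usize) -> Arc<Self> {
        Arc::new(Self { in_flight: AtomicUsize::new(0), max })
    }

    /// Returns `None` when `max` lookups are already running, so the caller
    /// can reject the request instead of piling onto the database.
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let mut current = self.in_flight.load(Ordering::Relaxed);
        loop {
            if current >= self.max {
                return None;
            }
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(Permit { cap: Arc::clone(self) }),
                // Another thread changed the counter; retry with its value.
                Err(observed) => current = observed,
            }
        }
    }

    /// Number of permits currently held.
    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.cap.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

A handler would call `try_acquire()` only after a `domain_cache` miss and hold the permit for the duration of the DB query. Whether to shed or queue at the cap is a design choice this sketch does not settle.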
21 +
22 + Explicitly **out of launch scope** (do not work on before public launch):
23 + - DIY tier (entire section below) — defer to post-launch quarter; 25+ items, unproven support model
24 + - Small Creator On-Ramp (referrals, charter pricing, earnings-funded sub)
25 + - Community forum — forums service launches separately
26 + - Stripe SDK migration (pinned API version works)
27 + - Custom Pages
28 + - All Trust Audit open items
29 + - Background / multipart / CLI bulk uploads
30 + - All Code Assessment blind spots (async-trait, instrument cardinality, mutation, proc macros, benches, typestate, alloc)
31 + - All other Infra/Scaling items below ("pre-1k creator," not pre-launch)
32 + - All Nitpick/Fuzz deferred items
33 + - All Global UX polish
34 +
35 + ---
36 +
19 37 ---
20 38
39 + ## Stack Maturation: Cloudflare Lean-In (Post-Launch Phase)
40 +
41 + The platform currently uses Cloudflare as a thin DNS+CDN layer for static and free-content paths. Hetzner carries all paid-content egress, ACME issuance, DDoS absorption, and edge logic. This phase is about pushing more of that work to Cloudflare where it's cheaper, faster, or more resilient than doing it ourselves.
42 +
43 + **Why now (post-launch, not pre-launch):** at soft-launch scale Hetzner egress and origin compute are fine. The break-even shifts as we approach ~1k creators / video usage normalizes / ACME abuse becomes a real target. Do these incrementally as triggers fire — don't refactor speculatively.
44 +
45 + **Decision principles:**
46 + - Only move to Cloudflare what's cheaper OR better at the edge. Don't migrate for novelty.
47 + - Keep origin code provider-neutral where reasonable (avoid CF-specific lock-in inside Rust).
48 + - Each item below should have a clear "trigger to start" — a metric or event that says it's time.
49 +
50 + ### Phase 1: Paid-content egress (biggest cost lever)
51 + - [ ] **CDN coverage of paid downloads.** Currently `routes/storage/downloads.rs::resolve_content_url` routes only free content through `cdn.makenot.work`; paid content uses presigned S3 URLs straight to `fsn1.your-objectstorage.com`. Memory: "biggest hidden cost lever at scale." Same pattern in `routes/ota.rs:409`, `routes/api/guest_checkout.rs:273`, `routes/pages/public/content/item.rs:{200,267,435}`.
52 + - Options to evaluate:
53 + - **Signed CDN URLs via Cloudflare** (HMAC token in URL, validated at edge via Worker before origin fetch). Keeps cache hit ratio high per content-addressed key. Likely the right answer.
54 + - **Cloudflare R2 migration.** Egress-free to Cloudflare. Bigger lift, vendor migration, but eliminates the egress line entirely. Evaluate vs Hetzner Object Storage cost at projected scale.
55 + - **Cloudflare Stream for video.** Adaptive bitrate, transcode, signed playback. Pairs with the existing "Media transcoding pipeline" post-launch item.
56 + - Trigger to start: Cloudflare cache hit ratio drops below ~80% on weekly review, OR Hetzner egress exceeds projection by 50%, OR first video creator joins Big Files/Everything tier.
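
The three Phase 1 trigger conditions above can be written down as a small check, which makes the thresholds explicit. This is a sketch under assumptions: `WeeklyMetrics`, its field names, and `fired_triggers` are hypothetical, and "exceeds projection by 50%" is read as actual egress greater than 1.5 times the projection.

```rust
/// Hypothetical weekly snapshot; the TODO names the conditions,
/// not how the numbers are collected or stored.
#[derive(Debug, Clone, Copy)]
pub struct WeeklyMetrics {
    pub cdn_hits: u64,
    pub cdn_requests: u64,
    pub egress_gb: f64,
    pub projected_egress_gb: f64,
    /// Video creators on the Big Files / Everything tiers.
    pub video_creators_on_big_tiers: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trigger {
    CacheHitRatioLow,
    EgressOverProjection,
    FirstVideoCreator,
}

/// Returns every Phase 1 trigger that fired for this snapshot.
pub fn fired_triggers(m: &WeeklyMetrics) -> Vec<Trigger> {
    let mut fired = Vec::new();

    // Cache hit ratio below ~80%. Skipped when there was no traffic,
    // since the ratio is undefined in that case.
    if m.cdn_requests > 0 {
        let ratio = m.cdn_hits as f64 / m.cdn_requests as f64;
        if ratio < 0.80 {
            fired.push(Trigger::CacheHitRatioLow);
        }
    }

    // Egress exceeds projection by 50% (assumed: actual > 1.5 x projected).
    if m.projected_egress_gb > 0.0 && m.egress_gb > m.projected_egress_gb * 1.5 {
        fired.push(Trigger::EgressOverProjection);
    }

    // First video creator has joined one of the large tiers.
    if m.video_creators_on_big_tiers >= 1 {
        fired.push(Trigger::FirstVideoCreator);
    }

    fired
}
```

Because the conditions are joined by OR, any non-empty result means it is time to start Phase 1.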
57 +
58 + ### Phase 2: Edge logic
59 + - [ ] **caddy-ask concurrency cap + edge filtering.** Current per-IP rate limit (10 rps, burst 60) is fine for the threat model today. At scale, move first-pass domain validation to a Cloudflare Worker: known-bad TLDs, syntactic checks, cached verified-domain set. Origin only sees genuinely-novel requests. Adds a `tokio::sync::Semaphore` on the cache-miss DB path as a belt-and-suspenders measure.
60 + - Trigger: caddy-ask QPS > 100/sec sustained, OR a real abuse incident.
61 + - [ ] **WAF-tier rate limiting** for `/api/*` write paths. Tower-governor handles per-IP; Cloudflare can do per-account, per-region, geo-blocking. Move the coarse layer to CF, keep tower-governor as origin defense-in-depth.
62 + - [ ] **Turnstile on signup + guest checkout** instead of (or alongside) origin-side bot detection. Cheaper than Stripe Radar tuning for the abuse vectors Stripe Radar doesn't cover (account farming).
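
The first-pass domain validation described in the caddy-ask item (syntactic checks, known-bad TLDs, cached verified-domain set) might reduce to logic like the following. The TODO places this in a Cloudflare Worker; the sketch uses Rust only to show the decision order. `FirstPass`, `first_pass`, and the contents of both sets are hypothetical.

```rust
use std::collections::HashSet;

/// Outcome of the hypothetical edge-side first pass.
#[derive(Debug, PartialEq, Eq)]
pub enum FirstPass {
    /// Malformed hostname or known-bad TLD: never reaches origin.
    Reject,
    /// Already in the cached verified-domain set: answered at the edge.
    AllowCached,
    /// Genuinely novel: forwarded to the origin caddy-ask handler.
    AskOrigin,
}

/// A DNS label is 1-63 ASCII letters, digits, or hyphens,
/// and does not start or end with a hyphen.
fn is_valid_label(label: &str) -> bool {
    !label.is_empty()
        && label.len() <= 63
        && !label.starts_with('-')
        && !label.ends_with('-')
        && label.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-')
}

pub fn first_pass(
    domain: &str,
    bad_tlds: &HashSet<&str>,
    verified: &HashSet<String>,
) -> FirstPass {
    // Normalize: drop one trailing root dot, compare case-insensitively.
    let d = domain.strip_suffix('.').unwrap_or(domain).to_ascii_lowercase();
    if d.is_empty() || d.len() > 253 {
        return FirstPass::Reject;
    }

    // Syntactic check: at least two labels, each well-formed.
    let labels: Vec<&str> = d.split('.').collect();
    if labels.len() < 2 || !labels.iter().all(|l| is_valid_label(l)) {
        return FirstPass::Reject;
    }

    // Known-bad TLD check on the last label.
    if bad_tlds.contains(labels[labels.len() - 1]) {
        return FirstPass::Reject;
    }

    if verified.contains(d.as_str()) {
        FirstPass::AllowCached
    } else {
        FirstPass::AskOrigin
    }
}
```

Only the `AskOrigin` case would count toward the caddy-ask QPS that the trigger above measures.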
63 +
64 + ### Phase 3: Static + asset pipeline
65 + - [ ] **Audit `Cache-Control` on origin-served static assets** (templates, static/, embed JS once it exists). Confirm CF caches what should be cached, bypasses what shouldn't (HTMX partials, dashboard).
66 + - [ ] **Cloudflare Cache Rules for HTMX fragments** that are safe to cache per-user (e.g. public project read paths). Push read latency down without touching origin.
67 + - [ ] **`/source/` (git browser) bot policy via CF.** Source browsing is bot-magnet territory; let CF handle the AI-scraper baseline before origin sees the traffic.
68 +
69 + ### Phase 4: Resilience
70 + - [ ] **Cloudflare Load Balancing health checks** in front of Hetzner. Today origin-down = MNW-down. CF can serve a maintenance page from cache + status banner.
71 + - [ ] **CF Tunnel as backup origin path.** If Hetzner public IP is under attack, CF Tunnel reaches origin via Cloudflare network. Cheap insurance.
72 + - [ ] **Stale-while-revalidate on public content pages** so origin restarts (deploys) don't visibly blip public traffic.
73 +
74 + ### What we explicitly are NOT doing
75 + - **Cloudflare Workers as the primary application platform.** Rust app stays on Hetzner. Workers are for thin edge logic only.
76 + - **Cloudflare D1 / KV as primary data store.** Postgres on Hetzner is the source of truth.
77 + - **CF Email Workers / Email Routing for transactional mail.** Postmark works, no reason to migrate.
78 +
79 + ### Metrics to instrument (so triggers above are real, not vibes)
80 + - [ ] Weekly Cloudflare cache hit ratio (already in scaling.md backlog — surface in PoM)
81 + - [ ] Hetzner egress GB/day vs projection
82 + - [ ] caddy-ask QPS + cache-hit ratio on `domain_cache`
83 + - [ ] Origin CPU + connection-pool utilization (PoM `pg_stat_activity` probe — also in scaling.md)
84 +
21 85 ---
22 86
23 87 ## Stripe SDK Migration (Post-Launch, was blocking)