# Config artifacts vs binary artifacts

Phase 3 design doc. Resolves: which of `deploy.sh`'s per-deploy actions sando absorbs, which move to one-time node-bootstrap, which sando explicitly skips.

Status: draft. Decisions below are recommendations; checkboxes match `MNW/sando/todo.md` Phase 3.

## Inventory of `deploy.sh`'s actions

| Action                       | Frequency  | What it does                                                        |
|------------------------------|------------|---------------------------------------------------------------------|
| `build_binary`               | per-deploy | cargo-zigbuild on macOS → x86_64 Linux musl/glibc                   |
| `upload_config: Caddyfile`   | per-deploy | scp `Caddyfile` → `/etc/caddy/Caddyfile`, `systemctl reload caddy`  |
| `upload_config: error-pages` | per-deploy | scp `error-pages/*.html` → `/opt/makenotwork/error-pages/`          |
| `upload_config: security`    | per-deploy | scp `sshd-git.conf`, `fail2ban-sshd.conf`, `setup-firewall.sh`      |
| `upload_config: chmod`       | per-deploy | chmod +x on setup-* scripts                                         |
| `upload_binary`              | per-deploy | scp `makenotwork` + `mnw-admin` → `/opt/makenotwork/`               |
| `send_restart_warning`       | per-deploy | POST `/api/internal/restart-warning` (30s notice), sleep 30s        |
| `restart_app`                | per-deploy | `systemctl restart makenotwork`; curl 127.0.0.1:3000 to verify      |
| `sqlx migrate run` (implied) | startup    | server runs migrations on startup in `main.rs:73`                   |

## Decision per item

### 1. Caddyfile: **bootstrap-only, not per-deploy**

Caddy config is stable infrastructure. Most releases don't touch it. Per-deploy uploads couple binary version to config version unnecessarily and risk reload churn for unchanged config.

- Node-bootstrap script installs `/etc/caddy/Caddyfile` once.
- Updating Caddy config is an explicit operator action (`sando-cli push-caddy` or just `scp` + `systemctl reload caddy` manually), tracked but not per-release.
- Revisit if Caddy config changes start landing >1x per sprint, then move to per-release artifact under `releases//Caddyfile` with a deploy hook.
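
The reload-churn concern applies to the explicit operator push as well: reloading Caddy is only useful when the file actually changed. A minimal sketch of that check in Rust, assuming a hypothetical helper named `install_if_changed` (it is not an existing sando function, and nothing here is taken from `sando-cli push-caddy`):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper, not part of sando today.
/// Copies `src` over `dst` only when the contents differ.
/// Returns Ok(true) if `dst` was written (the caller should then reload Caddy),
/// Ok(false) if `dst` already matched and nothing was touched.
pub fn install_if_changed(src: &Path, dst: &Path) -> io::Result<bool> {
    let new = fs::read(src)?;
    match fs::read(dst) {
        // Identical contents: skip the write, and therefore the reload.
        Ok(old) if old == new => return Ok(false),
        // Different contents: fall through and replace the file.
        Ok(_) => {}
        // First install: the destination does not exist yet.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    // Write a sibling temp file, then rename it into place, so a reader
    // in the same directory never sees a half-written config.
    let tmp = dst.with_extension("tmp");
    fs::write(&tmp, &new)?;
    fs::rename(&tmp, dst)?;
    Ok(true)
}
```

Under this sketch the caller would run `systemctl reload caddy` only when the function returns `true`, which would make repeated pushes of an unchanged Caddyfile a no-op.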

**Per-project alternative tracked:** if a Caddyfile change accompanies a binary change (rare), the operator must run the explicit Caddy-push step alongside `sando promote`.

### 2. error-pages: **bake into binary**

Error pages version with code. They reference brand glyphs (diamond mark) and copy that drifts with the rest of the site.

- Use `include_dir!` or `include_bytes!` to embed `server/deploy/error-pages/*.html` into the binary.
- Update Caddy `handle_errors` blocks to point at an in-app fallback route (e.g. `/__errors/404.html`) instead of `/opt/makenotwork/error-pages/`. That route can serve the embedded HTML.

Cost: small MNW server PR (separate from sando). Marks the `deploy.sh` `upload_config: error-pages` step as removable.

Until that lands: ship error-pages as a sibling under `releases//error-pages/`. Caddy still reads from `/opt/makenotwork/error-pages/`, symlinked to `current/error-pages`. (Track A on testnot already has the `current` symlink working; just symlink error-pages in parallel.)

### 3. mnw-admin binary: **ship alongside server**

`mnw-admin` is part of the release; `deploy.sh` uploads it. Sando should too.

- Extend `cfg.bin_name: String` → `cfg.bin_names: Vec<String>` (e.g. `["makenotwork", "mnw-admin"]`).
- `deploy_local` + `deploy_node` iterate over the list, rsyncing each to `releases//`.
- Build step looks up each in `server/target/release/`.

Default stays `["server"]` for backwards-compat with the existing example config.

### 4. systemd unit (`makenotwork.service`): **bootstrap-only**

The unit references `/current/makenotwork`. Once installed, it doesn't change per release.

- Node-bootstrap script installs `/etc/systemd/system/makenotwork.service`.
- `deploy.sh`'s upload of the unit was a re-upload-every-time pattern. Sando does not do this.
- If the unit ever needs to change (e.g. resource limits, env file path), that's a one-shot operator action, not a per-deploy step.

### 5. Security configs (sshd-git, fail2ban, firewall): **bootstrap-only**

These are one-time host hardening. They have no release coupling.

- Node-bootstrap script installs them on first provision.
- Updates are out-of-band operator actions (or fold into a `sando push-config` later).

### 6. backup-db.sh: **bootstrap-only**

Same as security configs. The backup script is host infrastructure, not a release artifact.

- Node-bootstrap installs `backup-db.sh` and its cron entry.
- Updates are out-of-band.
- Bonus: `backup-db.sh` should be updated to (a) maintain a `latest.sql.gz` hard link and (b) push to astra for true offsite backup, which is currently broken (see separate "offsite sync broken" ticket).

### 7. Restart warning: **defer to Phase 5; track for prod cutover**

`deploy.sh` posts a 30s warning, sleeps 30s, then restarts. Sando does NOT yet do this.

- For testnot (low traffic): skip. Service crash-loops invisibly enough.
- For prod cutover: sando must implement this. Options:
  - **A**: Sando POSTs `/api/internal/restart-warning` itself, which requires `CLI_SERVICE_TOKEN` to be exposed to sando. The token would live in `/etc/sando/sando.env` on fw13.
  - **B**: Sando exposes a `pre_deploy_hook` per-tier in `sando.toml` (shell command); operator decides.
- Recommendation: **A** for prod tiers only (`tier.restart_warning_seconds = 30` in `sando.toml`). Tier A (testnot) leaves it unset = no warning.

Phase 5 implementation, not blocking cutover-readiness.

### 8. Cross-compile from macOS: **retire**

fw13 is x86_64 Ubuntu-derived, prod is x86_64 Ubuntu 24.04. Sando builds natively. The cargo-zigbuild path goes away once sando is canonical.

- Verify: take a recent prod binary (from `deploy.sh`'s build) and sando's binary for the same sha, and compare runtime behavior across one full sprint of testnot use.
- Once verified, mark `deploy.sh` archived and delete cargo-zigbuild from dev-machine setup notes.

### 9. Prod migrations: **server-self-applies on startup; sando does NOT**

MNW server runs `sqlx::migrate!("./migrations").run(&db).await` in `main.rs:73` at startup. This means:

- A new binary starting up applies any pending migrations against the live prod DB.
- Sando does not need an explicit `POST /migrate/{tier}` endpoint.
- The `migration_dry_run` gate's purpose is to catch migration FAILURE before the live binary tries to run the migrations; that's the prod safety net.
- Risk: a partially-applied migration (e.g. multi-statement, the 2026-05-22 incident class) can leave the DB in a broken state mid-startup. Sandboxing the migration via `migration_dry_run` catches this; the live server then either succeeds or fails-and-crash-loops on the same migration sequence.
- Open question: should sando refuse to promote if `migration_dry_run` flags the upcoming version as a destructive migration (drop+recreate column)? Phase 5+ enhancement.

**Action:** none; current architecture is correct. Document this in `plans/migration-dryrun-failures.md` (Phase 2 follow-up).

## Net effect on `deploy.sh`

| Step                 | Replaced by Sando             | Moved to node-bootstrap | Retired |
|----------------------|-------------------------------|-------------------------|---------|
| build_binary         | yes (native on fw13)          |                         |         |
| upload_config        |                               | yes (Caddyfile, etc.)   |         |
| upload_binary        | yes (+ mnw-admin)             |                         |         |
| send_restart_warning | yes (Phase 5, prod tier only) |                         |         |
| restart_app          | yes (reload-or-restart)       |                         |         |

Once items 2-9 above land, `deploy.sh` becomes redundant and moves to `server/deploy/archive/`.

## Implementation order

1. **`bin_names: Vec<String>`**: small, unblocks mnw-admin shipping (#3).
2. **error-pages as release sibling + symlink**: small, unblocks #2 until bake-into-binary lands.
3. **node-bootstrap script**: folds Caddyfile (#1), unit (#4), security (#5), backup (#6) into one idempotent script. Already a Phase 1 carryover.
4. **Phase 5: restart_warning hook**: when prod cutover gets scheduled.
5. **Prod cutover sprint**: verify binary parity (#8), retire `deploy.sh` (#9 needs no action).
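
Step 1 of the implementation order is small enough to sketch. The snippet below is an illustration in Rust under stated assumptions: the `Config` struct is a hypothetical subset of sando's real config (only the `bin_names` field and its `["server"]` default come from this doc), `planned_copies` is an invented helper name, and the actual transfer (rsync, per item 3) is left to the caller.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical subset of sando's config; only `bin_names` comes from this doc.
pub struct Config {
    pub bin_names: Vec<String>,
}

impl Default for Config {
    /// Keeps the stated default so the existing example config still works.
    fn default() -> Self {
        Config { bin_names: vec!["server".to_string()] }
    }
}

/// Returns one (source, destination) pair per binary, in config order.
/// `build_dir` would be something like `server/target/release/`;
/// `release_dir` is the per-release directory on the target.
/// Nothing is copied here; the caller performs the transfer.
pub fn planned_copies(
    cfg: &Config,
    build_dir: &Path,
    release_dir: &Path,
) -> Vec<(PathBuf, PathBuf)> {
    cfg.bin_names
        .iter()
        .map(|name| (build_dir.join(name), release_dir.join(name)))
        .collect()
}
```

With `bin_names = ["makenotwork", "mnw-admin"]`, both `deploy_local` and `deploy_node` could walk the same plan, so adding a third binary later would be a config change rather than a code change.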