# Config artifacts vs binary artifacts

Phase 3 design doc. Resolves: which of `deploy.sh`'s per-deploy actions sando absorbs, which move to one-time node-bootstrap, and which sando explicitly skips.

Status: draft. Decisions below are recommendations; checkboxes match `MNW/sando/todo.md` Phase 3.

## Inventory of `deploy.sh`'s actions

| Step | When | Action |
|---|---|---|
| `build_binary` | per-deploy | cargo-zigbuild on macOS → x86_64 Linux musl/glibc |
| `upload_config: Caddyfile` | per-deploy | scp `Caddyfile` → `/etc/caddy/Caddyfile`, `systemctl reload caddy` |
| `upload_config: error-pages` | per-deploy | scp `error-pages/*.html` → `/opt/makenotwork/error-pages/` |
| `upload_config: security` | per-deploy | scp `sshd-git.conf`, `fail2ban-sshd.conf`, `setup-firewall.sh` |
| `upload_config: chmod` | per-deploy | chmod +x on setup-* scripts |
| `upload_binary` | per-deploy | scp `makenotwork` + `mnw-admin` → `/opt/makenotwork/` |
| `send_restart_warning` | per-deploy | POST `/api/internal/restart-warning` (30s notice), sleep 30s |
| `restart_app` | per-deploy | `systemctl restart makenotwork`; curl 127.0.0.1:3000 to verify |
| `sqlx migrate run` (implied) | startup | server runs migrations on startup in `main.rs:73` |

## Decision per item

### 1. Caddyfile: **bootstrap-only, not per-deploy**

Caddy config is stable infrastructure, and most releases don't touch it. Per-deploy uploads couple the binary version to the config version unnecessarily and risk reload churn for unchanged config.

- The node-bootstrap script installs `/etc/caddy/Caddyfile` once.
- Updating Caddy config is an explicit operator action (`sando-cli push-caddy`, or a manual `scp` followed by `systemctl reload caddy`), tracked but not per-release.
- Revisit if Caddy config changes start landing more than once per sprint; at that point, move the Caddyfile to a per-release artifact under `releases/<version>/Caddyfile` with a deploy hook.

**Per-project alternative tracked:** if a Caddyfile change accompanies a binary change (rare), the operator must run the explicit Caddy-push step alongside `sando promote`.

### 2. error-pages: **bake into binary**

Error pages version with code. They reference brand glyphs (diamond mark) and copy that drifts with the rest of the site.

- Use `include_dir!` or `include_bytes!` to embed `server/deploy/error-pages/*.html` into the binary.
- Update Caddy `handle_errors` blocks to point at an in-app fallback route (e.g. `/__errors/404.html`) instead of `/opt/makenotwork/error-pages/`. That route can serve the embedded HTML.

Cost: a small MNW server PR (separate from sando). It makes the `deploy.sh` `upload_config: error-pages` step removable.

Until that lands: ship error-pages as a sibling directory under `releases/<version>/error-pages/`. Caddy still reads from `/opt/makenotwork/error-pages/`, which becomes a symlink to `current/error-pages`. (Track A on testnot already has the `current` symlink working; the error-pages symlink just sits parallel to it.)

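The in-app fallback route could look roughly like the sketch below. It is only an illustration: the page bodies are inline placeholders standing in for `include_str!` calls on the real files, `500.html` and the function name `embedded_error_page` are assumptions, and the actual handler would depend on the web framework the MNW server uses.

```rust
/// Placeholder bodies. In the real binary each constant would be an
/// `include_str!(...)` of a file under `server/deploy/error-pages/`
/// (the exact relative path depends on the crate layout and is not shown here).
const PAGE_404: &str = "<!doctype html><title>404</title><h1>Not found</h1>";
const PAGE_500: &str = "<!doctype html><title>500</title><h1>Server error</h1>";

/// Maps a request path under the `/__errors/` prefix to an embedded page.
/// Only exact, known file names match, so path tricks such as `..` return `None`.
pub fn embedded_error_page(path: &str) -> Option<&'static str> {
    let name = path.strip_prefix("/__errors/")?;
    match name {
        "404.html" => Some(PAGE_404),
        "500.html" => Some(PAGE_500), // assumed page; the source only names 404.html
        _ => None,
    }
}
```

Matching on a fixed set of names rather than reading from disk means the pages always match the binary that serves them, which is the point of baking them in.
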
### 3. mnw-admin binary: **ship alongside server**

`mnw-admin` is part of the release; `deploy.sh` uploads it. Sando should too.

- Extend `cfg.bin_name: String` → `cfg.bin_names: Vec<String>` (e.g. `["makenotwork", "mnw-admin"]`).
- `deploy_local` + `deploy_node` iterate over the list, rsyncing each to `releases/<version>/<bin>`.
- The build step looks up each binary in `server/target/release/<bin>`.

The default stays `["server"]` for backwards compatibility with the existing example config.

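A minimal sketch of the `bin_names` change, assuming a much-reduced `Config` struct. Only the `bin_names` field and its `["server"]` default come from the design above; the `release_paths` helper and its parameters are hypothetical and exist only to show the iteration that `deploy_local` and `deploy_node` would perform before rsyncing.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical slice of sando's config; the real struct has more fields.
#[derive(Debug, Clone)]
pub struct Config {
    /// Binaries shipped with every release.
    pub bin_names: Vec<String>,
}

impl Default for Config {
    /// Keeps the existing example config working: a single binary named "server".
    fn default() -> Self {
        Config { bin_names: vec!["server".to_string()] }
    }
}

/// Returns one (source, destination) pair per configured binary.
/// Source is `<build_dir>/<bin>`; destination is
/// `<release_root>/releases/<version>/<bin>`.
pub fn release_paths(
    cfg: &Config,
    build_dir: &Path,
    release_root: &Path,
    version: &str,
) -> Vec<(PathBuf, PathBuf)> {
    let dest_dir = release_root.join("releases").join(version);
    cfg.bin_names
        .iter()
        .map(|bin| (build_dir.join(bin), dest_dir.join(bin)))
        .collect()
}
```

With `bin_names = ["makenotwork", "mnw-admin"]` and a build directory of `server/target/release`, the helper yields two pairs, one per binary, and the caller copies each pair in turn.
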
### 4. systemd unit (`makenotwork.service`): **bootstrap-only**

The unit references `<release_root>/current/makenotwork`. Once installed, it doesn't change per release.

- The node-bootstrap script installs `/etc/systemd/system/makenotwork.service`.
- `deploy.sh` re-uploaded the unit on every deploy. Sando does not.
- If the unit ever needs to change (e.g. resource limits, env file path), that's a one-shot operator action, not a per-deploy step.

### 5. Security configs (sshd-git, fail2ban, firewall): **bootstrap-only**

These are one-time host hardening. They have no release coupling.

- The node-bootstrap script installs them on first provision.
- Updates are out-of-band operator actions (or fold into a `sando push-config` later).

### 6. backup-db.sh: **bootstrap-only**

Same as security configs. The backup script is host infrastructure, not a release artifact.

- Node-bootstrap installs `backup-db.sh` and its cron entry.
- Updates happen out-of-band.
- Bonus: `backup-db.sh` should be updated to (a) maintain a `latest.sql.gz` hard link and (b) push to astra for true offsite backup. Offsite sync is currently broken (see the separate "offsite sync broken" ticket).

### 7. Restart warning: **defer to Phase 5; track for prod cutover**

`deploy.sh` posts a 30s warning, sleeps 30s, then restarts. Sando does NOT yet do this.

- For testnot (low traffic): skip. Restarts, even crash-loops, are effectively invisible at that traffic level.
- For prod cutover: sando must implement this. Options:
  - **A**: Sando POSTs `/api/internal/restart-warning` itself, which requires exposing `CLI_SERVICE_TOKEN` to sando. The token would live in `/etc/sando/sando.env` on fw13.
  - **B**: Sando exposes a per-tier `pre_deploy_hook` in `sando.toml` (a shell command); the operator decides what it runs.
  - Recommendation: **A** for prod tiers only (`tier.restart_warning_seconds = 30` in `sando.toml`). Tier A (testnot) leaves it unset, which means no warning.

This is Phase 5 implementation work and does not block cutover-readiness.

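Option A's per-tier behavior can be sketched as a small planning function. This is an illustration under assumptions: the `Tier` struct, the `Step` enum, and `restart_plan` are hypothetical names, and treating an explicit `0` the same as an unset value is a choice made here, not something the design specifies. The HTTP call and token handling are deliberately left out; the sketch only shows which steps run in which order.

```rust
use std::time::Duration;

/// Hypothetical per-tier settings; only `restart_warning_seconds`
/// is named in the design above.
pub struct Tier {
    pub name: String,
    /// `None` (unset in `sando.toml`) means no warning is sent.
    pub restart_warning_seconds: Option<u64>,
}

/// One action in the restart sequence.
#[derive(Debug, PartialEq)]
pub enum Step {
    /// POST to the server's warning endpoint with the notice length.
    PostWarning { path: &'static str, seconds: u64 },
    /// Give connected users time to finish before the restart.
    Wait(Duration),
    Restart,
}

/// Builds the ordered restart steps for a tier.
/// A missing or zero warning period (assumption) skips straight to the restart.
pub fn restart_plan(tier: &Tier) -> Vec<Step> {
    let mut steps = Vec::new();
    if let Some(seconds) = tier.restart_warning_seconds.filter(|s| *s > 0) {
        steps.push(Step::PostWarning {
            path: "/api/internal/restart-warning",
            seconds,
        });
        steps.push(Step::Wait(Duration::from_secs(seconds)));
    }
    steps.push(Step::Restart);
    steps
}
```

A testnot tier with the field unset produces only `Restart`; a prod tier set to 30 produces the warning, a 30-second wait, and then the restart, mirroring what `deploy.sh` does today.
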
### 8. Cross-compile from macOS: **retire**

fw13 is x86_64 and Ubuntu-derived; prod is x86_64 Ubuntu 24.04. Sando builds natively, so the cargo-zigbuild path goes away once sando is canonical.

- Verify: take a recent prod binary (from `deploy.sh`'s build) and sando's binary for the same sha, and compare runtime behavior across one full sprint of testnot use.
- Once verified, mark `deploy.sh` archived and delete cargo-zigbuild from the dev-machine setup notes.

### 9. Prod migrations: **server-self-applies on startup; sando does NOT**

The MNW server runs `sqlx::migrate!("./migrations").run(&db).await` in `main.rs:73` at startup. This means:

- A new binary starting up applies any pending migrations against the live prod DB.
- Sando does not need an explicit `POST /migrate/{tier}` endpoint.
- The purpose of the `migration_dry_run` gate is to catch migration FAILURE before the live binary tries to run the migrations. That is the prod safety net.
- Risk: a partially-applied migration (e.g. multi-statement, the 2026-05-22 incident class) can leave the DB in a broken state mid-startup. Sandboxing the migration via `migration_dry_run` catches this; the live server then either succeeds or fails and crash-loops on the same migration sequence.
- Open question: should sando refuse to promote if `migration_dry_run` flags the upcoming version as a destructive migration (drop+recreate column)? Phase 5+ enhancement.

**Action:** none; the current architecture is correct. Document this in `plans/migration-dryrun-failures.md` (Phase 2 follow-up).

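If the open question above is answered with "yes", the promotion gate might take a shape like the following. Everything here is hypothetical: the source does not say how `migration_dry_run` reports its result, so the three-way `DryRun` outcome, the `Promote` type, and the `block_destructive` switch are assumptions used only to make the decision table concrete.

```rust
/// Assumed outcomes of a `migration_dry_run`; the real gate may report differently.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DryRun {
    /// Migrations applied cleanly in the sandbox.
    Clean,
    /// A migration failed in the sandbox.
    Failed,
    /// Migrations applied, but were flagged destructive (e.g. drop+recreate column).
    Destructive,
}

/// Result of the promotion check.
#[derive(Debug, PartialEq)]
pub enum Promote {
    Allow,
    Refuse(&'static str),
}

/// Decides whether a version may be promoted.
/// `block_destructive` models the Phase 5+ enhancement; with it off,
/// only outright dry-run failures stop a promotion.
pub fn promotion_gate(result: DryRun, block_destructive: bool) -> Promote {
    match result {
        DryRun::Clean => Promote::Allow,
        DryRun::Failed => Promote::Refuse("migration_dry_run failed"),
        DryRun::Destructive if block_destructive => {
            Promote::Refuse("destructive migration flagged by migration_dry_run")
        }
        DryRun::Destructive => Promote::Allow,
    }
}
```

Keeping the destructive check behind a switch lets the failure gate ship unchanged while the stricter behavior is evaluated separately.
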
## Net effect on `deploy.sh`

| `deploy.sh` step | Sando absorbs | Node-bootstrap | Skipped |
|---|---|---|---|
| build_binary | yes (native on fw13) | | |
| upload_config | | yes (Caddyfile, etc.) | |
| upload_binary | yes (+ mnw-admin) | | |
| send_restart_warning | yes (Phase 5, prod tier only) | | |
| restart_app | yes (reload-or-restart) | | |

Once items 2-9 above land, `deploy.sh` becomes redundant and moves to `server/deploy/archive/`.

## Implementation order

1. **`bin_names: Vec<String>`**: small, unblocks mnw-admin shipping (#3).
2. **error-pages as release sibling + symlink**: small, unblocks #2 until bake-into-binary lands.
3. **node-bootstrap script**: folds Caddyfile (#1), unit (#4), security (#5), backup (#6) into one idempotent script. Already a Phase 1 carryover.
4. **Phase 5: restart_warning hook**: when prod cutover gets scheduled.
5. **Prod cutover sprint**: verify binary parity (#8), retire `deploy.sh` (#9 needs no action).