# Config artifacts vs binary artifacts

Phase 3 design doc. Resolves: which of `deploy.sh`'s per-deploy actions sando absorbs, which move to one-time node-bootstrap, and which sando explicitly skips.

Status: draft. Decisions below are recommendations; checkboxes match `MNW/sando/todo.md` Phase 3.

## Inventory of `deploy.sh`'s actions

| Action                       | Frequency  | What it does                                                       |
|------------------------------|------------|--------------------------------------------------------------------|
| `build_binary`               | per-deploy | cargo-zigbuild on macOS → x86_64 Linux musl/glibc                  |
| `upload_config: Caddyfile`   | per-deploy | scp `Caddyfile` → `/etc/caddy/Caddyfile`, `systemctl reload caddy` |
| `upload_config: error-pages` | per-deploy | scp `error-pages/*.html` → `/opt/makenotwork/error-pages/`         |
| `upload_config: security`    | per-deploy | scp `sshd-git.conf`, `fail2ban-sshd.conf`, `setup-firewall.sh`     |
| `upload_config: chmod`       | per-deploy | chmod +x on setup-* scripts                                        |
| `upload_binary`              | per-deploy | scp `makenotwork` + `mnw-admin` → `/opt/makenotwork/`              |
| `send_restart_warning`       | per-deploy | POST `/api/internal/restart-warning` (30s notice), sleep 30s       |
| `restart_app`                | per-deploy | `systemctl restart makenotwork`; curl 127.0.0.1:3000 to verify     |
| `sqlx migrate run` (implied) | startup    | server runs migrations on startup in `main.rs:73`                  |

## Decision per item

### 1. Caddyfile — **bootstrap-only, not per-deploy**

Caddy config is stable infrastructure. Most releases don't touch it. Per-deploy uploads couple the binary version to the config version unnecessarily and risk reload churn for unchanged config.

- Node-bootstrap script installs `/etc/caddy/Caddyfile` once.
- Updating Caddy config is an explicit operator action (`sando-cli push-caddy` or just `scp + systemctl reload caddy` manually), tracked but not per-release.
- Revisit if Caddy config changes start landing more than once per sprint; at that point, move to a per-release artifact under `releases/<version>/Caddyfile` with a deploy hook.

**Per-project alternative tracked:** if a Caddyfile change accompanies a binary change (rare), the operator must run the explicit Caddy-push step alongside `sando promote`.

### 2. error-pages — **bake into binary**

Error pages version with code. They reference brand glyphs (diamond mark) and copy that drifts with the rest of the site.

- Use `include_dir!` or `include_bytes!` to embed `server/deploy/error-pages/*.html` into the binary.
- Update Caddy `handle_errors` blocks to point at an in-app fallback route (e.g. `/__errors/404.html`) instead of `/opt/makenotwork/error-pages/`. That route can serve the embedded HTML.

Cost: a small MNW server PR (separate from sando). Once it lands, the `deploy.sh` `upload_config: error-pages` step becomes removable.

Until that lands: ship error-pages as a sibling directory under `releases/<version>/error-pages/`. Caddy still reads from `/opt/makenotwork/error-pages/`, which is symlinked to `current/error-pages`. (Track A on testnot already has the `current` symlink working; just symlink error-pages in parallel.)

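A minimal sketch of the in-app fallback route's lookup, assuming the embedded pages are reachable as static strings. The two page constants are stand-ins (in the server they would plausibly come from `include_str!` or `include_dir!`), and the `/__errors/` prefix, the `embedded_error_page` name, and the choice of 404 and 500 pages are illustrative assumptions, not existing MNW code.

```rust
/// Stand-ins for the embedded pages. In the real server these would likely be
/// produced by `include_str!`/`include_dir!` over `server/deploy/error-pages/`.
const PAGE_404: &str = "<!doctype html><title>404</title><h1>Not found</h1>";
const PAGE_500: &str = "<!doctype html><title>500</title><h1>Server error</h1>";

/// Hypothetical route prefix, modelled on the `/__errors/404.html` example.
const ERROR_PREFIX: &str = "/__errors/";

/// Map a request path to an embedded page.
/// Returns `None` for paths outside the prefix and for unknown page names.
pub fn embedded_error_page(path: &str) -> Option<&'static str> {
    match path.strip_prefix(ERROR_PREFIX)? {
        "404.html" => Some(PAGE_404),
        "500.html" => Some(PAGE_500),
        _ => None,
    }
}
```

A route handler in whatever web framework the server uses could call `embedded_error_page(request_path)` and return the HTML with the matching status code; Caddy's `handle_errors` would then proxy to that route instead of reading from disk.
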
### 3. mnw-admin binary — **ship alongside server**

`mnw-admin` is part of the release; deploy.sh uploads it. Sando should too.

- Extend `cfg.bin_name: String` → `cfg.bin_names: Vec<String>` (e.g. `["makenotwork", "mnw-admin"]`).
- `deploy_local` + `deploy_node` iterate over the list, rsyncing each to `releases/<version>/<bin>`.
- The build step looks up each binary at `server/target/release/<bin>`.

Default stays `["server"]` for backwards-compat with the existing example config.

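The shape of that change can be sketched as follows. This is an illustration under assumptions, not sando's actual code: the `Config` struct is reduced to the one field discussed here, and `build_artifact`, `release_plan`, and the release root passed in are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical slice of sando's config; only `bin_names` comes from the design.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub bin_names: Vec<String>,
}

impl Default for Config {
    fn default() -> Self {
        // Backwards-compatible default: a single binary called "server".
        Config { bin_names: vec!["server".to_string()] }
    }
}

/// Where the build step is expected to leave one binary.
pub fn build_artifact(bin: &str) -> PathBuf {
    Path::new("server/target/release").join(bin)
}

/// One (source, destination) pair per configured binary, with destinations
/// laid out as `<release_root>/releases/<version>/<bin>`.
pub fn release_plan(cfg: &Config, release_root: &Path, version: &str) -> Vec<(PathBuf, PathBuf)> {
    let release_dir = release_root.join("releases").join(version);
    cfg.bin_names
        .iter()
        .map(|bin| (build_artifact(bin), release_dir.join(bin)))
        .collect()
}
```

Both `deploy_local` and `deploy_node` could consume the same plan and differ only in how each pair is copied, which keeps the multi-binary logic in one place.
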
### 4. systemd unit (`makenotwork.service`) — **bootstrap-only**

The unit references `<release_root>/current/makenotwork`. Once installed, it doesn't change per release.

- Node-bootstrap script installs `/etc/systemd/system/makenotwork.service`.
- `deploy.sh`'s upload of the unit was a re-upload-every-time pattern. Sando does not re-upload it.
- If the unit ever needs to change (e.g. resource limits, env file path), that's a one-shot operator action, not a per-deploy step.

### 5. Security configs (sshd-git, fail2ban, firewall) — **bootstrap-only**

These are one-time host hardening. They have no release coupling.

- Node-bootstrap script installs them on first provision.
- Updates are out-of-band operator actions (or fold into a `sando push-config` later).

### 6. backup-db.sh — **bootstrap-only**

Same as security configs. The backup script is host infrastructure, not a release artifact.

- Node-bootstrap installs `backup-db.sh` and its cron entry.
- Updates are out-of-band.
- Bonus: `backup-db.sh` should be updated to (a) maintain a `latest.sql.gz` hard link and (b) push to astra for true offsite backup, which is currently broken (see the separate "offsite sync broken" ticket).

### 7. Restart warning — **defer to Phase 5; track for prod cutover**

`deploy.sh` posts a 30s warning, sleeps 30s, then restarts. Sando does NOT yet do this.

- For testnot (low traffic): skip. Restarts, and even crash-loops, are invisible enough there.
- For prod cutover: sando must implement this. Options:
  - **A**: Sando POSTs to `/api/internal/restart-warning` itself, which requires exposing `CLI_SERVICE_TOKEN` to sando. The token would live in `/etc/sando/sando.env` on fw13.
  - **B**: Sando exposes a `pre_deploy_hook` per-tier in `sando.toml` (shell command); operator decides.
- Recommendation: **A** for prod tiers only (`tier.restart_warning_seconds = 30` in `sando.toml`). Tier A (testnot) leaves it unset = no warning.

This is a Phase 5 implementation item and does not block cutover-readiness.

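Option A's per-tier behavior can be sketched as a pure planning function. The `Tier`, `Step`, and `restart_plan` names are hypothetical; only the `restart_warning_seconds` key comes from the recommendation. Treating `0` the same as unset is an added assumption.

```rust
use std::time::Duration;

/// Hypothetical per-tier settings; the field mirrors the proposed
/// `tier.restart_warning_seconds` key in `sando.toml`.
#[derive(Debug, Clone, Default)]
pub struct Tier {
    pub restart_warning_seconds: Option<u64>,
}

/// The ordered actions sando would take around a restart.
#[derive(Debug, PartialEq)]
pub enum Step {
    /// POST to `/api/internal/restart-warning`.
    PostRestartWarning,
    /// Give connected users time to react.
    Wait(Duration),
    RestartService,
}

/// Build the restart sequence for one tier.
/// Unset (or zero, by assumption) means: restart immediately, no warning.
pub fn restart_plan(tier: &Tier) -> Vec<Step> {
    match tier.restart_warning_seconds {
        Some(secs) if secs > 0 => vec![
            Step::PostRestartWarning,
            Step::Wait(Duration::from_secs(secs)),
            Step::RestartService,
        ],
        _ => vec![Step::RestartService],
    }
}
```

Keeping the plan separate from its execution means the prod and testnot paths can be unit-tested without an HTTP client or a running service.
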
### 8. Cross-compile from macOS — **retire**

fw13 is x86_64 Ubuntu-derived; prod is x86_64 Ubuntu 24.04. Sando builds natively, so the cargo-zigbuild path goes away once sando is canonical.

- Verify: take a recent prod binary (from `deploy.sh`'s build) and sando's binary for the same sha, and compare runtime behavior across one full sprint of testnot use.
- Once verified, mark `deploy.sh` archived and delete cargo-zigbuild from dev-machine setup notes.

### 9. Prod migrations — **server-self-applies on startup; sando does NOT**

MNW server runs `sqlx::migrate!("./migrations").run(&db).await` in `main.rs:73` at startup. This means:

- A new binary starting up applies any pending migrations against the live prod DB.
- Sando does not need an explicit `POST /migrate/{tier}` endpoint.
- The `migration_dry_run` gate's purpose is to catch migration FAILURE before the live binary tries to run them; that is the prod safety net.
- Risk: a partially-applied migration (e.g. multi-statement, the 2026-05-22 incident class) can leave the DB in a broken state mid-startup. Sandboxing the migration via `migration_dry_run` catches this; the live server then either succeeds or fails and crash-loops on the same migration sequence.
- Open question: should sando refuse to promote if `migration_dry_run` flags the upcoming version as a destructive migration (drop+recreate column)? Phase 5+ enhancement.

**Action:** none; the current architecture is correct. Document this in `plans/migration-dryrun-failures.md` (Phase 2 follow-up).

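To make the gate and the open question concrete, here is one possible shape for the promote decision. Everything in it is hypothetical: the `DryRun` and `Decision` types, the `promotion_gate` function, and the `block_destructive` flag, which stands in for the unresolved Phase 5+ policy. It assumes a failed dry run blocks promotion, which is how the gate is described above.

```rust
/// Hypothetical outcome of `migration_dry_run` for the version being promoted.
#[derive(Debug, Clone, PartialEq)]
pub enum DryRun {
    /// All pending migrations applied cleanly in the sandbox.
    Clean,
    /// Migrations applied, but one was flagged as destructive.
    Destructive { migration: String },
    /// A migration failed in the sandbox.
    Failed { migration: String, error: String },
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Promote,
    Refuse(String),
}

/// Decide whether a promote may proceed.
/// `block_destructive = false` models the current behavior (flag only);
/// `true` models the stricter policy raised as an open question.
pub fn promotion_gate(result: &DryRun, block_destructive: bool) -> Decision {
    match result {
        DryRun::Clean => Decision::Promote,
        DryRun::Failed { migration, error } => {
            Decision::Refuse(format!("dry run failed at {migration}: {error}"))
        }
        DryRun::Destructive { migration } if block_destructive => {
            Decision::Refuse(format!("{migration} is destructive"))
        }
        DryRun::Destructive { .. } => Decision::Promote,
    }
}
```

Making the destructive-migration policy an explicit parameter would let it be switched on per tier later without changing the failure path.
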
## Net effect on `deploy.sh`

| Step                 | Replaced by Sando             | Moved to node-bootstrap | Retired |
|----------------------|-------------------------------|-------------------------|---------|
| build_binary         | yes (native on fw13)          |                         |         |
| upload_config        |                               | yes (Caddyfile, etc.)   |         |
| upload_binary        | yes (+ mnw-admin)             |                         |         |
| send_restart_warning | yes (Phase 5, prod tier only) |                         |         |
| restart_app          | yes (reload-or-restart)       |                         |         |

Once items 2-9 above land, `deploy.sh` becomes redundant and moves to `server/deploy/archive/`.

## Implementation order

1. **`bin_names: Vec<String>`** — small, unblocks mnw-admin shipping (#3).
2. **error-pages as release sibling + symlink** — small, unblocks #2 until bake-into-binary lands.
3. **node-bootstrap script** — folds Caddyfile (#1), unit (#4), security (#5), backup (#6) into one idempotent script. Already a Phase 1 carryover.
4. **Phase 5: restart_warning hook** — when prod cutover gets scheduled.
5. **Prod cutover sprint** — verify binary parity (#8), retire `deploy.sh` (#9 needs no action).

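The idempotency that step 3 asks of the node-bootstrap script comes down to "write only when the content differs". The sketch below expresses that idea in Rust for illustration; the bootstrap script itself may well be shell, and `install_if_changed` is a hypothetical helper, not part of sando.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Install `desired` at `dest` only if the file is missing or differs.
/// Returns Ok(true) when the file was written and Ok(false) when it was
/// already up to date, so the caller can skip reloads on a no-op run.
pub fn install_if_changed(dest: &Path, desired: &[u8]) -> io::Result<bool> {
    match fs::read(dest) {
        // Identical content: nothing to do.
        Ok(existing) if existing == desired => return Ok(false),
        // Different content: fall through and overwrite.
        Ok(_) => {}
        // Missing file: fall through and create it.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        // Anything else (permissions, I/O) is a real error.
        Err(e) => return Err(e),
    }
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }
    // Note: this write is not atomic; a production version might write to a
    // temporary file and rename it into place.
    fs::write(dest, desired)?;
    Ok(true)
}
```

The boolean result is what makes re-running the bootstrap cheap: a caller would only issue `systemctl daemon-reload` or `systemctl reload caddy` when the corresponding file actually changed.
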