# Session 3: first sando-driven prod deploy

Captured 2026-06-03 after the cutover. Resolves §6.5 step 8 of `launchplan_final.md`: first full sando deploy to Hetzner prod, replacing `deploy.sh` as the live deploy path.

Status: **complete 2026-06-03.** Prod runs `makenotwork` 0.9.5 (sha `f0970b8`) from `/opt/mnw/current/`, deployed via `POST /promote/b {"hotfix":true}` from sandod on fw13. Outage window 3m25s (02:50:33 → 02:53:58 UTC). All features green. See §F for outcomes and §G for the four hardcoded paths that block the eventual `rm -rf /opt/makenotwork/`.

## Background: Session 1 set the layout, Session 2 proved it on testnot, Session 3 cut prod over

Session 1 redesigned the on-disk layout (`/opt/mnw/releases//` + `current` symlink; `/etc/mnw/makenotwork.env`; `/var/lib/mnw/` for state) and shipped the sando-side code that produces the full versioned bundle (binaries + static + docs + error-pages + assumptions).

Session 2 reprovisioned testnot under that layout; the first remote deploy of the full bundle landed cleanly after three small gotchas (sqlx URL form, pg_ident map, `ASSUMPTIONS_PATH` mismatch; all logged in `launchplan_final.md` §6.9).

Session 3 is the real-stakes one: prod was on 0.9.1 via `deploy.sh`, and `/opt/makenotwork/` had eight months of accreted state (885M of backups, .env, yara-rules, ssh dir, rustdoc, sudoers entries, cron jobs, Caddyfile references). The Session 1 plan enumerated some of the move sequence but understated the surface area; the actual cutover surfaced several things worth documenting so the next major reprovision (or a disaster-recovery rebuild) doesn't re-discover them.

## A. Inventory taken before any prod write

`/opt/makenotwork/` contents (`makenotwork:makenotwork` unless noted):

- `makenotwork`, `mnw-admin`: 0.9.1 binaries (`root:root`)
- `.env` (110 lines), 5× `.env.bak.*` files (`root:root`)
- `docs/`, `static/`, `error-pages/`: content (will be replaced by release bundle)
- `backups/`: 885M
- `yara-rules/`: 8.5M compiled, `root:root`
- `yara-rules-src/`: upstream YARA sources (compiled to `yara-rules/`), `root:root`
- `rustdoc/`: generated docs, `501:staff` (uploaded from Mac via `deploy.sh`)
- `ssh/`: `known_hosts` for build runner, `root:root`
- `backup-db.sh`: cron'd daily at 03:00 UTC from `makenotwork`'s crontab
- `deploy/`: `deploy.sh` staging area, `root:root`

Other prod state in play:

- `/opt/git/`: 99M, `git:git`. Both the git user's home (`/etc/passwd` says `git:x:995:986::/opt/git:/bin/sh`) *and* the `GIT_REPOS_PATH` target. Conflating these turns out to matter (§G.3).
- `/etc/caddy/Caddyfile`: three `root * /opt/makenotwork/error-pages` lines.
- `/etc/sudoers.d/mnw-git-ssh`: `makenotwork ALL=(git) NOPASSWD: /opt/makenotwork/mnw-admin rebuild-keys`.
- `/etc/sudoers.d/mnw-cli-git`: `mnw-cli ALL=(git) NOPASSWD: /usr/bin/git-*, /usr/bin/tee, /usr/bin/chmod`. No `/opt` path references; left alone.
- `makenotwork` user crontab: `0 3 * * * /opt/makenotwork/backup-db.sh >> /opt/makenotwork/backups/backup.log 2>&1`.
- Root crontab: `0 3 * * * /opt/backups/pg_backup.sh >> /var/log/pg_backup.log 2>&1` (unrelated, left alone).

## B. Pre-flight (no prod impact)

1. **`sando.toml` tier B fixed.** Was `deploy@prod-1.makenot.work` (NXDOMAIN, no port). Now `makenotwork@alpha-west-1` with port handling via `~sando/.ssh/config` Host block. Chose to keep the service user as `makenotwork` rather than introduce a `deploy` user: that avoids chowning 885M of backups and redoing pg peer auth that's been stable for months.
   The same reasoning applies to a hypothetical tier C: keep the existing user, don't introduce a new one for cosmetic uniformity with testnot.
2. **Sando pubkey installed** in `/home/makenotwork/.ssh/authorized_keys` (mode 0600, owned by `makenotwork`).
3. **`chsh -s /bin/bash makenotwork`**: the shell was `/usr/sbin/nologin`, so SSH was rejecting the session outright; key auth itself was not failing. Worth detecting/fixing in `bootstrap-node.sh` for future provisions where someone has hardened the runtime user.
4. **`/srv/sando/.ssh/config`** Host block for port 2200; `known_hosts` seeded via `ssh-keyscan -p 2200`.
5. **Dry-run rsync** from sando → prod's `/opt/mnw/releases/_probe/` succeeded (after `bootstrap-node.sh` created `/opt/mnw/`).

## C. Cutover sequence (3m25s outage)

In order, with the exact reason each step exists:

1. **`systemctl stop makenotwork`**: 02:50:33 UTC. Outage window starts.
2. **Backups taken**: `/etc/systemd/system/makenotwork.service → /root/makenotwork.service.bak-pre-cutover`; `/opt/makenotwork/.env → /root/dotenv.bak-pre-cutover`; Caddyfile, sudoers, crontab also backed up to `/root/*.bak-pre-cutover`. This is the rollback path for any step failing before service restart.
3. **`bootstrap-node.sh`** with `SERVICE_USER=makenotwork SANDO_PUBKEY=… INSTALL_POSTGRES=0 INSTALL_CADDY=0 INSTALL_TAILSCALE=0 ENABLE_FIREWALL=0`: postgres/caddy/tailscale/UFW were already configured on prod, so don't touch them. Created `/opt/mnw/`, `/etc/mnw/`, `/var/lib/mnw/`, the new systemd unit, the unused `deploy` user (harmless), and the sudoers entry for `deploy`. The new unit references `EnvironmentFile=/etc/mnw/makenotwork.env` and `ReadWritePaths=/var/lib/mnw`, with `RestartPreventExitStatus=2` (MNW server convention: exit 2 = migration failure, don't crashloop).
4. **`cp /opt/makenotwork/.env /etc/mnw/makenotwork.env`** (copy, not move; the original stays for one-week rollback). Mode 0640, owner `root:makenotwork`. Then `sed` rewrites of `DOCS_PATH`, `ASSUMPTIONS_PATH`, `YARA_RULES_DIR`, `GIT_REPOS_PATH` for the new layout.
   `HOST`, `PORT`, `DATABASE_URL`, `HOST_URL` unchanged.
5. **`ln -s /opt/makenotwork/yara-rules /opt/mnw/yara-rules`**: yara-rules is operator-managed (independent update cadence), not in the release bundle (Session 1 layout principle: category #3). The symlink lets the new env's `YARA_RULES_DIR=/opt/mnw/yara-rules` continue to resolve. When `/opt/makenotwork/` is eventually removed, the rules dir moves to a permanent path (probably `/var/lib/mnw/yara-rules` or `/etc/mnw/yara-rules`) and the symlink retargets.
6. **`rsync -aHX /opt/git/ /var/lib/mnw/git/`**: preserves `git:git` ownership and the hard links inside the tree. `chmod 0755 /var/lib/mnw` so the git user can traverse (the default was 0750 `makenotwork:makenotwork`, which blocked git's `git-receive-pack` from reaching the repos).
7. **Caddyfile rewrite**: `sed -i 's|/opt/makenotwork/error-pages|/opt/mnw/current/error-pages|g'`. `caddy validate` before reload; `systemctl reload caddy`.
8. **Sudoers rewrite**: same sed pattern on `/etc/sudoers.d/mnw-git-ssh`; `visudo -c -f` to validate.
9. **`systemctl daemon-reload`** to pick up the new unit.
10. **`systemctl restart sandod`** on fw13: sandod caches `sando.toml` at startup, so the new tier B target wouldn't have taken effect without this. **First `POST /promote/b` failed with NXDOMAIN against the stale `prod-1.makenot.work` because sandod hadn't been restarted yet.** Fixed by restarting sandod and re-promoting.
11. **`POST /promote/b {"hotfix":true}`**: `hotfix: true` bypasses the 48h burn-in on tier A (which had just promoted to 0.9.5 ~15 min prior; burn-in not yet elapsed). Sando rsync'd the 161MB bundle to `/opt/mnw/releases/0.9.5/`, swapped the `current` symlink, and called `systemctl reload-or-restart makenotwork.service`.
12. **Service up 02:53:55 UTC.** Outage window ends 02:53:58 once health serves 200. 733 YARA rules compiled, all integrations (S3, Stripe, MT, WAM, git, scanner, custom domain cache) live.
13. **External smoke checks**: `/`, `/login`, `/pricing`, `/docs`, `/docs/economics`, `/docs/roadmap`, `/docs/tiers` all return 200.
14. **`rebuild-keys` to regenerate `/opt/git/.ssh/authorized_keys`**: the `dotenvy` load fails when `mnw-admin` runs standalone as git (it reads `/opt/makenotwork/.env`, mode 0600 `makenotwork:makenotwork`, unreadable by git). Worked around by sourcing the env as root, then `sudo -u git -E`. **Regenerated keys still contain `command="/opt/makenotwork/mnw-admin git-auth ..."`**; see §G.
15. **Git push test**: `git ls-remote git@ssh.makenot.work:max/meta.git` returns refs cleanly. Cutover verified end-to-end.

## D. What stayed in place (intentional)

- `/opt/makenotwork/`: full contents, untouched. Soak rollback path: stop new unit, swap systemd unit back, start old binary. Plan: `rm -rf` after a week, post-0.9.6 deploy (see §G).
- `/opt/git/`: untouched. It is the git user's `/etc/passwd` home; `mnw-admin` writes the regenerated `authorized_keys` to `/opt/git/.ssh/authorized_keys` (not `/home/git/`, despite earlier confusion). The rsync to `/var/lib/mnw/git/` populated the new `GIT_REPOS_PATH`; the server reads from there, but git push lands in `/opt/git/` because that's the git user's home. Both paths now hold the repo bytes; that's wasteful but harmless during the soak.
- `/opt/makenotwork/backups/`: 885M of pg dumps. Script and cron still write there. Sando's backup-fetch on fw13 still pulls from there (configured pre-cutover). Migration to `/var/lib/mnw/backups/` is its own follow-up (touches script, crontab, fw13 sando config).
- `yara-rules-src/`, `rustdoc/`, `ssh/`, `.env.bak.*`: not in any env var or systemd path. Confirmed by grepping the running 0.9.5 binary's path references. Will be swept in the post-soak cleanup.

## E. What broke and how it was caught

Three small things, all caught by smoke checks:

1. **`sandod` cached `sando.toml`.** First promote attempt returned `creating remote release dir` (an in-flight progress string that became the error message). `journalctl -u sandod` showed it was still resolving `prod-1.makenot.work`. Fix: `scp sando.toml fw13:/tmp/`, `sudo cp /tmp/sando.toml /etc/sando/sando.toml`, `sudo systemctl restart sandod`, re-promote. Worth documenting that `sandod` does not watch the file; the alternative is to add an inotify watch or a SIGHUP handler.
2. **First doc smoke checks were wrong URLs.** `/about/economics` and `/docs/about/economics` returned 404; panicked briefly that the cutover broke doc routing. False alarm: the route is `/docs/{slug}` where slug is the filename stem (e.g., `/docs/economics`). Verified with `grep doc_page MNW/server/src/` after the panic. **Worth fixing in any future smoke script**: use the real URL scheme, not guessed-from-filesystem paths.
3. **`mnw-admin rebuild-keys` needed env loading from root context.** `sudo -u git /opt/mnw/current/mnw-admin rebuild-keys` fails with `DATABASE_URL must be set: NotPresent` because the binary's `dotenvy::from_path("/opt/makenotwork/.env")` runs as git, which can't read `.env` (mode 0600 makenotwork). Workaround: `set -a; source /etc/mnw/makenotwork.env; set +a; sudo -u git -E /opt/mnw/current/mnw-admin rebuild-keys`. Cleanest long-term fix is in §G.

## F. Outcomes (verified)

**Sando state after cutover:**

```
host  cur=0.9.5  prev=0.9.5   burn_in_started=2026-06-03T02:23:28Z
a     cur=0.9.5  prev=0.8.12  burn_in_started=2026-06-03T02:38:57Z
b     cur=0.9.5  prev=None    burn_in_started=2026-06-03T02:53:56Z
c     not provisioned
```

**Prod externally:**

- `https://makenot.work/api/health` → `{"status":"operational","version":"0.9.5","checks":{"database":true}}`.
- `/`, `/login`, `/pricing`, `/docs`, `/docs/economics`, `/docs/roadmap`, `/docs/tiers` → 200.
- Git: `git ls-remote git@ssh.makenot.work:max/meta.git` → returns refs.

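
The external checks listed here, combined with the §E.2 lesson about using the real `/docs/{slug}` scheme, could be scripted roughly as follows. This is a minimal sketch, not an existing script: the function names (`slug_from_file`, `doc_url`, `health_reports_version`, `http_status`, `smoke`) are hypothetical, and the substring match on the health body assumes the compact JSON shape quoted above.

```sh
#!/bin/sh
# Sketch of a post-deploy smoke check. All function names are hypothetical.

# Doc pages are served at /docs/{slug}, where slug is the filename stem.
slug_from_file() {
    name=${1##*/}                 # drop directories: a/b/economics.md -> economics.md
    printf '%s\n' "${name%.*}"    # drop extension:   economics.md     -> economics
}

# Build the real doc URL from a base URL ($1) and a slug ($2).
doc_url() {
    printf '%s/docs/%s\n' "${1%/}" "$2"
}

# Succeed if the health JSON body ($1) reports the expected version ($2).
# Plain substring match; assumes compact JSON without spaces.
health_reports_version() {
    case $1 in
        *"\"version\":\"$2\""*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print only the HTTP status code for a URL (000 if the request fails).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# smoke BASE_URL SLUG...: check the fixed pages plus each doc slug.
# Returns non-zero if any URL answers with something other than 200.
smoke() {
    base=${1%/}
    shift
    failures=0
    urls="$base/ $base/login $base/pricing $base/docs"
    for slug in "$@"; do
        urls="$urls $(doc_url "$base" "$slug")"
    done
    for url in $urls; do
        code=$(http_status "$url")
        if [ "$code" = "200" ]; then
            echo "ok   $code $url"
        else
            echo "FAIL $code $url"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}
```

Invoked as `smoke https://makenot.work economics roadmap tiers`, this covers the same seven URLs as step 13 of §C. Deriving slugs with `slug_from_file` rather than typing paths by hand is what avoids the guessed-from-filesystem 404s.
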

**Prod internally:**

- `systemctl status makenotwork` → active, PID 3123111, listening 0.0.0.0:3000.
- 733 YARA rules compiled from `/opt/mnw/yara-rules` (symlink).
- All integrations enabled per startup log: `s3=true, synckit_s3=false, stripe=true, scanner=true, mt=true, wam=true, git=true`.

**deploy.sh path retained.** Not retired; remains as break-glass per `feedback_prefer_sando_over_deploy_sh` (sando is preferred *default*; deploy.sh stays runnable for outages where sando host is down).

## G. Open follow-ups

### G.1 The hardcoded `/opt/makenotwork/` paths (blocks the cleanup milestone)

Session 1 outcomes claimed "`command=` prefixes auto-update on the first post-migration `rebuild-keys` run." That's wrong, as confirmed during step 14 of §C. The path is a `const` in the binary, not pulled from env. Four sites need lifting before `/opt/makenotwork/` can be removed:

| File | Line | Current value | Target |
|---|---|---|---|
| `server/src/git_ssh.rs` | 15 | `const MNW_ADMIN_PATH: &str = "/opt/makenotwork/mnw-admin"` | `/opt/mnw/current/mnw-admin` |
| `server/src/bin/mnw-admin.rs` | 122 | `dotenvy::from_path("/opt/makenotwork/.env")` | `/etc/mnw/makenotwork.env` |
| `server/src/build_runner.rs` | 467 | `const BUILD_SSH_KNOWN_HOSTS: &str = "/opt/makenotwork/ssh/known_hosts"` | `/etc/mnw/known_hosts` (or delete if dead; verify usage first) |
| `server/src/routes/api/ssh_keys.rs` | 165 | `args(["-u", "git", "/opt/makenotwork/mnw-admin", "rebuild-keys"])` | `/opt/mnw/current/mnw-admin` |

Ship as 0.9.6. Cleanup sequence after: deploy 0.9.6 via sando → `rebuild-keys` once (regenerates `authorized_keys` with new path in `command=`) → soak one week → `rm -rf /opt/makenotwork/`.

### G.2 The backups dir migration

Independent of G.1. Touches:

- `server/deploy/backup-db.sh`: hardcoded `BACKUP_DIR="/opt/makenotwork/backups"` near top.
- `makenotwork` user crontab on prod.
- Sando's `backup.source` URL on fw13 (currently pulls from `/opt/makenotwork/backups/latest.sql.gz` via rrsync).

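
One way to soften the script edit, sketched under assumptions: let `backup-db.sh` take `BACKUP_DIR` from the environment and fall back to today's path, so the switch becomes a crontab change rather than a script change. The `BACKUP_DIR` name and the legacy path come from the script as described above; the override behaviour and the helper names `resolve_backup_dir` and `check_backup_dir` are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for backup-db.sh; not the current script contents.

# Prefer BACKUP_DIR from the environment, fall back to the legacy location.
resolve_backup_dir() {
    printf '%s\n' "${BACKUP_DIR:-/opt/makenotwork/backups}"
}

# Fail loudly if the target is missing or not writable, so a typo in the
# crontab environment does not silently drop the nightly dump.
check_backup_dir() {
    dir=$1
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "backup dir not usable: $dir" >&2
        return 1
    fi
}
```

With this in place the script would begin with something like `dir=$(resolve_backup_dir); check_backup_dir "$dir" || exit 1`, and the crontab entry could set `BACKUP_DIR=/var/lib/mnw/backups` ahead of the command once the copy has been made.
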

Easiest order: copy the existing 885M dir to `/var/lib/mnw/backups/`, edit script + crontab + sando config in one window, retire `/opt/makenotwork/backups/` after one successful daily backup lands in the new location and sando confirms it pulled cleanly.

### G.3 The `/opt/git` vs `/var/lib/mnw/git` duality

Both directories currently hold the same repos. Git pushes land in `/opt/git/` (the git user's home from `/etc/passwd`). The server reads from `/var/lib/mnw/git/` (`GIT_REPOS_PATH`). They drift the moment someone pushes. Two ways out:

- (a) `usermod -d /var/lib/mnw/git git` to make git's home match `GIT_REPOS_PATH`. Single source of truth. Risk: any cron / script that reads git's home (none I found, but worth grepping) breaks.
- (b) Revert `GIT_REPOS_PATH` to `/opt/git/`. Avoids the move but locks the path forever and reverts a piece of Session 1's FHS migration.

(a) is the right answer. Do it during the post-0.9.6 soak window.

### G.4 `bootstrap-node.sh` polish

From this cutover and Session 2:

- **Detect `nologin` shell** on `SERVICE_USER` and refuse with a clear error (or auto-`chsh`). Costs ~1 min of cutover time if you don't know to check.
- **Sibling `bootstrap-node-postgres.sh`** for the common pg_ident map case (when SERVICE_USER ≠ pg role name). Or document the manual steps in the script's "next steps" output.
- **README-postgres.md note** on the sqlx URL form: `postgres:///db?host=/var/run/postgresql&user=name`, not `postgres://user@/db?host=...`.

### G.5 `ASSUMPTIONS_PATH` mismatch

`sando-daemon.toml` puts the file at `/docs/assumptions.toml`; prod's pre-existing env expected `/docs/business/assumptions.toml` (matching the source layout `server/docs/business/assumptions.toml`). Worked around with an env edit during cutover, but both prod and testnot now have a non-canonical `ASSUMPTIONS_PATH=/opt/mnw/current/docs/assumptions.toml`. Fix: change `release_contents[3].dst` in `sando-daemon.toml` to `docs/business/assumptions.toml` and revert the env path on both nodes.
Small, do it during the 0.9.6 sprint.

## H. Key paths (for orientation)

- `MNW/sando/sando.toml`: tier B definition (`makenotwork@alpha-west-1`).
- `MNW/sando/deploy/bootstrap-node.sh`: node-bootstrap; ran on prod with `SERVICE_USER=makenotwork`.
- `MNW/sando/daemon/sando-daemon.toml`: release_contents (note §G.5 `ASSUMPTIONS_PATH` mismatch).
- `MNW/server/src/{git_ssh.rs, build_runner.rs, bin/mnw-admin.rs, routes/api/ssh_keys.rs}`: the four hardcoded path sites.
- `MNW/server/deploy/backup-db.sh`: hardcoded backup dir.
- `/etc/systemd/system/makenotwork.service` (prod): new FHS unit.
- `/etc/mnw/makenotwork.env` (prod): new env file location.
- `/etc/sudoers.d/mnw-git-ssh` (prod): updated to `/opt/mnw/current/mnw-admin`.
- `/etc/caddy/Caddyfile` (prod): three error-pages refs updated.
- `/opt/makenotwork/` (prod): full pre-cutover state, kept for soak rollback.
- `launchplan_final.md` §6.5 step 8: original plan this session closes.
- `launchplan_final.md` §6.9: Session 2/3 gotchas summary.
- `launchplan_final.md` §7: 0.9.6 path-decoupling spec.
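
For the `nologin` item in §G.4, the pre-flight check in `bootstrap-node.sh` could look roughly like this. It is a sketch under assumptions, not the script's current contents: the function names are hypothetical, and it assumes a Linux host where `getent` is available. An empty shell field is treated as login-capable, since that conventionally means `/bin/sh`.

```sh
#!/bin/sh
# Hypothetical pre-flight check for bootstrap-node.sh.

# Extract the login shell (field 7) from a passwd entry line.
passwd_shell() {
    printf '%s\n' "$1" | cut -d: -f7
}

# Succeed unless the shell is one of the usual "no login" placeholders.
shell_allows_login() {
    case ${1##*/} in
        nologin|false) return 1 ;;
        *) return 0 ;;
    esac
}

# require_login_shell USER: refuse with a clear error if sando could not
# open an SSH session as USER.
require_login_shell() {
    user=$1
    entry=$(getent passwd "$user") || {
        echo "error: no such user: $user" >&2
        return 1
    }
    shell=$(passwd_shell "$entry")
    if ! shell_allows_login "$shell"; then
        echo "error: $user has login shell $shell; SSH sessions will be rejected" >&2
        echo "fix:   chsh -s /bin/bash $user" >&2
        return 1
    fi
}
```

Calling `require_login_shell "$SERVICE_USER" || exit 1` near the top of the script would turn the B.3 surprise into an explicit error. Whether to refuse or to run `chsh` automatically is the open design choice from §G.4; refusing is the more conservative default when someone hardened the user on purpose.
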