max / makenotwork

sando: MM runs Mountaineer; bake init-backend abstraction into Phase 1

Phase 0 now installs Mountaineer (Alpine + s6-rc + ZFS root) instead of a generic Linux. Calls out the fallback-to-Ubuntu trigger and procedure, plus musl-libc verification work for MNW server. Phase 1 splits the deploy backend by node.init = systemd|s6|local with an InitBackend trait, so MM-on-s6 and Hetzner-on-systemd are both addressable without forcing a prod-cutover decision now. Renames node.systemd_unit to node.service_name for backend independence.
Author: Max J. <87768334+MaxJMath@users.noreply.github.com> · 2026-05-23 02:37 UTC
Commit: 25a3563e6113d92ef36d8ebf0bb13b68901eb6ca
Parent: 8f502d8
1 file changed, +16 insertions, -10 deletions
M sando/todo.md +16 -10
@@ -30,23 +30,29 @@ Read these to orient before working on Sando:
30 30
31 31 Hardware and base provisioning. None of the remote-deploy work below matters until MM exists.
32 32
33 + **Platform decision: MM runs Mountaineer.** MM is the first real Mountaineer deployment and Sando is its first real sysop helper (principle 14). Hetzner prod stays on its current distro for now; the Mountaineer-for-prod question is deferred at least a year. If MM-on-Mountaineer ever blocks an MNW deploy for more than a day, fall back to Ubuntu on MM — capture the trigger in `plans/mm-platform-fallback.md` before flipping the install.
34 +
33 35 - [ ] Purchase MakeMachine hardware (Threadripper 7960X + RTX PRO 6000 Blackwell + 256 GB ECC + Gen5 NVMe; ~$14-16K per `project_inference_stack.md`).
34 - - [ ] Install x86_64 Linux (match Hetzner prod distro/version to keep build env aligned).
36 + - [ ] Install Mountaineer (ZFS root, s6+s6-rc init, nushell, podman). Use the latest Dull Edge build available, or hand-roll from `side_projects/mountaineer/` if no release has shipped yet.
37 + - [ ] Write `plans/mm-platform-fallback.md`: explicit trigger conditions for re-imaging MM with Ubuntu, plus the swap-in procedure (which env files, which binaries, which directories to preserve).
35 38 - [ ] Join MM to tailnet; allocate a stable hostname and record in `_meta/infra_tailnet.md`.
36 39 - [ ] Provision `sando` system user; lock down the home dir; set up scoped SSH keys for outbound deploys.
37 - - [ ] Install scratch Postgres locally on MM; create the `sando_scratch` role + DB used by `migration_dry_run`.
38 - - [ ] Write the `sandod.service` systemd unit (run as `sando` user, restart on failure, `EnvironmentFile=/etc/sando/sando.env`).
39 - - [ ] Install `sandod` binary at `/usr/local/bin/sandod`; enable + start the unit.
40 - - [ ] Write the production `sando.toml`; bare repo path under `/srv/sando/mnw.git`; A node `testnot.work`; B node Hetzner prod.
40 + - [ ] Install scratch Postgres locally on MM (via apk); create the `sando_scratch` role + DB used by `migration_dry_run`.
41 + - [ ] Write Sando's s6-rc service definition (`sandod` long-run service, dependency on tailscale and postgres, restart on failure, env from `/etc/sando/sando.env`). Contribute upstream to Alpine if the definition turns out general enough — see Mountaineer principle on giving back.
42 + - [ ] Install `sandod` binary at `/usr/local/bin/sandod`; bring up the s6 service.
43 + - [ ] Write the production `sando.toml`; bare repo path under `/srv/sando/mnw.git`; A node `testnot.work`; B node Hetzner prod. Use `node.init = "systemd"` for the Hetzner nodes (see Phase 1).
44 + - [ ] Verify MNW server builds reproducibly on Mountaineer (musl libc vs glibc — sqlx/tokio/axum should be fine but confirm before relying on it). Capture any musl-specific surprises in `plans/mm-build-notes.md`.
41 45
42 46 ## Phase 1 — Remote deploy
43 47
44 - The MVP only deploys to `ssh_target=local`. Production needs real SSH/rsync.
48 + The MVP only deploys to `ssh_target=local`. Production needs real SSH/rsync, and the init-system split (MM on s6, Hetzner on systemd) needs a backend abstraction from day one.
45 49
46 - - [ ] Implement `deploy::deploy_node` remote path: rsync the staged binary to `<ssh_target>:<release_root>/releases/<version>/server`, then `ssh <ssh_target> "ln -sfn releases/<version> current && systemctl reload-or-restart <unit>"`.
47 - - [ ] Settle systemd unit naming convention. Current MNW server unit is `makenotwork.service`; decide whether Sando keeps that name or migrates to `mnw-server.service`. Capture in `plans/systemd-units.md` before changing anything live.
48 - - [ ] Add `node.systemd_unit` field to `sando.toml` (default derives from the tier+role) so the convention is explicit per-node.
49 - - [ ] Bootstrap script for adding a fresh node: creates `<release_root>`, installs the systemd unit pointing at `<release_root>/current/server`, adds the sando SSH key to `authorized_keys`. Idempotent.
50 + - [ ] Add `node.init` field to `sando.toml`: `"systemd" | "s6" | "local"`. Default `"systemd"` for backwards-compat. Every node declares its init explicitly so a future Hetzner-on-Mountaineer move is a TOML edit, not a Sando code change.
51 + - [ ] Refactor `deploy.rs` around an `InitBackend` trait with `reload_or_restart(unit_name) -> Result<()>` and `unit_path(release_root, version) -> PathBuf`. Two impls: `Systemd` (shells `systemctl reload-or-restart`) and `S6` (shells `s6-svc -r` against the service dir). `Local` impl is a no-op restart for dev.
52 + - [ ] Implement `deploy::deploy_node` remote path: rsync the staged binary to `<ssh_target>:<release_root>/releases/<version>/server`, then `ssh <ssh_target>` runs `ln -sfn releases/<version> current` plus the init-backend-appropriate reload.
53 + - [ ] Settle service-name convention. Current MNW server systemd unit is `makenotwork.service`; on s6 it would be `/etc/s6-rc/sv/mnw-server/`. Capture both names + the migration plan in `plans/service-names.md` before changing anything live.
54 + - [ ] Add `node.service_name` field to `sando.toml` (default derives from tier+role) so the convention is explicit per-node and backend-agnostic.
55 + - [ ] Bootstrap script for adding a fresh node: creates `<release_root>`, installs the init-backend-appropriate service definition pointing at `<release_root>/current/server`, adds the sando SSH key to `authorized_keys`. Idempotent. One script per backend, or one script that branches on init kind.
50 56 - [ ] Garbage-collect old releases on the remote: keep last N (configurable, default 5) per node. Run at end of each successful deploy.
51 57 - [ ] Handle `rsync` failure mid-deploy: leave the previous `current` symlink intact; mark `deploys.outcome = 'failed'`; do not advance `tier_state`.
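The `InitBackend` item in the Phase 1 list can be pictured roughly as follows. This is a minimal sketch, not the code in `deploy.rs`: `InitKind`, `restart_argv`, and the `scan_dir` field are hypothetical names introduced for illustration, the `/run/service` scan directory used in the usage note is an assumption about the MM layout, and the sketch omits `unit_path`. The default `reload_or_restart` runs the command on the local machine; a remote deploy would presumably wrap the same argv in an `ssh <ssh_target>` call.

```rust
use std::io;
use std::path::PathBuf;
use std::process::Command;

/// Init kinds matching the proposed `node.init` values (hypothetical enum).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InitKind {
    Systemd,
    S6,
    Local,
}

impl InitKind {
    /// Parse the string form used in `sando.toml`; unknown values yield `None`.
    pub fn parse(s: &str) -> Option<InitKind> {
        match s {
            "systemd" => Some(InitKind::Systemd),
            "s6" => Some(InitKind::S6),
            "local" => Some(InitKind::Local),
            _ => None,
        }
    }
}

pub trait InitBackend {
    /// The argv that restarts `service_name`, or `None` when there is nothing to run.
    /// (Hypothetical helper: keeping the argv separate makes it testable and
    /// lets a caller send it over SSH instead of running it locally.)
    fn restart_argv(&self, service_name: &str) -> Option<Vec<String>>;

    /// Run the restart command on the local machine and report a non-zero exit as an error.
    fn reload_or_restart(&self, service_name: &str) -> io::Result<()> {
        let Some(argv) = self.restart_argv(service_name) else {
            return Ok(()); // no-op backend
        };
        let status = Command::new(&argv[0]).args(&argv[1..]).status()?;
        if status.success() {
            Ok(())
        } else {
            Err(io::Error::new(
                io::ErrorKind::Other,
                format!("{argv:?} exited with {status}"),
            ))
        }
    }
}

pub struct Systemd;

impl InitBackend for Systemd {
    fn restart_argv(&self, service_name: &str) -> Option<Vec<String>> {
        Some(vec![
            "systemctl".to_string(),
            "reload-or-restart".to_string(),
            service_name.to_string(),
        ])
    }
}

pub struct S6 {
    /// Directory holding the live service directories (assumed, set per node).
    pub scan_dir: PathBuf,
}

impl InitBackend for S6 {
    fn restart_argv(&self, service_name: &str) -> Option<Vec<String>> {
        let service_dir = self.scan_dir.join(service_name);
        Some(vec![
            "s6-svc".to_string(),
            "-r".to_string(),
            service_dir.to_string_lossy().into_owned(),
        ])
    }
}

/// Dev backend: restarting is a no-op.
pub struct Local;

impl InitBackend for Local {
    fn restart_argv(&self, _service_name: &str) -> Option<Vec<String>> {
        None
    }
}
```

Under these assumptions, `S6 { scan_dir: PathBuf::from("/run/service") }.restart_argv("mnw-server")` produces the argv for `s6-svc -r /run/service/mnw-server`, while `Systemd.restart_argv("makenotwork.service")` produces `systemctl reload-or-restart makenotwork.service`. Deploy code that holds a `Box<dyn InitBackend>` never needs to know which one it has.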
52 58
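The release garbage-collection item (keep the last N per node, default 5) reduces to a small selection step that is independent of how the deletion is carried out. The sketch below is one possible shape, with assumptions stated plainly: `releases_to_prune` and `DEFAULT_KEEP` are hypothetical names, the caller is assumed to pass release names ordered oldest to newest, and the rule that the release behind the `current` symlink is never pruned is an added safety assumption rather than something the todo states.

```rust
/// Default retention count from the todo item ("keep last N, default 5").
pub const DEFAULT_KEEP: usize = 5;

/// Pick the releases to delete on one node.
///
/// `releases` must be ordered oldest to newest. The newest `keep` entries are
/// retained. The release that `current` points at is also retained, even if it
/// is older than the cutoff (for example after a rollback).
pub fn releases_to_prune<'a>(releases: &[&'a str], current: &str, keep: usize) -> Vec<&'a str> {
    // Everything before `cutoff` is older than the newest `keep` releases.
    let cutoff = releases.len().saturating_sub(keep);
    releases[..cutoff]
        .iter()
        .copied()
        .filter(|name| *name != current)
        .collect()
}
```

Keeping the selection pure means it can be unit-tested without a remote host; the caller would then remove each returned directory under `<release_root>/releases/` only after the deploy has succeeded.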