max / makenotwork
1 file changed, +16 insertions, -10 deletions
| @@ -30,23 +30,29 @@ Read these to orient before working on Sando: | |||
| 30 | 30 | ||
| 31 | 31 | Hardware and base provisioning. None of the remote-deploy work below matters until MM exists. | |
| 32 | 32 | ||
| 33 | + | **Platform decision: MM runs Mountaineer.** MM is the first real Mountaineer deployment and Sando is its first real sysop helper (principle 14). Hetzner prod stays on its current distro for now; the Mountaineer-for-prod question is deferred at least a year. If MM-on-Mountaineer ever blocks an MNW deploy for more than a day, fall back to Ubuntu on MM — capture the trigger in `plans/mm-platform-fallback.md` before flipping the install. | |
| 34 | + | ||
| 33 | 35 | - [ ] Purchase MakeMachine hardware (Threadripper 7960X + RTX PRO 6000 Blackwell + 256 GB ECC + Gen5 NVMe; ~$14-16K per `project_inference_stack.md`). | |
| 34 | - | - [ ] Install x86_64 Linux (match Hetzner prod distro/version to keep build env aligned). | |
| 36 | + | - [ ] Install Mountaineer (ZFS root, s6+s6-rc init, nushell, podman). Use the latest Dull Edge build available, or hand-roll from `side_projects/mountaineer/` if no release has shipped yet. | |
| 37 | + | - [ ] Write `plans/mm-platform-fallback.md`: explicit trigger conditions for re-imaging MM with Ubuntu, plus the swap-in procedure (which env files, which binaries, which directories to preserve). | |
| 35 | 38 | - [ ] Join MM to tailnet; allocate a stable hostname and record in `_meta/infra_tailnet.md`. | |
| 36 | 39 | - [ ] Provision `sando` system user; lock down the home dir; set up scoped SSH keys for outbound deploys. | |
| 37 | - | - [ ] Install scratch Postgres locally on MM; create the `sando_scratch` role + DB used by `migration_dry_run`. | |
| 38 | - | - [ ] Write the `sandod.service` systemd unit (run as `sando` user, restart on failure, `EnvironmentFile=/etc/sando/sando.env`). | |
| 39 | - | - [ ] Install `sandod` binary at `/usr/local/bin/sandod`; enable + start the unit. | |
| 40 | - | - [ ] Write the production `sando.toml`; bare repo path under `/srv/sando/mnw.git`; A node `testnot.work`; B node Hetzner prod. | |
| 40 | + | - [ ] Install scratch Postgres locally on MM (via apk); create the `sando_scratch` role + DB used by `migration_dry_run`. | |
| 41 | + | - [ ] Write Sando's s6-rc service definition (`sandod` long-run service, dependency on tailscale and postgres, restart on failure, env from `/etc/sando/sando.env`). Contribute upstream to Alpine if the definition turns out general enough — see Mountaineer principle on giving back. | |
| 42 | + | - [ ] Install `sandod` binary at `/usr/local/bin/sandod`; bring up the s6 service. | |
| 43 | + | - [ ] Write the production `sando.toml`; bare repo path under `/srv/sando/mnw.git`; A node `testnot.work`; B node Hetzner prod. Use `node.init = "systemd"` for the Hetzner nodes (see Phase 1). | |
| 44 | + | - [ ] Verify MNW server builds reproducibly on Mountaineer (musl libc vs glibc — sqlx/tokio/axum should be fine but confirm before relying on it). Capture any musl-specific surprises in `plans/mm-build-notes.md`. | |
| 41 | 45 | ||
| 42 | 46 | ## Phase 1 — Remote deploy | |
| 43 | 47 | ||
| 44 | - | The MVP only deploys to `ssh_target=local`. Production needs real SSH/rsync. | |
| 48 | + | The MVP only deploys to `ssh_target=local`. Production needs real SSH/rsync, and the init-system split (MM on s6, Hetzner on systemd) needs a backend abstraction from day one. | |
| 45 | 49 | ||
| 46 | - | - [ ] Implement `deploy::deploy_node` remote path: rsync the staged binary to `<ssh_target>:<release_root>/releases/<version>/server`, then `ssh <ssh_target> "ln -sfn releases/<version> current && systemctl reload-or-restart <unit>"`. | |
| 47 | - | - [ ] Settle systemd unit naming convention. Current MNW server unit is `makenotwork.service`; decide whether Sando keeps that name or migrates to `mnw-server.service`. Capture in `plans/systemd-units.md` before changing anything live. | |
| 48 | - | - [ ] Add `node.systemd_unit` field to `sando.toml` (default derives from the tier+role) so the convention is explicit per-node. | |
| 49 | - | - [ ] Bootstrap script for adding a fresh node: creates `<release_root>`, installs the systemd unit pointing at `<release_root>/current/server`, adds the sando SSH key to `authorized_keys`. Idempotent. | |
| 50 | + | - [ ] Add `node.init` field to `sando.toml`: `"systemd" | "s6" | "local"`. Default `"systemd"` for backwards-compat. Every node declares its init explicitly so a future Hetzner-on-Mountaineer move is a TOML edit, not a Sando code change. | |
| 51 | + | - [ ] Refactor `deploy.rs` around an `InitBackend` trait with `reload_or_restart(unit_name) -> Result<()>` and `unit_path(release_root, version) -> PathBuf`. Two impls: `Systemd` (shells `systemctl reload-or-restart`) and `S6` (shells `s6-svc -r` against the service dir). `Local` impl is a no-op restart for dev. | |
| 52 | + | - [ ] Implement `deploy::deploy_node` remote path: rsync the staged binary to `<ssh_target>:<release_root>/releases/<version>/server`, then `ssh <ssh_target>` runs `ln -sfn releases/<version> current` plus the init-backend-appropriate reload. | |
| 53 | + | - [ ] Settle service-name convention. Current MNW server systemd unit is `makenotwork.service`; on s6 it would be `/etc/s6-rc/sv/mnw-server/`. Capture both names + the migration plan in `plans/service-names.md` before changing anything live. | |
| 54 | + | - [ ] Add `node.service_name` field to `sando.toml` (default derives from tier+role) so the convention is explicit per-node and backend-agnostic. | |
| 55 | + | - [ ] Bootstrap script for adding a fresh node: creates `<release_root>`, installs the init-backend-appropriate service definition pointing at `<release_root>/current/server`, adds the sando SSH key to `authorized_keys`. Idempotent. One script per backend, or one script that branches on init kind. | |
| 50 | 56 | - [ ] Garbage-collect old releases on the remote: keep last N (configurable, default 5) per node. Run at end of each successful deploy. | |
| 51 | 57 | - [ ] Handle `rsync` failure mid-deploy: leave the previous `current` symlink intact; mark `deploys.outcome = 'failed'`; do not advance `tier_state`. | |
| 52 | 58 |
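
The `node.init` field and the `InitBackend` refactor in the Phase 1 checklist could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Sando's actual code: the `InitKind` enum, the `restart_argv` helper, the `backend_for` function, and the `/run/service` scan directory are all hypothetical names introduced here. It also uses `io::Result` instead of whatever error type `deploy.rs` uses, and it leaves out `unit_path`. The command is built as an argv vector so that a remote deploy could wrap it in `ssh <ssh_target>` instead of running it locally.

```rust
use std::io;
use std::path::PathBuf;
use std::process::Command;

/// Init kinds named by the `node.init` field (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InitKind {
    Systemd,
    S6,
    Local,
}

impl InitKind {
    /// Parse the raw `node.init` string; a missing value defaults to systemd.
    pub fn parse(raw: Option<&str>) -> Result<Self, String> {
        match raw {
            None | Some("systemd") => Ok(InitKind::Systemd),
            Some("s6") => Ok(InitKind::S6),
            Some("local") => Ok(InitKind::Local),
            Some(other) => Err(format!("unknown node.init value: {other}")),
        }
    }
}

pub trait InitBackend {
    /// argv that restarts the service, or `None` when nothing should run.
    /// (Hypothetical helper; keeps command construction testable.)
    fn restart_argv(&self, service_name: &str) -> Option<Vec<String>>;

    /// Run the restart command on this machine and report failure as an error.
    fn reload_or_restart(&self, service_name: &str) -> io::Result<()> {
        let Some(argv) = self.restart_argv(service_name) else {
            return Ok(()); // no-op backend
        };
        let status = Command::new(&argv[0]).args(&argv[1..]).status()?;
        if status.success() {
            Ok(())
        } else {
            Err(io::Error::new(
                io::ErrorKind::Other,
                format!("{argv:?} exited with {status}"),
            ))
        }
    }
}

pub struct Systemd;

/// `scan_dir` is an assumption: the directory holding live s6 service dirs.
pub struct S6 {
    pub scan_dir: PathBuf,
}

pub struct Local;

impl InitBackend for Systemd {
    fn restart_argv(&self, service_name: &str) -> Option<Vec<String>> {
        Some(vec![
            "systemctl".to_string(),
            "reload-or-restart".to_string(),
            service_name.to_string(),
        ])
    }
}

impl InitBackend for S6 {
    fn restart_argv(&self, service_name: &str) -> Option<Vec<String>> {
        // `s6-svc -r` takes the service directory, not a bare service name.
        let dir = self.scan_dir.join(service_name);
        Some(vec![
            "s6-svc".to_string(),
            "-r".to_string(),
            dir.to_string_lossy().into_owned(),
        ])
    }
}

impl InitBackend for Local {
    fn restart_argv(&self, _service_name: &str) -> Option<Vec<String>> {
        None // dev mode: nothing to restart
    }
}

/// Pick a backend for a node; the s6 scan dir here is a placeholder default.
pub fn backend_for(kind: InitKind) -> Box<dyn InitBackend> {
    match kind {
        InitKind::Systemd => Box::new(Systemd),
        InitKind::S6 => Box::new(S6 { scan_dir: PathBuf::from("/run/service") }),
        InitKind::Local => Box::new(Local),
    }
}
```

With this shape, `deploy_node` would only need `backend_for(node_init).restart_argv(service_name)` after the symlink flip, so moving a node between init systems stays a config change rather than a code change.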