# Session 1: Sando ships the full versioned bundle

Plan captured 2026-06-02 after the design-step-back conversation following Phase A landing and the cargo_test/MM diagnosis push. Resolves the tier-B-strategy decision (§6.5 step 6 of `launchplan_final.md`) by going past the (a1)/(a2)/(b) trichotomy to a proper layout redesign.

Status: **complete 2026-06-02**. See §G "Outcomes" at the bottom for what actually shipped, the bundle that landed, and what changed vs. the plan.

## Background: the trade as it stood

Today sando ships only `{makenotwork, mnw-admin, error-pages}` into `releases/<v>/`. Prod's `/opt/makenotwork/` also contains `docs/`, `static/`, `yara-rules/`, `.env`, `backups/`, `scan-spool/`, none of which sando manages. `deploy.sh` ships content; sando ships binaries. The split is the source of the friction we keep hitting.

The right answer isn't (a1) "extend sando to ship everything in one piece" or (a2) "leave content on the old path forever". It's a layout that separates the five things prod actually serves and lets sando own exactly the right subset.

## Layout

```
/opt/mnw/releases/<v>/            # sando-managed: code + version-coupled content
├── makenotwork
├── mnw-admin
├── static/
├── docs/
└── error-pages/
/opt/mnw/current → releases/<v>/  # atomic-swap symlink
/opt/mnw/yara-rules/              # operator-managed (separate cadence)
/etc/mnw/makenotwork.env          # operator-managed secrets
/var/lib/mnw/                     # runtime state
├── backups/
├── scan-spool/
└── git/                          # GIT_REPOS_PATH (moved from /opt/git)
```

Systemd unit:

```
ExecStart=/opt/mnw/current/makenotwork
WorkingDirectory=/opt/mnw/current
EnvironmentFile=/etc/mnw/makenotwork.env
ReadWritePaths=/var/lib/mnw /opt/mnw/yara-rules
StateDirectory=mnw/scan-spool
```

## Principle

A sando "release" is the atomic versioned bundle covering categories 1 and 2 of the inventory below: code plus version-coupled content. Anything that doesn't change in lockstep with the binary doesn't belong inside the release dir. Secrets and state live on paths sando never touches. This is what gives atomic rollback its actual value: one symlink swap moves every version-coupled thing together.

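A minimal sketch of the swap itself, assuming the link is rebuilt under a temporary name and renamed over `current`. `switch_current` and the `current.tmp` name are hypothetical, not sando's actual code; Unix-only.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Repoint `<root>/current` at `releases/<version>` in one atomic step.
/// Hypothetical helper; sando's real swap code may differ.
fn switch_current(root: &Path, version: &str) -> io::Result<()> {
    // Relative target, matching the `current → releases/<v>/` layout.
    let target = Path::new("releases").join(version);
    if !root.join(&target).is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("release dir missing: {}", root.join(&target).display()),
        ));
    }

    // Build the new link under a temporary name (assumed: `current.tmp`),
    // clearing any leftover from an interrupted earlier run.
    let tmp = root.join("current.tmp");
    match fs::remove_file(&tmp) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    symlink(&target, &tmp)?;

    // rename(2) replaces the old link atomically: readers see either the old
    // release or the new one, never a missing `current`.
    fs::rename(&tmp, root.join("current"))
}
```

Rollback is the same call with the previous version string; nothing under `releases/` is modified.
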
### The five categories (what prod actually serves)

1. **Versioned code**: `makenotwork`, `mnw-admin`. Tied to a git sha.
2. **Versioned content compiled-against-the-binary**: `static/` (cache-busted via compiled-in version), `error-pages/`, `docs/` (DocEngine content, askama-template-coupled). Skew between these and the binary is a bug class.
3. **Versioned content with its own update cadence**: `yara-rules/` (malware-scan rules, gated by SCAN_ENABLED). Updates independently of MNW releases.
4. **Secrets / config**: `.env`, 45 keys. Outlives every version; never in the repo.
5. **Runtime state**: `backups/`, `scan-spool/`, `git/`. Lives outside any version.

Sando owns 1+2 as one atomic bundle. 3 stays operator-managed (revisit later if a real incident makes atomic yara-rules rollback necessary). 4 and 5 are not sando's concern.

## Session goal

Sando produces a staged release dir with the full 1+2 bundle on the MM tier. No prod or testnot changes. Validate by `POST /rebuild` against MNW main and inspecting `/srv/sando/releases/<v>/`.

## A. Investigate first (~30 min)

1. `MNW/shared/docengine/`: find the build entry point. Is it a `cargo build -p docengine` producing a binary? A `build.rs` step in `server/`? A separate `docengine compile` command run over `site-docs/`? Determines whether sando's `build.rs` invokes cargo once or twice. If DocEngine compile is a cargo step, it falls out of the existing `cargo build --release`; if it's a separate binary that runs over `site-docs/`, sando needs to invoke it after the cargo build.
2. Confirm `server/static/` is real static (no preprocessing). Probably true: `server/CONTRIBUTING.md` notes "New JS goes in server/static/ as static files".
3. `mnw-admin` callers: grep `mnw-admin` across `server/` to confirm what invokes it. Per the user: build scripts + possibly the admin page. Those callers need to update to `/opt/mnw/current/mnw-admin` post-migration. Not a Session 1 change, just a note for Session 3.

## B. Sando code changes (`MNW/sando/daemon/src/`)

### `build.rs::build_and_run_mm`

Currently stages `<release>/{makenotwork, mnw-admin, error-pages/}` via `deploy::deploy_local` (binaries) + a `cp -a` for error-pages. Extend to also stage:

- `<release>/static/` ← `worktree/server/static/` (cp -a)
- `<release>/docs/` ← `worktree/server/target/release/docs/` (assuming DocEngine outputs there) or `worktree/server/site-docs/` raw, depending on §A.1.

Refactor: pull the per-asset `cp -a` calls into a single helper

```rust
fn stage_dir(src: &Path, dst: &Path, required: bool) -> Result<()> { ... }
```

so the four sites (error-pages, static, docs, future) read uniformly. Missing-source policy: `required=true` errors; `required=false` logs a warning and skips (so older shas without one of these don't break sando mid-bisect).

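One possible shape for that helper, as a std-only sketch. It is an approximation of `cp -a`, not a replacement: it merges into an existing `dst` and keeps permission bits (via `fs::copy`), but does not preserve ownership, timestamps, or symlinks. `copy_tree` is a hypothetical name, and `io::Result` stands in for whatever `Result` alias sando uses.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stage `src` into `dst`. A missing source is an error when `required`,
/// otherwise a warning and a no-op (so older shas still build mid-bisect).
fn stage_dir(src: &Path, dst: &Path, required: bool) -> io::Result<()> {
    if !src.exists() {
        if required {
            return Err(io::Error::new(
                io::ErrorKind::NotFound,
                format!("required release source missing: {}", src.display()),
            ));
        }
        eprintln!("warn: optional source {} missing, skipping", src.display());
        return Ok(());
    }
    copy_tree(src, dst)
}

/// Recursive copy that merges into `dst` if it already exists.
/// Hypothetical helper; symlinks are followed rather than preserved.
fn copy_tree(src: &Path, dst: &Path) -> io::Result<()> {
    if src.is_dir() {
        fs::create_dir_all(dst)?;
        for entry in fs::read_dir(src)? {
            let entry = entry?;
            copy_tree(&entry.path(), &dst.join(entry.file_name()))?;
        }
    } else {
        if let Some(parent) = dst.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::copy(src, dst)?;
    }
    Ok(())
}
```

Shelling out to `cp -a` keeps the exact archive semantics; the sketch only pins down the required/optional policy.
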
### `config.rs`

Replace the `bin_names: Vec<String>` "primary binary" framing with two fields:

```rust
use std::path::PathBuf;

pub struct Config {
    // ...existing fields elided...
    pub bin_names: Vec<String>,              // what cargo produces under target/release/
    pub release_contents: Vec<ReleaseEntry>, // what gets staged from worktree → release dir
}

pub struct ReleaseEntry {
    pub src: PathBuf,   // relative to worktree root
    pub dst: PathBuf,   // relative to release dir
    pub required: bool, // missing source: error if true, warn and skip if false
}
```

Default value for MNW lives in `sando-daemon.toml`, not hard-coded in sando:

```toml
release_contents = [
  { src = "server/deploy/error-pages", dst = "error-pages", required = false },
  { src = "server/static", dst = "static", required = true },
  { src = "server/target/release/docs", dst = "docs", required = true },
]
```

This pulls MNW-specific knowledge out of sando code into sando config, which is closer to what sando wants to be as a generic deploy controller. Tests can override with a fixture `release_contents`.

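Since `src` and `dst` are documented as relative paths, a config-driven stager probably wants to reject entries that escape their root. A sketch under that assumption; `is_contained` and `resolve` are hypothetical names and this check is not part of the plan itself.

```rust
use std::path::{Component, Path, PathBuf};

#[derive(Debug, Clone)]
pub struct ReleaseEntry {
    pub src: PathBuf,   // relative to worktree root
    pub dst: PathBuf,   // relative to release dir
    pub required: bool,
}

/// True if `p` is a non-empty relative path with no `..`, root, or prefix parts.
fn is_contained(p: &Path) -> bool {
    let mut saw_normal = false;
    for c in p.components() {
        match c {
            Component::Normal(_) => saw_normal = true,
            Component::CurDir => {}
            _ => return false, // ParentDir, RootDir, Prefix
        }
    }
    saw_normal
}

/// Turn one config row into absolute (source, destination) paths.
/// Hypothetical helper: errors instead of letting a row reach outside its root.
pub fn resolve(
    entry: &ReleaseEntry,
    worktree: &Path,
    release: &Path,
) -> Result<(PathBuf, PathBuf), String> {
    if !is_contained(&entry.src) {
        return Err(format!("src must stay inside the worktree: {}", entry.src.display()));
    }
    if !is_contained(&entry.dst) {
        return Err(format!("dst must stay inside the release dir: {}", entry.dst.display()));
    }
    Ok((worktree.join(&entry.src), release.join(&entry.dst)))
}
```

The check is purely lexical; it does not follow symlinks inside the worktree.
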
### `deploy.rs::deploy_node`

Already rsyncs the whole staged dir, so no changes once `build.rs` stages more into it. Verify rsync flags include `-a` and `--delete` so removed assets don't accumulate across versions (e.g. a deleted static asset).

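For reference while verifying, a sketch of how such an invocation could be assembled. `rsync_command` is a hypothetical helper and the remote spec in the usage note is an assumption; only `-a` and `--delete` come from the text above.

```rust
use std::ffi::OsString;
use std::path::Path;
use std::process::Command;

/// Build (but do not run) an rsync invocation for one staged release.
/// Hypothetical helper; `deploy_node`'s real argument list may differ.
fn rsync_command(staged: &Path, remote: &str) -> Command {
    // Trailing slash on the source: sync the directory's contents,
    // not the directory itself.
    let mut src = OsString::from(staged.as_os_str());
    src.push("/");

    let mut cmd = Command::new("rsync");
    cmd.arg("-a") // archive mode: recursive, keeps perms, times, symlinks
        .arg("--delete") // drop files on the receiver that left the bundle
        .arg(src)
        .arg(remote);
    cmd
}
```

Usage would look like `rsync_command(Path::new("/srv/sando/releases/0.9.5"), "deploy@testnot:/opt/mnw/releases/0.9.5/").status()`, with the remote path here being illustrative only.
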
### `bootstrap-node.sh`

Unit template now writes:

```
ExecStart=/opt/mnw/current/makenotwork
WorkingDirectory=/opt/mnw/current
EnvironmentFile=/etc/mnw/makenotwork.env
ReadWritePaths=/var/lib/mnw /opt/mnw/yara-rules
StateDirectory=mnw/scan-spool
```

Plus `install -d -o root -g $SERVICE_USER -m 0750 /etc/mnw /var/lib/mnw` upfront. Drop the `EnvironmentFile=-/opt/mnw/.env` default: the env file lives at `/etc/mnw/makenotwork.env` now.

### Tier rename `mm` → `host`

Folded in here while editing topology code. Schema migration in `sando/daemon/migrations/` renames the `tiers` row + any FK references in `tier_state`, `gate_runs`, `deploys`. Plus code sweep: `build.rs`, `routes.rs`, `sando.toml`, test fixtures. Add a defensive assert that loudly fails if `tier='mm'` is still queried anywhere; a silent miss would be worse than a panic during the transition.

~30 lines of grep+replace + 1 sqlite migration.

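A sketch of what that defensive assert could look like, assuming every tier lookup can be routed through one function. `checked_tier` and `LEGACY_TIER` are hypothetical names.

```rust
/// Tier name retired by the `mm` → `host` rename.
const LEGACY_TIER: &str = "mm";

/// Pass a tier name through unchanged, panicking on the retired name.
/// Hypothetical guard, meant to wrap tier lookups during the transition.
fn checked_tier(name: &str) -> &str {
    assert!(
        name != LEGACY_TIER,
        "tier '{}' was renamed to 'host'; update this call site",
        LEGACY_TIER
    );
    name
}
```

A panic here surfaces the stale call site immediately, where an unmatched `WHERE tier = 'mm'` would just return zero rows.
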
## C. Test on MM

1. `cargo test --release --features fast-tests` in `sando/daemon/`: all existing tests pass, plus any new staging tests (e.g. `stage_dir copies on success`, `stage_dir errors when required-missing`, `release_contents config parses`).
2. Build sandod, install, restart on fw13.
3. `POST /rebuild` against current MNW main.
4. Inspect `/srv/sando/releases/<v>/`: should contain `makenotwork`, `mnw-admin`, `error-pages/`, `static/`, `docs/`. Verify total size is sane (single-digit MB for binary, tens of MB for static+docs).
5. boot_smoke still passes (binary doesn't care what's in the dir alongside it).
6. cargo_test still green.

## D. Acceptance

- A `/rebuild` against MNW main produces a staged release with all four asset categories (binaries, error-pages, static, docs).
- Existing MM gates stay green.
- `/srv/sando/releases/<v>/` is self-contained: deleting `/srv/sando/work/<sha>/` and re-running boot_smoke against the staged binary works. Sanity check that no path in the staged tree reaches back into the worktree.

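The self-containment check can be partly automated. A sketch, assuming the only way a staged tree can reach back is through symlinks: it walks the release dir and reports links whose target lies under the worktree. `links_into_worktree` and `normalize` are hypothetical names; resolution is lexical and Unix-oriented.

```rust
use std::fs;
use std::io;
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `.` and `..` without touching the filesystem.
fn normalize(p: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for c in p.components() {
        match c {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Return every symlink under `release` whose target is inside `worktree`.
/// Hypothetical check; hard-coded paths inside files are not detected.
fn links_into_worktree(release: &Path, worktree: &Path) -> io::Result<Vec<PathBuf>> {
    let worktree = normalize(worktree);
    let mut offenders = Vec::new();
    let mut stack = vec![release.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            let file_type = entry.file_type()?; // does not follow symlinks
            if file_type.is_symlink() {
                // Relative targets resolve against the link's own directory;
                // `join` keeps absolute targets as they are.
                let target = fs::read_link(&path)?;
                if normalize(&dir.join(target)).starts_with(&worktree) {
                    offenders.push(path);
                }
            } else if file_type.is_dir() {
                stack.push(path);
            }
        }
    }
    offenders.sort();
    Ok(offenders)
}
```

An empty result is necessary but not sufficient; deleting the worktree and re-running boot_smoke remains the real test.
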
## E. Out of scope for Session 1 (Session 2/3)

- Any change to testnot or prod
- Moving `/opt/git`, `/opt/makenotwork/.env`, state dirs
- Authorizing sando's pubkey on prod
- Renaming the systemd service path
- yara-rules: stays operator-managed; out of sando entirely
- Caddy config (operator-managed; uses bundle path indirectly via `localhost:3000` reverse_proxy + per-dir paths that point at `/opt/mnw/current/...` post-migration)

## F. Open questions to answer during Session 1

- **Total bundle size after staging.** If it's >50MB the rsync time per deploy gets noticeable. Worth measuring; not blocking. Sets expectations for Session 2/3.
- **DocEngine build path.** Whether `cargo build --release` already produces `target/release/docs/` (if DocEngine is a build-script step) or needs an explicit `cargo run -p docengine -- build` (if it's a separate command). Determines whether `build.rs` in sando does one cargo invocation or two.
- **DocEngine output format.** Is `target/release/docs/` a directory tree of HTML/assets ready to serve, or a single bundle file? Affects `stage_dir` semantics.

## Sessions 2 and 3 (out of scope but for context)

**Session 2: testnot migration (low-risk practice).**
- `bootstrap-node.sh` on testnot with the new unit shape.
- Write `/etc/mnw/makenotwork.env` on testnot from scratch; this is what was missing during the 2026-06-02 tier-A exercise attempt.
- `POST /promote/a` from sando → boots green for the first time.
- Exercise §6.5 step 3 tier-A flow we skipped.

**Session 3: prod migration (the careful one).**
- Inventory + dry-run plan (most of inventory done 2026-06-02; one more pass for exact mv/install sequence).
- Lock the deploy.sh path during the migration window.
- Stop makenotwork.service.
- `bootstrap-node.sh` on prod with `SANDO_PUBKEY=…`, creating `deploy` user + `/opt/mnw/` + new unit.
- `mv /opt/makenotwork/.env /etc/mnw/makenotwork.env`.
- `mv /opt/makenotwork/{backups,scan-spool}` → `/var/lib/mnw/` (or rebuild scan-spool; it's transient).
- `mv /opt/git /var/lib/mnw/git` + update `GIT_REPOS_PATH` in `/etc/mnw/makenotwork.env`.
- Audit all PATH-typed env keys against the new layout: `DOCS_PATH=/opt/mnw/current/docs`, `YARA_RULES_DIR=/opt/mnw/yara-rules`, `ASSUMPTIONS_PATH=/opt/mnw/current/docs/business/assumptions.toml`, etc.
- Update build scripts on prod that invoke `mnw-admin` to use `/opt/mnw/current/mnw-admin`.
- `POST /promote/b {"hotfix":true}` from sando → first sando deploy to prod.
- Start makenotwork.service under the new layout.
- Verify makenot.work end-to-end.
- Soak for a week.
- `rm -rf /opt/makenotwork/` after the soak; archive `deploy.sh` as break-glass-only.

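The env-key audit in the Session 3 list is mechanical enough to script. A sketch, assuming a plain `KEY=VALUE` env file and treating `/opt/makenotwork` and `/opt/git` as the old-layout roots; `stale_path_keys` and `LEGACY_ROOTS` are hypothetical names.

```rust
use std::path::Path;

/// Old-layout roots that should no longer appear in any env value.
const LEGACY_ROOTS: [&str; 2] = ["/opt/makenotwork", "/opt/git"];

/// Return `(key, value)` pairs whose value is a path under an old-layout root.
/// Hypothetical audit helper; handles simple `KEY=VALUE` lines only.
fn stale_path_keys(env_text: &str) -> Vec<(String, String)> {
    let mut stale = Vec::new();
    for line in env_text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // blank line or comment
        }
        let Some((key, value)) = line.split_once('=') else {
            continue; // not an assignment
        };
        let value = value.trim().trim_matches('"');
        // Component-wise prefix match, so `/opt/gitea` does not match `/opt/git`.
        let path = Path::new(value);
        if LEGACY_ROOTS.iter().any(|root| path.starts_with(root)) {
            stale.push((key.trim().to_string(), value.to_string()));
        }
    }
    stale
}
```

Running it over `/etc/mnw/makenotwork.env` after the `mv` steps should return an empty list before the first promote.
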
## Key paths (for Claude orientation)

- `MNW/sando/daemon/src/build.rs`: where staging happens (`build_and_run_mm`).
- `MNW/sando/daemon/src/config.rs`: Config struct, `bin_names`.
- `MNW/sando/daemon/src/deploy.rs`: `deploy_local`, `deploy_node`.
- `MNW/sando/deploy/bootstrap-node.sh`: unit template.
- `MNW/sando/sando.toml`: topology config (tier names live here).
- `MNW/sando/daemon/sando-daemon.toml`: daemon config (`release_contents` will go here).
- `MNW/sando/daemon/migrations/`: sqlite migrations for the mm→host rename.
- `MNW/server/static/`, `MNW/server/site-docs/`, `MNW/server/deploy/error-pages/`: staging sources.
- `MNW/shared/docengine/`: DocEngine crate (investigate in §A.1).
- `launchplan_final.md` §6.5: original tier-B decision context this redesign supersedes.
- `MNW/sando/plans/config-artifacts.md`: earlier Phase 3 design doc on config vs binary artifacts; complementary background.

## G. Outcomes (2026-06-02)

Session 1 landed in one focused push. All 10 tasks done, 44/44 sando-daemon tests green, pipeline went `host` green end-to-end against sha `f0970b8` (version 0.9.5) on fw13.

### What shipped (commit `f0970b8` on sando bare + mnw + srht remotes)

- `release_contents: Vec<ReleaseEntry>` in `Config`, with `ReleaseEntry { src, dst, required }`. Sando code carries no MNW-specific knowledge; the MNW bundle shape lives in `sando-daemon.toml`.
- `build.rs::build_and_run_host` (renamed from `_mm`) iterates `cfg.release_contents`, calling `stage_entry()` per row. `cp -a` semantics; supports the merge-into-existing-dir form so multiple entries can target the same `dst` (used for `docs/` from 3 worktree sources).
- `deploy.rs` rsync gained `--delete` (no stale assets across versions) and swapped `--chmod=F0755` for `--chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r,F+X` (binaries 0755, data files 0644).
- `bootstrap-node.sh` writes FHS-style unit: `EnvironmentFile=/etc/mnw/makenotwork.env`, `ReadWritePaths=/var/lib/mnw`, `WorkingDirectory=<release>/current`. Pre-creates `/etc/mnw` (root:service 0750) + `/var/lib/mnw` (service:service 0750).
- Migration `002_rename_mm_to_host.sql`: `PRAGMA defer_foreign_keys = ON` + 5 UPDATEs (tiers, nodes, deploys, gate_runs, tier_state). Preserved all existing state on fw13 (host current=0.9.5 + a current=0.8.12 carried through).
- Post-receive hook now lives in repo at `sando/deploy/post-receive` and sources `/etc/sando/sando.env`, so `SANDO_DAEMON` resolves to the tailnet listener instead of the 127.0.0.1 default. `bootstrap-sandod-host.sh` installs it.

### Open-question answers from §F

- **Bundle size:** 158 MB total: 133 MB makenotwork + 25 MB mnw-admin + <1 MB of error-pages/static/docs combined. Non-binary content is rounding error against the binaries; rsync over the LAN+tailnet to testnot/prod will be dominated by binary delta (rsync's delta-transfer algorithm helps here: minor code changes ship as small deltas, not 158 MB).
- **DocEngine build path:** It's a library crate (`MNW/shared/docengine/`), not a binary. No separate build step. Content stays raw markdown at `server/site-docs/` + `server/docs/business/assumptions.toml`. Three `release_contents` rows merge them into one `docs/` dir.
- **DocEngine output format:** N/A, since content never compiles. Raw `.md` files; the running server reads them at request time via the env-configured paths.

### Surprises / unplanned discoveries

- **deploy.sh has a CSS minification step** (`npx clean-css-cli`) before rsync. Sando does not. Effect: bundle ships unminified CSS (~3x larger on the wire than deploy.sh-shipped CSS). `server/build.rs` hashes the *unminified* `style.css` for the cache-bust `?v=...`, so correctness is preserved; this is purely a size issue. Future fix: either eat the size cost (gzip handles most of it), move minification into `server/build.rs`, or add a build-step gate to sando. **Not addressed in Session 1.**
- **mnw-admin invocation surface is bigger than expected.** Live call sites on prod: (1) sudoers `/etc/sudoers.d/*` entry `makenotwork ALL=(git) NOPASSWD: /opt/makenotwork/mnw-admin rebuild-keys`, which needs a path update in Session 3; (2) `command=` prefixes in `/home/git/.ssh/authorized_keys` that `mnw-admin rebuild-keys` itself generates, which auto-update on the first post-migration rebuild-keys run. Session 3 sequence: edit the sudoers file first, then run `mnw-admin rebuild-keys` once.
- **The defensive assert-on-stray-`"mm"`-lookup proposed in the plan was skipped.** Tests catch it: any unrenamed site fails when the DB no longer has a row matching it. After the rename + sync.rs test run + the production restart on fw13, no "mm" lookups remained.

### Carry-over for Session 2

The Session 2 starting point shifted slightly because we did Session 1's prep + ran it in one push. State to assume when Session 2 begins:

- `f0970b8` is the active sha on sando's bare repo and is current on tier host. Tier a is on the stale `0.8.12` from pre-Session-1.
- testnot still has the unit shape from the *pre-Session-1* `bootstrap-node.sh`. It is in a crashloop (MissingDatabaseUrl, no env file). Session 2 reprovisioning will replace its systemd unit with the new FHS shape AND populate `/etc/mnw/makenotwork.env` from scratch.
- `bootstrap-sandod-host.sh` on fw13 is the new version; re-running it is idempotent.
- Sando's pubkey on testnot under the `deploy` user: confirmed working earlier (`sudo -u sando ssh deploy@testnot` returned). No re-auth needed.
- The bundle has not yet been deployed remotely. Tier a's `0.8.12` deploy predates `release_contents`; the testnot release dir contains only binaries. First Session 2 promotion to tier a will be the first remote deploy of the full bundle.

### Things Session 2 should re-check before promoting

- Verify `/etc/mnw/` and `/var/lib/mnw/` get created with the right ownership when `bootstrap-node.sh` runs on testnot (the new code path; never tested on a real box).
- Decide what testnot's env file looks like: full prod-clone (with real Stripe test keys) or minimal (just what `Config::from_env` requires to boot). Minimal is faster and validates the deploy path; prod-clone exercises more code paths.
- Confirm sando's pubkey is in `/home/deploy/.ssh/authorized_keys` on testnot, not just routable via Tailscale SSH. (Tailscale SSH ≠ pubkey auth; sando uses pubkey-only via OpenSSH.)