# Session 1 — Sando ships the full versioned bundle

Plan captured 2026-06-02, after the step-back design conversation that followed Phase A landing and the cargo_test/MM diagnosis push. It resolves the tier-B strategy decision (§6.5 step 6 of `launchplan_final.md`) by going past the (a1)/(a2)/(b) trichotomy to a proper layout redesign.

Status: **complete 2026-06-02**. See §G "Outcomes" at the bottom for what actually shipped, the bundle that landed, and what changed vs. the plan.

## Background — the trade as it stood

Today sando ships only `{makenotwork, mnw-admin, error-pages}` into `releases/<v>/`. Prod's `/opt/makenotwork/` also contains `docs/`, `static/`, `yara-rules/`, `.env`, `backups/`, `scan-spool/` — none of which sando manages. `deploy.sh` ships content; sando ships binaries. The split is the source of the friction we keep hitting.

The right answer isn't (a1) "extend sando to ship everything in one piece" or (a2) "leave content on the old path forever" — it's a layout that separates the five things prod actually serves and lets sando own exactly the right subset.

## Layout

```
/opt/mnw/releases/<v>/              # sando-managed: code + version-coupled content
├── makenotwork
├── mnw-admin
├── static/
├── docs/
└── error-pages/
/opt/mnw/current → releases/<v>/    # atomic-swap symlink
/opt/mnw/yara-rules/                # operator-managed (separate cadence)
/etc/mnw/makenotwork.env            # operator-managed secrets
/var/lib/mnw/                       # runtime state
├── backups/
├── scan-spool/
└── git/                            # GIT_REPOS_PATH (moved from /opt/git)
```

Systemd unit:

```
[Service]
ExecStart=/opt/mnw/current/makenotwork
WorkingDirectory=/opt/mnw/current
EnvironmentFile=/etc/mnw/makenotwork.env
ReadWritePaths=/var/lib/mnw /opt/mnw/yara-rules
StateDirectory=mnw/scan-spool
```

## Principle

A sando "release" is the atomic versioned bundle formed by categories 1 and 2 of the inventory below: code plus version-coupled content. Anything that doesn't change in lockstep with the binary doesn't belong inside the release dir. Secrets and state live on paths sando never touches. This is what gives atomic rollback its actual value: one symlink swap moves every version-coupled thing together.
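
The symlink swap itself is small. A minimal sketch, assuming a Unix host and the `/opt/mnw/{releases,current}` layout above; `swap_current` is a hypothetical helper, not sando's actual code. It builds the new link under a temporary name and `rename`s it over `current`: `rename(2)` replaces the old link in one step, so readers never observe a missing `current`.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Hypothetical helper: atomically point `<root>/current` at `releases/<version>`.
/// The link target is relative, matching the `current → releases/<v>/` layout.
fn swap_current(root: &Path, version: &str) -> io::Result<()> {
    let target = Path::new("releases").join(version);
    // Refuse to point `current` at a release that was never staged.
    if !root.join(&target).is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("release {} is not staged", version),
        ));
    }
    // Create the new link beside `current`, then rename it into place.
    // rename(2) atomically replaces an existing symlink at the destination.
    let tmp = root.join(".current.tmp");
    let _ = fs::remove_file(&tmp); // clear leftovers from an interrupted swap
    symlink(&target, &tmp)?;
    fs::rename(&tmp, root.join("current"))
}
```

Rollback is the same call with the previous version string.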

### The five categories (what prod actually serves)

1. **Versioned code**: `makenotwork`, `mnw-admin`. Tied to a git sha.
2. **Versioned content compiled-against-the-binary**: `static/` (cache-busted via a compiled-in version), `error-pages/`, `docs/` (DocEngine content, askama-template-coupled). Skew between these and the binary is a bug class.
3. **Versioned content with its own update cadence**: `yara-rules/` (malware-scan rules, gated by SCAN_ENABLED). Updates independently of MNW releases.
4. **Secrets / config**: `.env`, 45 keys. Outlives every version; never in the repo.
5. **Runtime state**: `backups/`, `scan-spool/`, `git/`. Lives outside any version.

Sando owns 1 and 2 as one atomic bundle. Category 3 stays operator-managed (revisit later if a real incident makes atomic yara-rules rollback necessary). Categories 4 and 5 are not sando's concern.
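
The ownership split can also be written down as data. A hypothetical sketch (the type and function names are illustrative, not from sando's code) that maps each entry of the legacy prod layout to its category and to its home in the new layout; `None` flags an entry the inventory doesn't cover.

```rust
/// The five categories from the inventory above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    VersionedCode,         // 1: makenotwork, mnw-admin
    VersionCoupledContent, // 2: static/, error-pages/, docs/
    OwnCadenceContent,     // 3: yara-rules/
    Secrets,               // 4: .env
    RuntimeState,          // 5: backups/, scan-spool/, git/
}

impl Category {
    /// Sando owns categories 1 and 2 as one atomic bundle; nothing else.
    fn sando_owned(self) -> bool {
        matches!(self, Category::VersionedCode | Category::VersionCoupledContent)
    }
}

/// Hypothetical: classify a legacy entry and return its path in the new layout.
fn classify(entry: &str) -> Option<(Category, &'static str)> {
    use Category::*;
    Some(match entry {
        "makenotwork" => (VersionedCode, "/opt/mnw/current/makenotwork"),
        "mnw-admin" => (VersionedCode, "/opt/mnw/current/mnw-admin"),
        "static" => (VersionCoupledContent, "/opt/mnw/current/static"),
        "docs" => (VersionCoupledContent, "/opt/mnw/current/docs"),
        "error-pages" => (VersionCoupledContent, "/opt/mnw/current/error-pages"),
        "yara-rules" => (OwnCadenceContent, "/opt/mnw/yara-rules"),
        ".env" => (Secrets, "/etc/mnw/makenotwork.env"),
        "backups" => (RuntimeState, "/var/lib/mnw/backups"),
        "scan-spool" => (RuntimeState, "/var/lib/mnw/scan-spool"),
        "git" => (RuntimeState, "/var/lib/mnw/git"), // today at /opt/git
        _ => return None,
    })
}
```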

## Session goal

Sando produces a staged release dir with the full 1+2 bundle on the MM tier. No prod or testnot changes. Validate by `POST /rebuild` against MNW main and inspecting `/srv/sando/releases/<v>/`.

## A. Investigate first (~30 min)

1. `MNW/shared/docengine/` — find the build entry point. Is it a `cargo build -p docengine` producing a binary? A `build.rs` step in `server/`? A separate `docengine compile` command run over `site-docs/`? Determines whether sando's `build.rs` invokes cargo once or twice. If DocEngine compile is a cargo step, it falls out of the existing `cargo build --release`; if it's a separate binary that runs over `site-docs/`, sando needs to invoke it after the cargo build.
2. Confirm `server/static/` is real static (no preprocessing). Probably true — server/CONTRIBUTING.md notes "New JS goes in server/static/ as static files".
3. `mnw-admin` callers — grep `mnw-admin` across `server/` to confirm what invokes it. Per the user: build scripts + possibly the admin page. Those callers need to update to `/opt/mnw/current/mnw-admin` post-migration. Not a Session 1 change, just a note for Session 3.

## B. Sando code changes (`MNW/sando/daemon/src/`)

### `build.rs::build_and_run_mm`

Currently stages `<release>/{makenotwork, mnw-admin, error-pages/}` via `deploy::deploy_local` (binaries) plus a `cp -a` for error-pages. Extend it to also stage:

- `<release>/static/` ← `worktree/server/static/` (cp -a)
- `<release>/docs/` ← `worktree/server/target/release/docs/` (assuming DocEngine outputs there) or raw `worktree/server/site-docs/`, depending on §A.1.

Refactor: pull the per-asset `cp -a` calls into a single helper

```rust
fn stage_dir(src: &Path, dst: &Path, required: bool) -> Result<()> { ... }
```

so the four sites (error-pages, static, docs, future) read uniformly. Missing-source policy: `required = true` errors; `required = false` logs a warning and skips (so older shas without one of these don't break sando mid-bisect).
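
A minimal std-only sketch of the helper, assuming `std::io::Result` stands in for whatever error type sando actually uses. It approximates `cp -a src/. dst`: it recurses into directories, recreates symlinks as symlinks, and copies files with `fs::copy`, which carries permission bits but not timestamps or ownership. Copying into an existing `dst` merges rather than replaces. `copy_tree` is a hypothetical internal name.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Copy the contents of `src` into `dst`, merging if `dst` already exists.
/// A missing `src` is an error when `required`, otherwise a logged skip.
fn stage_dir(src: &Path, dst: &Path, required: bool) -> io::Result<()> {
    match fs::symlink_metadata(src) {
        Ok(_) => copy_tree(src, dst),
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            if required {
                Err(io::Error::new(
                    io::ErrorKind::NotFound,
                    format!("required release entry missing: {}", src.display()),
                ))
            } else {
                eprintln!("warn: skipping optional release entry {}: not found", src.display());
                Ok(())
            }
        }
        Err(e) => Err(e), // permission errors etc. are never silently skipped
    }
}

fn copy_tree(src: &Path, dst: &Path) -> io::Result<()> {
    let meta = fs::symlink_metadata(src)?;
    let ft = meta.file_type();
    if ft.is_symlink() {
        // Recreate the link itself rather than copying what it points at.
        let _ = fs::remove_file(dst);
        symlink(fs::read_link(src)?, dst)
    } else if ft.is_dir() {
        fs::create_dir_all(dst)?;
        for entry in fs::read_dir(src)? {
            let entry = entry?;
            copy_tree(&entry.path(), &dst.join(entry.file_name()))?;
        }
        // Apply the source mode last so a read-only dir doesn't block its own children.
        fs::set_permissions(dst, meta.permissions())
    } else {
        fs::copy(src, dst).map(|_| ())
    }
}
```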

### `config.rs`

Replace the `bin_names: Vec<String>` "primary binary" framing with two fields:

```rust
use std::path::PathBuf;

pub struct Config {
    pub bin_names: Vec<String>,              // what cargo produces under target/release/
    pub release_contents: Vec<ReleaseEntry>, // what gets staged from worktree → release dir
    // ...existing fields unchanged
}

pub struct ReleaseEntry {
    pub src: PathBuf, // relative to worktree root
    pub dst: PathBuf, // relative to release dir
    pub required: bool,
}
```

The default value for MNW lives in `sando-daemon.toml`, not hard-coded in sando:

```toml
release_contents = [
  { src = "server/deploy/error-pages", dst = "error-pages", required = false },
  { src = "server/static", dst = "static", required = true },
  { src = "server/target/release/docs", dst = "docs", required = true },
]
```

This pulls MNW-specific knowledge out of sando code and into sando config, closer to what sando wants to be: a generic deploy controller. Tests can override it with a fixture `release_contents`.
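
Since `src` and `dst` are documented as relative to the worktree root and the release dir, a config loader could reject entries that would escape either root before any copying happens. A hypothetical sketch: `validate_entry` and `resolve` are illustrative names, and `ReleaseEntry` is redeclared so the block stands alone. Absolute paths, empty paths, and `..` components are refused.

```rust
use std::path::{Component, Path, PathBuf};

#[derive(Debug, Clone)]
pub struct ReleaseEntry {
    pub src: PathBuf, // relative to worktree root
    pub dst: PathBuf, // relative to release dir
    pub required: bool,
}

/// True if `p` has only normal or `.` components, so joining it cannot escape its root.
fn stays_inside(p: &Path) -> bool {
    p.components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}

/// Hypothetical check run when sando-daemon.toml is loaded.
fn validate_entry(e: &ReleaseEntry) -> Result<(), String> {
    for (field, p) in [("src", &e.src), ("dst", &e.dst)] {
        if p.as_os_str().is_empty() || !stays_inside(p) {
            return Err(format!(
                "release_contents {} must be a relative path without '..': {}",
                field,
                p.display()
            ));
        }
    }
    Ok(())
}

/// Resolve a validated entry to concrete (source, destination) paths.
fn resolve(e: &ReleaseEntry, worktree: &Path, release: &Path) -> (PathBuf, PathBuf) {
    (worktree.join(&e.src), release.join(&e.dst))
}
```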

### `deploy.rs::deploy_node`

It already rsyncs the whole staged dir, so it needs no changes once `build.rs` stages more into it. Verify that the rsync flags include `-a` and `--delete`, so removed assets (e.g. a deleted static asset) don't accumulate across versions.
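
For reference, a sketch of how the rsync invocation might be assembled with `std::process::Command`. The helper names and the remote-spec shape are assumptions; the `--chmod` string is the one §G reports shipping. The trailing `/` on the source makes rsync copy the staged dir's contents rather than the dir itself, which is what lets `--delete` prune stale files inside the destination.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical: argument list pushing a staged release to `remote:dest/`.
fn rsync_args(staged: &Path, remote: &str, dest: &str) -> Vec<String> {
    vec![
        "-a".to_string(),       // archive: recurse, keep perms/times/symlinks
        "--delete".to_string(), // remove files that no longer exist in the staged dir
        // dirs 0755, data files 0644; files that were already executable end up 0755
        "--chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r,F+X".to_string(),
        format!("{}/", staged.display()), // trailing slash: copy contents, not the dir
        format!("{}:{}/", remote, dest.trim_end_matches('/')),
    ]
}

fn rsync_command(staged: &Path, remote: &str, dest: &str) -> Command {
    let mut cmd = Command::new("rsync");
    cmd.args(rsync_args(staged, remote, dest));
    cmd
}
```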

### `bootstrap-node.sh`

The unit template now writes:

```
[Service]
ExecStart=/opt/mnw/current/makenotwork
WorkingDirectory=/opt/mnw/current
EnvironmentFile=/etc/mnw/makenotwork.env
ReadWritePaths=/var/lib/mnw /opt/mnw/yara-rules
StateDirectory=mnw/scan-spool
```

Plus `install -d -o root -g $SERVICE_USER -m 0750 /etc/mnw /var/lib/mnw` upfront. Drop the `EnvironmentFile=-/opt/mnw/.env` default — the env file lives at `/etc/mnw/makenotwork.env` now.

### Tier rename `mm` → `host`

Folded in here while editing topology code. A schema migration in `sando/daemon/migrations/` renames the `tiers` row plus any FK references in `tier_state`, `gate_runs`, `deploys`. Plus a code sweep: `build.rs`, `routes.rs`, `sando.toml`, test fixtures. Add a defensive assert that fails loudly if `tier='mm'` is still queried anywhere; a silent miss would be worse than a panic during the transition.

~30 lines of grep+replace + 1 sqlite migration.
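
The defensive assert could be as small as a guard where tier names enter a query. A hypothetical sketch (`checked_tier` and `RETIRED_TIERS` are not existing sando names); it panics rather than returning an error because a loud failure beats a silent miss during the transition.

```rust
/// Tier names retired by the mm → host rename, paired with their replacements.
const RETIRED_TIERS: &[(&str, &str)] = &[("mm", "host")];

/// Hypothetical guard: call before any tier-keyed query.
/// Panics if a retired name is still in use anywhere in code or config.
fn checked_tier(name: &str) -> &str {
    if let Some((old, new)) = RETIRED_TIERS.iter().find(|(old, _)| *old == name) {
        panic!("tier '{}' was renamed to '{}'; update the caller", old, new);
    }
    name
}
```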

## C. Test on MM

1. `cargo test --release --features fast-tests` in `sando/daemon/`: all existing tests pass, plus any new staging tests (e.g. `stage_dir copies on success`, `stage_dir errors when required-missing`, `release_contents config parses`).
2. Build sandod, install it, and restart it on fw13.
3. `POST /rebuild` against current MNW main.
4. Inspect `/srv/sando/releases/<v>/`: it should contain `makenotwork`, `mnw-admin`, `error-pages/`, `static/`, `docs/`. Verify the total size is sane (single-digit MB for the binary, tens of MB for static+docs).
5. boot_smoke still passes (the binary doesn't care what's in the dir alongside it).
6. cargo_test still green.

## D. Acceptance

- A `/rebuild` against MNW main produces a staged release with all four asset categories (binaries, error-pages, static, docs).
- Existing MM gates stay green.
- `/srv/sando/releases/<v>/` is self-contained: after deleting `/srv/sando/work/<sha>/`, re-running boot_smoke against the staged binary still works. Sanity-check that no path in the staged tree reaches back into the worktree.
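
The "nothing reaches back into the worktree" check can be automated by walking the staged tree and resolving every symlink. A sketch under the assumption that symlinks are the only way a staged path could point outside the release (absolute paths baked into file contents would not be caught); `find_escaping_links` is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return every symlink under `release` whose target resolves outside `release`.
/// Dangling links also count, since they would break once the worktree is gone.
fn find_escaping_links(release: &Path) -> io::Result<Vec<PathBuf>> {
    let root = fs::canonicalize(release)?;
    let mut bad = Vec::new();
    let mut stack = vec![root.clone()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            let ft = fs::symlink_metadata(&path)?.file_type();
            if ft.is_symlink() {
                // canonicalize follows the whole link chain; Err means dangling.
                match fs::canonicalize(&path) {
                    Ok(target) if target.starts_with(&root) => {}
                    _ => bad.push(path),
                }
            } else if ft.is_dir() {
                stack.push(path); // symlinked dirs are never descended, so no cycles
            }
        }
    }
    Ok(bad)
}
```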

## E. Out of scope for Session 1 (Session 2/3)

- Any change to testnot or prod
- Moving `/opt/git`, `/opt/makenotwork/.env`, state dirs
- Authorizing sando's pubkey on prod
- Renaming the systemd service path
- yara-rules — stays operator-managed; out of sando entirely
- Caddy config (operator-managed; uses the bundle path indirectly via the `localhost:3000` reverse_proxy plus per-dir paths that point at `/opt/mnw/current/...` post-migration)

## F. Open questions to answer during Session 1

- **Total bundle size after staging.** If it's over 50 MB, per-deploy rsync time becomes noticeable. Worth measuring; not blocking. Sets expectations for Session 2/3.
- **DocEngine build path.** Whether `cargo build --release` already produces `target/release/docs/` (if DocEngine is a build-script step) or needs an explicit `cargo run -p docengine -- build` (if it's a separate command). Determines whether `build.rs` in sando does one cargo invocation or two.
- **DocEngine output format.** Is `target/release/docs/` a directory tree of HTML/assets ready to serve, or a single bundle file? Affects `stage_dir` semantics.
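
Measuring the bundle takes one function. A std-only sketch (hypothetical `tree_size` and `over_threshold`) that sums the apparent sizes of regular files under the staged release without following symlinks; it assumes the 50 MB threshold means decimal megabytes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sum the apparent sizes of regular files under `dir`, not following symlinks.
fn tree_size(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = fs::symlink_metadata(entry.path())?;
        if meta.is_dir() {
            total += tree_size(&entry.path())?;
        } else if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}

/// True if the bundle exceeds the 50 MB rule of thumb (decimal MB assumed).
fn over_threshold(bytes: u64) -> bool {
    bytes > 50 * 1_000_000
}
```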

## Sessions 2 and 3 (out of scope but for context)

**Session 2 — testnot migration (low-risk practice).**
- `bootstrap-node.sh` on testnot with the new unit shape.
- Write `/etc/mnw/makenotwork.env` on testnot from scratch — this is what was missing during the 2026-06-02 tier-A exercise attempt.
- `POST /promote/a` from sando → boots green for the first time.
- Exercise the §6.5 step 3 tier-A flow we skipped.

**Session 3 — prod migration (the careful one).**
- Inventory + dry-run plan (most of the inventory was done 2026-06-02; one more pass for the exact mv/install sequence).
- Lock the deploy.sh path during the migration window.
- Stop makenotwork.service.
- `bootstrap-node.sh` on prod with `SANDO_PUBKEY=…`, creating the `deploy` user + `/opt/mnw/` + the new unit.
- `mv /opt/makenotwork/.env /etc/mnw/makenotwork.env`.
- `mv /opt/makenotwork/{backups,scan-spool} /var/lib/mnw/` (or rebuild scan-spool; it's transient).
- `mv /opt/git /var/lib/mnw/git` + update `GIT_REPOS_PATH` in `/etc/mnw/makenotwork.env`.
- Audit all PATH-typed env keys against the new layout: `DOCS_PATH=/opt/mnw/current/docs`, `YARA_RULES_DIR=/opt/mnw/yara-rules`, `ASSUMPTIONS_PATH=/opt/mnw/current/docs/business/assumptions.toml`, etc.
- Update build scripts on prod that invoke `mnw-admin` to use `/opt/mnw/current/mnw-admin`.
- `POST /promote/b {"hotfix":true}` from sando → first sando deploy to prod.
- Start makenotwork.service under the new layout.
- Verify makenot.work end-to-end.
- Soak for a week.
- `rm -rf /opt/makenotwork/` after the soak; archive `deploy.sh` as break-glass-only.
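
The PATH-typed env audit can be a small check run against `/etc/mnw/makenotwork.env` before starting the service. A sketch assuming plain `KEY=VALUE` lines (comments and blanks skipped, no shell quoting) and treating keys ending in `_PATH` or `_DIR` as path-typed; `stale_path_keys` and `under` are hypothetical names, and the legacy prefixes are the two directories this migration retires.

```rust
/// Directories retired by the Session 3 migration.
const LEGACY_PREFIXES: &[&str] = &["/opt/makenotwork", "/opt/git"];

/// True if `v` is `prefix` itself or a path below it (so /opt/gitea does not match /opt/git).
fn under(v: &str, prefix: &str) -> bool {
    v == prefix || v.strip_prefix(prefix).map_or(false, |rest| rest.starts_with('/'))
}

/// Return (key, value) pairs whose path-typed value still points at a legacy location.
fn stale_path_keys(env_file: &str) -> Vec<(String, String)> {
    env_file
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| l.split_once('='))
        .filter(|(k, _)| k.ends_with("_PATH") || k.ends_with("_DIR"))
        .filter(|(_, v)| LEGACY_PREFIXES.iter().any(|p| under(v, p)))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```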

## Key paths (for Claude orientation)

- `MNW/sando/daemon/src/build.rs` — where staging happens (`build_and_run_mm`).
- `MNW/sando/daemon/src/config.rs` — Config struct, `bin_names`.
- `MNW/sando/daemon/src/deploy.rs`: `deploy_local`, `deploy_node`.
- `MNW/sando/deploy/bootstrap-node.sh` — unit template.
- `MNW/sando/sando.toml` — topology config (tier names live here).
- `MNW/sando/daemon/sando-daemon.toml` — daemon config (release_contents will go here).
- `MNW/sando/daemon/migrations/` — sqlite migrations for the mm→host rename.
- `MNW/server/static/`, `MNW/server/site-docs/`, `MNW/server/deploy/error-pages/` — staging sources.
- `MNW/shared/docengine/` — DocEngine crate (investigate in §A.1).
- `launchplan_final.md` §6.5 — original tier-B decision context this redesign supersedes.
- `MNW/sando/plans/config-artifacts.md` — earlier Phase 3 design doc on config vs binary artifacts; complementary background.

## G. Outcomes (2026-06-02)

Session 1 landed in one focused push. All 10 tasks done, 44/44 sando-daemon tests green, and the pipeline went green end-to-end on tier `host` against sha `f0970b8` (version 0.9.5) on fw13.

### What shipped (commit `f0970b8` on sando bare + mnw + srht remotes)

- `release_contents: Vec<ReleaseEntry>` in `Config`, with `ReleaseEntry { src, dst, required }`. Sando code carries no MNW-specific knowledge; the MNW bundle shape lives in `sando-daemon.toml`.
- `build.rs::build_and_run_host` (renamed from `_mm`) iterates `cfg.release_contents`, calling `stage_entry()` per row. `cp -a` semantics; supports the merge-into-existing-dir form so multiple entries can target the same `dst` (used for `docs/` from 3 worktree sources).
- `deploy.rs` rsync gained `--delete` (no stale assets across versions) and swapped `--chmod=F0755` for `--chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r,F+X` (binaries 0755, data files 0644).
- `bootstrap-node.sh` writes an FHS-style unit: `EnvironmentFile=/etc/mnw/makenotwork.env`, `ReadWritePaths=/var/lib/mnw`, `WorkingDirectory=<release>/current`. Pre-creates `/etc/mnw` (root:service 0750) + `/var/lib/mnw` (service:service 0750).
- Migration `002_rename_mm_to_host.sql`: `PRAGMA defer_foreign_keys = ON` + 5 UPDATEs (tiers, nodes, deploys, gate_runs, tier_state). Preserved all existing state on fw13 (tier host current=0.9.5, and tier a current=0.8.12 carried through).
- The post-receive hook now lives in the repo at `sando/deploy/post-receive` and sources `/etc/sando/sando.env`, so `SANDO_DAEMON` resolves to the tailnet listener instead of the 127.0.0.1 default. `bootstrap-sandod-host.sh` installs it.

### Open-question answers from §F

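- **Bundle size:** 154 MB total. 133 MB makenotwork + 25 MB mnw-admin + <1 MB of error-pages/static/docs combined. Non-binary content is rounding error against the binaries; rsync over the LAN+tailnet to testnot/prod will be dominated by the binary delta (rsync's algorithm helps here — minor code changes ship as small deltas, not 158 MB). The delta only applies when the destination already holds a basis file: if each deploy rsyncs into a fresh `releases/<v>/`, rsync has to be pointed at the previous release (e.g. `--copy-dest` or `--link-dest`) to avoid sending whole files.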
- **DocEngine build path:** It's a library crate (`MNW/shared/docengine/`), not a binary. No separate build step. Content stays raw markdown at `server/site-docs/` + `server/docs/business/assumptions.toml`. Three `release_contents` rows merge them into one `docs/` dir.
- **DocEngine output format:** N/A — content never compiles. Raw `.md` files; the running server reads them at request time via the env-configured paths.

### Surprises / unplanned discoveries

- **deploy.sh has a CSS minification step** (`npx clean-css-cli`) before rsync. Sando does not. Effect: bundle ships unminified CSS (~3x larger on the wire than deploy.sh-shipped CSS). `server/build.rs` hashes the *unminified* `style.css` for the cache-bust `?v=...`, so correctness is preserved — purely a size issue. Future fix: either eat the size cost (gzip handles most of it), move minification into `server/build.rs`, or add a build-step gate to sando. **Not addressed in Session 1.**
- **mnw-admin invocation surface is bigger than expected.** Live call sites on prod: (1) sudoers `/etc/sudoers.d/*` entry `makenotwork ALL=(git) NOPASSWD: /opt/makenotwork/mnw-admin rebuild-keys` — needs path update in Session 3; (2) `command=` prefixes in `/home/git/.ssh/authorized_keys` that `mnw-admin rebuild-keys` itself generates — auto-update on the first post-migration rebuild-keys run. Session 3 sequence: edit the sudoers file first, then run `mnw-admin rebuild-keys` once.
- **The defensive assert-on-stray-`"mm"`-lookup proposed in the plan was skipped.** Tests catch it: any unrenamed site fails when the DB no longer has a row matching it. After the rename + sync.rs test run + the production restart on fw13, no "mm" lookups remained.

### Carry-over for Session 2

The Session 2 starting point shifted slightly because we did Session 1's prep + ran it in one push. State to assume when Session 2 begins:

- `f0970b8` is the active sha on sando's bare repo and is current on tier host. Tier a is on the stale `0.8.12` from pre-Session-1.
- testnot still has the unit shape from the *pre-Session-1* `bootstrap-node.sh`. It is in a crashloop (MissingDatabaseUrl, no env file). Session 2 reprovisioning will replace its systemd unit with the new FHS shape AND populate `/etc/mnw/makenotwork.env` from scratch.
- `bootstrap-sandod-host.sh` on fw13 is the new version; re-running it is idempotent.
- Sando's pubkey on testnot under the `deploy` user: confirmed working earlier (`sudo -u sando ssh deploy@testnot` returned). No re-auth needed.
- The bundle has not yet been deployed remotely. Tier a's `0.8.12` deploy predates `release_contents`; the testnot release dir contains only binaries. First Session 2 promotion to tier a will be the first remote deploy of the full bundle.

### Things Session 2 should re-check before promoting

- Verify `/etc/mnw/` and `/var/lib/mnw/` get created with the right ownership when `bootstrap-node.sh` runs on testnot (the new code path; never tested on a real box).
- Decide what testnot's env file looks like — full prod-clone (with real Stripe test keys) or minimal (just what `Config::from_env` requires to boot). Minimal is faster and validates the deploy path; prod-clone exercises more code paths.
- Confirm sando's pubkey is in `/home/deploy/.ssh/authorized_keys` on testnot, not just routable via Tailscale SSH. (Tailscale SSH ≠ pubkey auth; sando uses pubkey-only via OpenSSH.)
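
The ownership re-check can be scripted rather than eyeballed. A Unix-only sketch (hypothetical `check_dir` and `Expect`) that compares a directory's permission bits and owning uid/gid against expectations via `std::os::unix::fs::MetadataExt`. The expected values would come from §G (`/etc/mnw` root:service 0750, `/var/lib/mnw` service:service 0750); resolving user names to uids is left out.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Expected ownership and permission bits for a provisioned directory.
struct Expect {
    uid: u32,
    gid: u32,
    mode: u32, // permission bits only, e.g. 0o750
}

/// Return human-readable mismatches; an empty list means the dir matches.
fn check_dir(path: &Path, want: &Expect) -> io::Result<Vec<String>> {
    let meta = fs::metadata(path)?;
    let mut problems = Vec::new();
    if !meta.is_dir() {
        problems.push(format!("{} is not a directory", path.display()));
    }
    let mode = meta.mode() & 0o7777; // drop the file-type bits
    if mode != want.mode {
        problems.push(format!("{}: mode {:o}, want {:o}", path.display(), mode, want.mode));
    }
    if meta.uid() != want.uid || meta.gid() != want.gid {
        problems.push(format!(
            "{}: owner {}:{}, want {}:{}",
            path.display(),
            meta.uid(),
            meta.gid(),
            want.uid,
            want.gid
        ));
    }
    Ok(problems)
}
```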