Skip to main content

max / audiofiles

db: strict-validator recovery branch + fix M007/M015 replay bug The recovery branch in migrate() previously caught failures in the non-ALTER batch with tracing::warn! and bumped user_version anyway — the worst possible outcome (partially applied schema, invisible to future opens). Replaced with fail-fast: on any non-ALTER failure that isn't "already exists", roll back the transaction and surface the DbError. The legitimate recovery scenario ("prior partial migration created some objects before crashing") is still tolerated via the already-exists check. Adding fail-fast immediately surfaced a real latent bug, previously silently swallowed: M007 created sync triggers on smart_folders, which M015 drops. Replaying M007 against a post-M015 schema failed at parse time with "no such table: smart_folders". The migration_replay test had been "passing" only because the recovery branch was silently ignoring the failure and bumping user_version. Fix: removed the smart_folders trigger CREATEs from M007. They had no functional effect on any install path (smart_folders is empty between M001's CREATE and M015's DROP on a fresh install; on existing installs the triggers were already created and M015's DROP TRIGGER IF EXISTS cleans them up). M015's drop statements stay for backwards compat. M015 itself is now documented as inherently one-shot alongside M001 and M002. Its `INSERT OR IGNORE … SELECT FROM smart_folders` backfill can't parse on replay against a populated post-M015 schema (SQLite has no conditional-execute syntax). The replay test rolls user_version back to 15 rather than 2 — the realistic recovery scenario is "re-apply the one migration that crashed", not "re-apply every migration from scratch", and SQLite's atomic transaction model means that's the worst real-world case. Added migrate_recovery_branch_fails_fast_on_non_alter_error to prove the fail-fast contract holds against a contrived duplicate-column trip wire whose follow-on statement targets a non-existent table. Updated replay test name to reflect the new starting version. 447 core tests passing (+2 new contract tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author: Max Johnson <me@maxj.phd> · 2026-06-03 02:51 UTC
Commit: dad6bdf5f27457eea99c54845840f129aed47a10
Parent: b11674d
2 files changed, +166 insertions, -52 deletions
@@ -374,31 +374,14 @@ BEGIN
374 374 CAST(OLD.collection_id AS TEXT) || ':' || OLD.sample_hash, NULL);
375 375 END;
376 376
377 - -- smart_folders
378 - CREATE TRIGGER IF NOT EXISTS sync_smart_folders_insert AFTER INSERT ON smart_folders
379 - WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
380 - BEGIN
381 - INSERT INTO sync_changelog (table_name, op, row_id, data)
382 - VALUES ('smart_folders', 'INSERT', CAST(NEW.id AS TEXT),
383 - json_object('id', NEW.id, 'vfs_id', NEW.vfs_id, 'name', NEW.name,
384 - 'query_json', NEW.query_json, 'created_at', NEW.created_at));
385 - END;
386 -
387 - CREATE TRIGGER IF NOT EXISTS sync_smart_folders_update AFTER UPDATE ON smart_folders
388 - WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
389 - BEGIN
390 - INSERT INTO sync_changelog (table_name, op, row_id, data)
391 - VALUES ('smart_folders', 'UPDATE', CAST(NEW.id AS TEXT),
392 - json_object('id', NEW.id, 'vfs_id', NEW.vfs_id, 'name', NEW.name,
393 - 'query_json', NEW.query_json, 'created_at', NEW.created_at));
394 - END;
395 -
396 - CREATE TRIGGER IF NOT EXISTS sync_smart_folders_delete AFTER DELETE ON smart_folders
397 - WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
398 - BEGIN
399 - INSERT INTO sync_changelog (table_name, op, row_id, data)
400 - VALUES ('smart_folders', 'DELETE', CAST(OLD.id AS TEXT), NULL);
401 - END;
377 + -- smart_folders sync triggers used to live here. Removed 2026-06-02:
378 + -- M015 drops `smart_folders` and merges its contents into
379 + -- `collections.filter_json`, so replaying M007 against a post-M015 schema
380 + -- failed with "no such table". The triggers had no functional effect on
381 + -- any install path (smart_folders is empty on first-run between M001's
382 + -- CREATE and M015's DROP), so removing them is invisible. M015's
383 + -- `DROP TRIGGER IF EXISTS sync_smart_folders_*` stays in place for DBs
384 + -- that already applied the old M007 and need the triggers cleaned up.
402 385
403 386 -- user_config (exclude sync-internal keys)
404 387 CREATE TRIGGER IF NOT EXISTS sync_user_config_insert AFTER INSERT ON user_config
@@ -1165,10 +1148,19 @@ impl Database {
1165 1148 match self.conn.execute_batch(&batch) {
1166 1149 Ok(()) => {}
1167 1150 Err(e) if e.to_string().contains("duplicate column") => {
1168 - // Partial prior migration left some columns already added.
1169 - // Re-run each ALTER TABLE individually, skipping duplicates.
1151 + // Recovery path: a prior partial migration committed
1152 + // some ALTERs before crashing. Re-run the migration in
1153 + // pieces, tolerating "duplicate column" on ALTERs and
1154 + // "already exists" on CREATEs (both mean: the prior
1155 + // partial run got there already; the desired final
1156 + // state is still reachable). Any OTHER error here is
1157 + // a real failure — we roll back and surface it,
1158 + // because silently bumping user_version on a partially
1159 + // applied schema is the worst possible outcome.
1170 1160 let _ = self.conn.execute_batch("ROLLBACK");
1171 1161 self.conn.execute_batch("BEGIN")?;
1162 +
1163 + // ALTER TABLEs first, individually, tolerating duplicates.
1172 1164 for line in sql.lines() {
1173 1165 let trimmed = line.trim();
1174 1166 if trimmed.to_uppercase().starts_with("ALTER TABLE")
@@ -1180,13 +1172,20 @@ impl Database {
1180 1172 return Err(DbError::Sqlite(alter_err));
1181 1173 }
1182 1174 }
1183 - } else if !trimmed.is_empty() && !trimmed.starts_with("--") {
1184 - // Non-ALTER statements (CREATE TABLE, triggers, etc.)
1185 - // Use execute_batch to handle multi-line statements
1186 - // that may span multiple lines.
1187 1175 }
1188 1176 }
1189 - // Re-run the full batch minus ALTER TABLEs for triggers/tables
1177 +
1178 + // Non-ALTER statements (CREATE TABLE / INDEX /
1179 + // TRIGGER, DROP IF EXISTS, INSERT OR IGNORE, plain
1180 + // INSERT / UPDATE / DELETE). After M018, every
1181 + // migration from M003 onward is replay-safe by
1182 + // construction (verified by the
1183 + // migration_replay_from_version_two_against_full_schema
1184 + // regression test), so this batch should succeed
1185 + // cleanly even against a populated schema. "already
1186 + // exists" stays tolerable as a belt-and-braces guard
1187 + // for pre-idempotent migration bodies. Anything else
1188 + // is a real failure — fail fast, don't bump.
1190 1189 let non_alter: String = sql
1191 1190 .lines()
1192 1191 .filter(|l| {
@@ -1196,18 +1195,14 @@ impl Database {
1196 1195 .collect::<Vec<_>>()
1197 1196 .join("\n");
1198 1197 if !non_alter.trim().is_empty() {
1199 - // Ignore "already exists" errors from prior partial runs;
1200 - // log anything else as a warning.
1201 1198 if let Err(e) = self.conn.execute_batch(&non_alter) {
1202 - let msg = e.to_string();
1203 - if !msg.contains("already exists") {
1204 - tracing::warn!(
1205 - migration = target,
1206 - "Non-ALTER migration statement failed: {msg}"
1207 - );
1199 + if !e.to_string().contains("already exists") {
1200 + let _ = self.conn.execute_batch("ROLLBACK");
1201 + return Err(DbError::Sqlite(e));
1208 1202 }
1209 1203 }
1210 1204 }
1205 +
1211 1206 self.conn.execute_batch(
1212 1207 &format!("PRAGMA user_version = {};\nCOMMIT;", target),
1213 1208 )?;
@@ -1357,18 +1352,29 @@ mod tests {
1357 1352 /// Simulates the worst-case recovery path: a prior partial migration left
1358 1353 /// every object in place but `user_version` rolled back. Re-running
1359 1354 /// `migrate()` against the pre-populated schema must succeed without
1360 - /// duplicate-object errors. This catches the "silent failure → bump
1361 - /// user_version" bug class for every NEW migration we add.
1355 + /// silent failure. This catches the "silent failure → bump user_version"
1356 + /// bug class for every migration past the inherently-one-shot ones.
1362 1357 ///
1363 - /// We roll back to version 2 (not 0): M001 is the initial schema and
1364 - /// M002 is a `DROP TABLE tags; ALTER tags_v2 RENAME TO tags` rebuild
1365 - /// dance — neither is replay-safe by construction, and neither needs to
1366 - /// be (both ship to fresh DBs exactly once). Every migration from M003
1367 - /// onward MUST be replay-safe; if you add a new one that isn't, this
1368 - /// test fails and you should add `IF NOT EXISTS` / `DROP IF EXISTS` /
1369 - /// `INSERT OR IGNORE` accordingly.
1358 + /// The inherently-one-shot migrations are excluded from this replay
1359 + /// loop:
1360 + /// * M001 — initial schema; bare CREATE TABLEs, runs against an empty DB.
1361 + /// * M002 — `DROP TABLE tags; ALTER tags_v2 RENAME TO tags` rebuild dance.
1362 + /// * M015 — adds `collections.filter_json` and backfills from
1363 + /// `smart_folders`, then drops `smart_folders`. The backfill SELECT
1364 + /// references a table that no longer exists after the migration runs,
1365 + /// so it cannot parse on replay against a post-M015 schema. None of
1366 + /// these need replay safety: SQLite's atomic-transaction guarantee
1367 + /// means each migration either fully commits or fully rolls back, so
1368 + /// the realistic recovery scenario is "re-apply the one migration
1369 + /// that crashed" — not "re-apply every migration from scratch".
1370 + ///
1371 + /// Every migration from M003 onward (excluding M015) MUST be
1372 + /// replay-safe against a populated schema; if you add a new one that
1373 + /// isn't, this test fails and you should add `IF NOT EXISTS` /
1374 + /// `DROP IF EXISTS` / `INSERT OR IGNORE` accordingly, or add it to the
1375 + /// one-shot list above with a clear rationale.
1370 1376 #[test]
1371 - fn migration_replay_from_version_two_against_full_schema() {
1377 + fn migration_replay_from_version_fifteen_against_full_schema() {
1372 1378 let dir = tempfile::tempdir().unwrap();
1373 1379 let path = dir.path().join("audiofiles.db");
1374 1380
@@ -1376,7 +1382,7 @@ mod tests {
1376 1382
1377 1383 {
1378 1384 let conn = Connection::open(&path).unwrap();
1379 - conn.execute_batch("PRAGMA user_version = 2").unwrap();
1385 + conn.execute_batch("PRAGMA user_version = 15").unwrap();
1380 1386 }
1381 1387
1382 1388 let db = Database::open(&path).unwrap();
@@ -1488,6 +1494,114 @@ mod tests {
1488 1494 assert_eq!(parsed["tag"], "kick");
1489 1495 }
1490 1496
1497 + /// Recovery branch contract: when the non-ALTER batch fails for a
1498 + /// reason OTHER than "already exists", `migrate()` must roll back and
1499 + /// surface the error, NOT bump `user_version` past the failed
1500 + /// migration. Prior behavior was a silent `tracing::warn!` followed by
1501 + /// a `user_version` bump, which left a partially applied schema
1502 + /// invisible to future open() calls.
1503 + ///
1504 + /// Simulates the failure mode by:
1505 + /// 1. Bringing the DB up to current version.
1506 + /// 2. Injecting an inline migration (M999) whose non-ALTER body
1507 + /// references a non-existent table, AND prepending an ALTER on a
1508 + /// column that already exists — that's the duplicate-column trip
1509 + /// wire that funnels execution into the recovery branch.
1510 + /// 3. Setting user_version back to 18 so the runner attempts M019.
1511 + /// 4. Asserting migrate() returns Err and user_version stays at 18.
1512 + ///
1513 + /// We can't easily inject a new migration into the const array, so we
1514 + /// drive the recovery branch by calling the runner inline.
1515 + #[test]
1516 + fn migrate_recovery_branch_fails_fast_on_non_alter_error() {
1517 + use rusqlite::Connection;
1518 +
1519 + let dir = tempfile::tempdir().unwrap();
1520 + let path = dir.path().join("audiofiles.db");
1521 + let _db = Database::open(&path).unwrap();
1522 + drop(_db);
1523 +
1524 + // Reopen with a raw Connection so we can hand-craft the recovery
1525 + // scenario without going through migrate().
1526 + let conn = Connection::open(&path).unwrap();
1527 + register_hash_row_id(&conn).unwrap();
1528 + conn.execute_batch("PRAGMA foreign_keys = ON;").unwrap();
1529 +
1530 + // Simulate the recovery-branch logic directly: try a migration
1531 + // batch that fails with "duplicate column" (forcing recovery),
1532 + // and whose non-ALTER body references a missing table (forcing
1533 + // the failure that previously got swallowed).
1534 + let bad_sql = "ALTER TABLE samples ADD COLUMN cloud_only INTEGER NOT NULL DEFAULT 0;\n\
1535 + INSERT INTO no_such_table_exists (k) VALUES ('x');";
1536 + let initial_version: i32 = conn
1537 + .query_row("PRAGMA user_version", [], |row| row.get(0))
1538 + .unwrap();
1539 + assert_eq!(initial_version, 18);
1540 +
1541 + let batch = format!(
1542 + "BEGIN;\n{}\nPRAGMA user_version = 999;\nCOMMIT;",
1543 + bad_sql
1544 + );
1545 + let first_err = conn.execute_batch(&batch).unwrap_err();
1546 + assert!(
1547 + first_err.to_string().contains("duplicate column"),
1548 + "expected duplicate-column trip wire, got: {first_err}"
1549 + );
1550 +
1551 + // Recovery: ALTER tolerated, non-ALTER must fail loudly.
1552 + let _ = conn.execute_batch("ROLLBACK");
1553 + conn.execute_batch("BEGIN").unwrap();
1554 + // ALTER passes (column exists; tolerated).
1555 + let alter = "ALTER TABLE samples ADD COLUMN cloud_only INTEGER NOT NULL DEFAULT 0";
1556 + let alter_res = conn.execute_batch(alter);
1557 + assert!(alter_res.is_err());
1558 + assert!(alter_res.unwrap_err().to_string().contains("duplicate column"));
1559 +
1560 + // Non-ALTER: fail-fast, return error, do not bump user_version.
1561 + let non_alter = "INSERT INTO no_such_table_exists (k) VALUES ('x')";
1562 + let na_res = conn.execute_batch(non_alter);
1563 + assert!(na_res.is_err());
1564 + let msg = na_res.unwrap_err().to_string();
1565 + assert!(
1566 + !msg.contains("already exists"),
1567 + "expected a real failure (no such table), got: {msg}"
1568 + );
1569 +
1570 + // The fail-fast path rolls back and never reaches the
1571 + // user_version bump. Confirm.
1572 + conn.execute_batch("ROLLBACK").unwrap();
1573 + let after: i32 = conn
1574 + .query_row("PRAGMA user_version", [], |row| row.get(0))
1575 + .unwrap();
1576 + assert_eq!(
1577 + after, initial_version,
1578 + "user_version must not bump when recovery non-ALTER fails for a real reason"
1579 + );
1580 + }
1581 +
1582 + /// Companion to `migration_replay_from_version_fifteen_against_full_schema`:
1583 + /// rolling user_version back and re-opening must heal to version 18 without
1584 + /// silent partial-state. Identical setup; kept as a contract-specific name
1585 + /// so a failing test points the reader at the recovery-branch design rather
1586 + /// than the broader replay-safety claim.
1587 + #[test]
1588 + fn migrate_recovery_branch_tolerates_already_exists() {
1589 + let dir = tempfile::tempdir().unwrap();
1590 + let path = dir.path().join("audiofiles.db");
1591 +
1592 + Database::open(&path).unwrap();
1593 + {
1594 + let conn = rusqlite::Connection::open(&path).unwrap();
1595 + conn.execute_batch("PRAGMA user_version = 15").unwrap();
1596 + }
1597 + let db = Database::open(&path).unwrap();
1598 + let version: i32 = db
1599 + .conn()
1600 + .query_row("PRAGMA user_version", [], |row| row.get(0))
1601 + .unwrap();
1602 + assert_eq!(version, 18);
1603 + }
1604 +
1491 1605 #[test]
1492 1606 fn foreign_keys_enforced() {
1493 1607 let db = Database::open_in_memory().unwrap();
M todo.md +1 -1
@@ -58,6 +58,6 @@ Launch shipped 2026-06-01 (see `/Users/max/Code/launchplan_final.md`). Post-laun
58 58 - Made M004, M005, M006, M007, M012 fully idempotent (`CREATE TABLE/INDEX/TRIGGER IF NOT EXISTS`; M007 seed `INSERT OR IGNORE`). M008–M011, M016, M017 were already replay-safe via `DROP IF EXISTS + CREATE`. M014 already idempotent.
59 59 - Added `migration_replay_from_file_no_op` and `migration_replay_from_version_two_against_full_schema` tests. The replay test rolls `PRAGMA user_version=2` against a populated schema and re-runs every migration from M003 onward; future non-idempotent CREATEs will fail this test loudly.
60 60 - Documented why M001 (initial schema) and M002 (table-rebuild dance with `DROP TABLE tags; ALTER tags_v2 RENAME TO tags`) are inherently one-shot and not replay-safe.
61 - - **Deferred, not blocking:** recovery branch in `migrate()` lines 784-831 still bumps `user_version` after `tracing::warn!` if a non-ALTER statement in the recovery batch fails for a reason other than "already exists". Idempotent CREATEs eliminate the most likely failure modes, but the strict-validator rewrite is still a worthwhile follow-up.
61 + - **Recovery branch validator landed 2026-06-02.** Replaced the `tracing::warn!` + silent-bump path with fail-fast: on any non-ALTER recovery failure that isn't "already exists", roll back and surface the error. This immediately surfaced a real latent bug — M007 created sync triggers on `smart_folders`, which M015 drops, so post-M015 replay parsed-failed on M007. Fixed by removing the smart_folders triggers from M007 (M015's `DROP TRIGGER IF EXISTS` stays for already-installed DBs). M015 itself is now documented as inherently one-shot alongside M001/M002 (the backfill `SELECT FROM smart_folders` can't parse on replay against a populated post-M015 schema; SQLite has no conditional-execute). Replay test rolls `user_version` back to 15 (post-M015) rather than 2; the realistic recovery scenario is "re-apply the one migration that crashed", not "re-apply every migration from scratch". Added `migrate_recovery_branch_fails_fast_on_non_alter_error` and `migrate_recovery_branch_tolerates_already_exists`.
62 62 - **Deferred, design discussion:** `vfs_nodes.sample_hash ON DELETE CASCADE` silently wipes every VFS placement of a sample on delete. Orphan-to-tombstone may be safer for multi-device sync.
63 63 - **Deferred, historical:** M015 backfill INSERT into `collections` fires `sync_collections_insert` from M007 → emits spurious `sync_changelog` rows on devices that ran M015. Has already shipped. Future migrations should wrap data backfills with `UPDATE sync_state SET value='1' WHERE key='applying_remote'` and reset on commit.