max / audiofiles

sync: M018 hash sync_changelog.row_id to plug cleartext leak

Per the 2026-06-02 SyncKit upload audit, sync_changelog.row_id was sent to the server in cleartext and the triggers stuffed user content into it: raw sample SHA-256s (content fingerprints) and tag strings like sample_hash:my-tag. The encrypted `data` field replicated the content, but the row_id leak stood regardless, and sync_panel.rs:402 claimed "All data is encrypted before leaving your device".

This commit:

- Adds `hash_row_id(salt, key)` as a deterministic SQLite scalar function registered on every Database connection (rusqlite `functions` feature). SHA-256 over a per-user salt + canonical key; the salt lives in sync_state and is never synced, so even a global rainbow table over common tag strings can't deanonymise users.
- M018: generates row_id_salt via randomblob(32), drops and recreates every sync trigger with hash_row_id wrapping for the four sensitive tables (samples, audio_analysis, tags, collection_members). DELETE triggers across all tables now emit the canonical PK in JSON `data` so pull-side replay never depends on row_id semantics. Backfills existing unpushed sync_changelog rows: rebuilds composite-PK `data` from the cleartext row_id, then hashes. smart_folders triggers from M007 are not recreated (table dropped in M015).
- resolve::apply_delete: reads canonical PK columns from the decrypted `data` JSON first; falls back to the pre-M018 row_id-parsing path for rows already on the server.
- sync_panel.rs: privacy copy updated to match reality. It names what's encrypted (audio, filenames, tags, folders, analysis) and what isn't (opaque row hashes, change timestamps).
- Added m018_hashes_sensitive_row_ids and m018_delete_triggers_emit_canonical_key_in_data as regression gates. Three existing sync trigger tests updated to inspect `data` rather than asserting cleartext row_id.

Numeric-id tables (vfs, vfs_nodes, collections, smart_folders gone, edit_history) and user_config (closed app-defined key set) keep cleartext row_ids; there is no user content to protect there.

Deferred to MNW/server: blob hash as S3 object key is still a content fingerprint (audit Risk 5); per-user blob namespace is a server change.

803 tests green (801 + 2 new M018 contract tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author: Max Johnson <me@maxj.phd> · 2026-06-03 02:38 UTC
Commit: b925f546a2dd3b7aecff11039d7884de39403bf2
Parent: 640f61b
6 files changed, +537 insertions, -26 deletions
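For readers skimming the commit message, here is a minimal sketch of the salting idea behind `hash_row_id`: the row id is derived from a per-user salt, a `:` separator, and the canonical key. It is an illustration only. The function name `demo_row_id` and the salt strings are hypothetical, and the standard library's `DefaultHasher` stands in for SHA-256 purely to keep the sketch dependency-free; it is not cryptographically secure and produces 16 hex digits rather than 64.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical stand-in for the commit's `hash_row_id(salt, key)`.
/// ASSUMPTION: `DefaultHasher` replaces SHA-256 here so the sketch needs no
/// external crates. It is NOT suitable for real privacy protection.
fn demo_row_id(salt: &str, key: &str) -> String {
    let mut hasher = DefaultHasher::new();
    // Same input layout as the commit: salt, ':' separator, canonical key.
    hasher.write(salt.as_bytes());
    hasher.write(b":");
    hasher.write(key.as_bytes());
    // Hex-encode the 64-bit digest (the real function emits 64 hex digits).
    format!("{:016x}", hasher.finish())
}

fn main() {
    let key = "abc123:drums"; // canonical key for a tags row
    let user_a = demo_row_id("salt-for-user-a", key);
    let user_b = demo_row_id("salt-for-user-b", key);

    // Deterministic: the same salt and key always map to the same row id,
    // so repeated changes to one row can still be correlated per user.
    assert_eq!(user_a, demo_row_id("salt-for-user-a", key));

    // Salted: two users tagging the same sample get unrelated row ids,
    // so a table precomputed without the salt does not match.
    assert_ne!(user_a, user_b);

    println!("user A: {user_a}\nuser B: {user_b}");
}
```

The property that matters is the pair of assertions: determinism within one salt, divergence across salts.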
M Cargo.toml +1 -1
@@ -16,7 +16,7 @@ egui = { version = "0.34", default-features = false, features = ["default_fonts"
16 16 egui_extras = { version = "0.34", default-features = false }
17 17 eframe = { version = "0.34", default-features = false, features = ["default_fonts", "glow"] }
18 18 cpal = "0.17"
19 - rusqlite = { version = "0.31.0", features = ["bundled"] }
19 + rusqlite = { version = "0.31.0", features = ["bundled", "functions"] }
20 20 thiserror = "2.0.18"
21 21 sha2 = "0.10.9"
22 22 symphonia = { version = "0.5.5", default-features = false, features = ["wav", "aiff", "mp3", "flac", "ogg", "vorbis", "pcm", "aac", "alac", "isomp4", "caf"] }
@@ -399,9 +399,14 @@ fn draw_needs_encryption(
399 399 ui.label("Set an encryption password to protect your synced data.");
400 400 ui.add_space(theme::space::MD);
401 401 ui.label(
402 - egui::RichText::new("All data is encrypted before leaving your device.")
403 - .small()
404 - .weak(),
402 + egui::RichText::new(
403 + "Your sample audio, filenames, tags, folder structure, \
404 + and analysis are encrypted before leaving your device. \
405 + The server sees row identifiers (opaque hashes) and \
406 + change timestamps.",
407 + )
408 + .small()
409 + .weak(),
405 410 );
406 411 ui.add_space(theme::space::SM);
407 412 // Warning banner: a typo in the next field permanently re-encrypts the
@@ -2,7 +2,8 @@
2 2
3 3 use std::path::Path;
4 4
5 - use rusqlite::Connection;
5 + use rusqlite::{Connection, functions::FunctionFlags};
6 + use sha2::{Digest, Sha256};
6 7 use thiserror::Error;
7 8 use tracing::instrument;
8 9
@@ -708,6 +709,385 @@ BEGIN
708 709 END;
709 710 "#;
710 711
712 + /// M018 — hash sensitive row_id values on the wire.
713 + ///
714 + /// Per 2026-06-02 SyncKit upload audit: the `sync_changelog.row_id` column
715 + /// is sent to the server in cleartext, and the existing triggers stuffed
716 + /// user content into it (tag strings as `sample_hash:tag`, raw sample
717 + /// SHA-256s as content fingerprints, collection bindings). The encrypted
718 + /// `data` field replicated the same content, but the cleartext row_id leak
719 + /// stood regardless.
720 + ///
721 + /// This migration:
722 + ///
723 + /// 1. Generates a per-user `row_id_salt` in `sync_state` (never synced) so
724 + /// even a global rainbow table over common tag strings can't deanonymise
725 + /// users. SQLite's `randomblob(32)` draws on SQLite's internal PRNG, which
726 + /// is seeded through the VFS randomness hook (/dev/urandom on POSIX).
727 + /// 2. Recreates every sync trigger to wrap row_id in
728 + /// `hash_row_id(salt, canonical_key)`. The encrypted `data` field still
729 + /// carries the cleartext for the receiving device.
730 + /// 3. Extends DELETE triggers to emit the canonical key(s) in `data` so the
731 + /// pull-side `resolve::apply_delete` can reconstruct the WHERE clause
732 + /// without parsing row_id (which is now opaque).
733 + /// 4. Rewrites every unpushed row in `sync_changelog` that contained
734 + /// sensitive cleartext: hashes the row_id, and for DELETE rows in
735 + /// composite-key tables (`tags`, `collection_members`) backfills the
736 + /// canonical key from the now-being-hashed cleartext into `data`.
737 + ///
738 + /// Numeric-id tables (vfs, vfs_nodes, collections, smart_folders,
739 + /// edit_history) and user_config are left as-is — their row_ids carry
740 + /// either opaque integers or a closed set of app-defined config keys, no
741 + /// user content.
742 + const MIGRATION_018: &str = r#"
743 + -- 1. Per-user salt for row_id hashing. `INSERT OR IGNORE` so re-running
744 + -- this migration after a partial crash doesn't rotate the salt and
745 + -- invalidate already-hashed row_ids.
746 + INSERT OR IGNORE INTO sync_state (key, value)
747 + VALUES ('row_id_salt', lower(hex(randomblob(32))));
748 +
749 + -- 2. Backfill canonical-key `data` for unpushed DELETE rows in composite-PK
750 + -- tables. Must run BEFORE the row_id hash so we still have the cleartext
751 + -- composite to parse.
752 + UPDATE sync_changelog
753 + SET data = json_object(
754 + 'sample_hash', substr(row_id, 1, instr(row_id, ':') - 1),
755 + 'tag', substr(row_id, instr(row_id, ':') + 1)
756 + )
757 + WHERE table_name = 'tags' AND op = 'DELETE' AND pushed = 0
758 + AND data IS NULL
759 + AND instr(row_id, ':') > 0;
760 +
761 + UPDATE sync_changelog
762 + SET data = json_object(
763 + 'collection_id', substr(row_id, 1, instr(row_id, ':') - 1),
764 + 'sample_hash', substr(row_id, instr(row_id, ':') + 1)
765 + )
766 + WHERE table_name = 'collection_members' AND op = 'DELETE' AND pushed = 0
767 + AND data IS NULL
768 + AND instr(row_id, ':') > 0;
769 +
770 + -- 3. For single-PK sensitive-row_id tables, backfill canonical-key `data`
771 + -- for unpushed DELETE rows so apply_delete on the pulling device can
772 + -- reconstruct the WHERE clause from the encrypted data alone.
773 + UPDATE sync_changelog
774 + SET data = json_object('hash', row_id)
775 + WHERE table_name IN ('samples', 'audio_analysis')
776 + AND op = 'DELETE' AND pushed = 0 AND data IS NULL;
777 +
778 + -- 4. Now hash the row_id for every unpushed row whose cleartext leaked user
779 + -- content (sample hashes, tag strings).
780 + UPDATE sync_changelog
781 + SET row_id = hash_row_id(
782 + (SELECT value FROM sync_state WHERE key = 'row_id_salt'),
783 + row_id
784 + )
785 + WHERE pushed = 0
786 + AND table_name IN ('samples', 'audio_analysis', 'tags', 'collection_members');
787 +
788 + -- 5. Drop and recreate every sync trigger with hash_row_id wrapping.
789 + -- DELETE triggers gain a canonical-key `data` payload.
790 +
791 + DROP TRIGGER IF EXISTS sync_samples_insert;
792 + DROP TRIGGER IF EXISTS sync_samples_update;
793 + DROP TRIGGER IF EXISTS sync_samples_delete;
794 + DROP TRIGGER IF EXISTS sync_audio_analysis_insert;
795 + DROP TRIGGER IF EXISTS sync_audio_analysis_update;
796 + DROP TRIGGER IF EXISTS sync_audio_analysis_delete;
797 + DROP TRIGGER IF EXISTS sync_vfs_insert;
798 + DROP TRIGGER IF EXISTS sync_vfs_update;
799 + DROP TRIGGER IF EXISTS sync_vfs_delete;
800 + DROP TRIGGER IF EXISTS sync_vfs_nodes_insert;
801 + DROP TRIGGER IF EXISTS sync_vfs_nodes_update;
802 + DROP TRIGGER IF EXISTS sync_vfs_nodes_delete;
803 + DROP TRIGGER IF EXISTS sync_tags_insert;
804 + DROP TRIGGER IF EXISTS sync_tags_delete;
805 + DROP TRIGGER IF EXISTS sync_collections_insert;
806 + DROP TRIGGER IF EXISTS sync_collections_update;
807 + DROP TRIGGER IF EXISTS sync_collections_delete;
808 + DROP TRIGGER IF EXISTS sync_collection_members_insert;
809 + DROP TRIGGER IF EXISTS sync_collection_members_delete;
810 + -- smart_folders table was dropped in M015; M007 triggers are no-ops post-M015
811 + DROP TRIGGER IF EXISTS sync_user_config_insert;
812 + DROP TRIGGER IF EXISTS sync_user_config_update;
813 + DROP TRIGGER IF EXISTS sync_user_config_delete;
814 + DROP TRIGGER IF EXISTS sync_edit_history_insert;
815 +
816 + -- samples (single PK: hash)
817 + CREATE TRIGGER sync_samples_insert AFTER INSERT ON samples
818 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
819 + BEGIN
820 + INSERT INTO sync_changelog (table_name, op, row_id, data)
821 + VALUES ('samples', 'INSERT',
822 + hash_row_id((SELECT value FROM sync_state WHERE key = 'row_id_salt'), NEW.hash),
823 + json_object('hash', NEW.hash, 'original_name', NEW.original_name,
824 + 'file_extension', NEW.file_extension, 'file_size', NEW.file_size,
825 + 'import_date', NEW.import_date, 'last_modified', NEW.last_modified,
826 + 'duration', NEW.duration, 'cloud_only', NEW.cloud_only));
827 + END;
828 +
829 + CREATE TRIGGER sync_samples_update AFTER UPDATE ON samples
830 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
831 + BEGIN
832 + INSERT INTO sync_changelog (table_name, op, row_id, data)
833 + VALUES ('samples', 'UPDATE',
834 + hash_row_id((SELECT value FROM sync_state WHERE key = 'row_id_salt'), NEW.hash),
835 + json_object('hash', NEW.hash, 'original_name', NEW.original_name,
836 + 'file_extension', NEW.file_extension, 'file_size', NEW.file_size,
837 + 'import_date', NEW.import_date, 'last_modified', NEW.last_modified,
838 + 'duration', NEW.duration, 'cloud_only', NEW.cloud_only));
839 + END;
840 +
841 + CREATE TRIGGER sync_samples_delete AFTER DELETE ON samples
842 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
843 + BEGIN
844 + INSERT INTO sync_changelog (table_name, op, row_id, data)
845 + VALUES ('samples', 'DELETE',
846 + hash_row_id((SELECT value FROM sync_state WHERE key = 'row_id_salt'), OLD.hash),
847 + json_object('hash', OLD.hash));
848 + END;
849 +
850 + -- audio_analysis (single PK: hash)
851 + CREATE TRIGGER sync_audio_analysis_insert AFTER INSERT ON audio_analysis
852 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
853 + BEGIN
854 + INSERT INTO sync_changelog (table_name, op, row_id, data)
855 + VALUES ('audio_analysis', 'INSERT',
856 + hash_row_id((SELECT value FROM sync_state WHERE key = 'row_id_salt'), NEW.hash),
857 + json_object('hash', NEW.hash, 'bpm', NEW.bpm, 'musical_key', NEW.musical_key,
858 + 'duration', NEW.duration, 'sample_rate', NEW.sample_rate, 'channels', NEW.channels,
859 + 'peak_db', NEW.peak_db, 'rms_db', NEW.rms_db, 'is_loop', NEW.is_loop,
860 + 'spectral_centroid', NEW.spectral_centroid, 'onset_strength', NEW.onset_strength,
861 + 'analyzed_at', NEW.analyzed_at, 'lufs', NEW.lufs,
862 + 'spectral_flatness', NEW.spectral_flatness, 'spectral_rolloff', NEW.spectral_rolloff,
863 + 'zero_crossing_rate', NEW.zero_crossing_rate, 'classification', NEW.classification));
864 + END;
865 +
866 + CREATE TRIGGER sync_audio_analysis_update AFTER UPDATE ON audio_analysis
867 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
868 + BEGIN
869 + INSERT INTO sync_changelog (table_name, op, row_id, data)
870 + VALUES ('audio_analysis', 'UPDATE',
871 + hash_row_id((SELECT value FROM sync_state WHERE key = 'row_id_salt'), NEW.hash),
872 + json_object('hash', NEW.hash, 'bpm', NEW.bpm, 'musical_key', NEW.musical_key,
873 + 'duration', NEW.duration, 'sample_rate', NEW.sample_rate, 'channels', NEW.channels,
874 + 'peak_db', NEW.peak_db, 'rms_db', NEW.rms_db, 'is_loop', NEW.is_loop,
875 + 'spectral_centroid', NEW.spectral_centroid, 'onset_strength', NEW.onset_strength,
876 + 'analyzed_at', NEW.analyzed_at, 'lufs', NEW.lufs,
877 + 'spectral_flatness', NEW.spectral_flatness, 'spectral_rolloff', NEW.spectral_rolloff,
878 + 'zero_crossing_rate', NEW.zero_crossing_rate, 'classification', NEW.classification));
879 + END;
880 +
881 + CREATE TRIGGER sync_audio_analysis_delete AFTER DELETE ON audio_analysis
882 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
883 + BEGIN
884 + INSERT INTO sync_changelog (table_name, op, row_id, data)
885 + VALUES ('audio_analysis', 'DELETE',
886 + hash_row_id((SELECT value FROM sync_state WHERE key = 'row_id_salt'), OLD.hash),
887 + json_object('hash', OLD.hash));
888 + END;
889 +
890 + -- vfs (numeric PK — row_id stays as id string; not sensitive)
891 + CREATE TRIGGER sync_vfs_insert AFTER INSERT ON vfs
892 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
893 + BEGIN
894 + INSERT INTO sync_changelog (table_name, op, row_id, data)
895 + VALUES ('vfs', 'INSERT', CAST(NEW.id AS TEXT),
896 + json_object('id', NEW.id, 'name', NEW.name,
897 + 'created_at', NEW.created_at, 'modified_at', NEW.modified_at,
898 + 'sync_files', NEW.sync_files));
899 + END;
900 +
901 + CREATE TRIGGER sync_vfs_update AFTER UPDATE ON vfs
902 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
903 + BEGIN
904 + INSERT INTO sync_changelog (table_name, op, row_id, data)
905 + VALUES ('vfs', 'UPDATE', CAST(NEW.id AS TEXT),
906 + json_object('id', NEW.id, 'name', NEW.name,
907 + 'created_at', NEW.created_at, 'modified_at', NEW.modified_at,
908 + 'sync_files', NEW.sync_files));
909 + END;
910 +
911 + CREATE TRIGGER sync_vfs_delete AFTER DELETE ON vfs
912 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
913 + BEGIN
914 + INSERT INTO sync_changelog (table_name, op, row_id, data)
915 + VALUES ('vfs', 'DELETE', CAST(OLD.id AS TEXT), json_object('id', OLD.id));
916 + END;
917 +
918 + -- vfs_nodes (numeric PK)
919 + CREATE TRIGGER sync_vfs_nodes_insert AFTER INSERT ON vfs_nodes
920 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
921 + BEGIN
922 + INSERT INTO sync_changelog (table_name, op, row_id, data)
923 + VALUES ('vfs_nodes', 'INSERT', CAST(NEW.id AS TEXT),
924 + json_object('id', NEW.id, 'vfs_id', NEW.vfs_id, 'parent_id', NEW.parent_id,
925 + 'name', NEW.name, 'node_type', NEW.node_type,
926 + 'sample_hash', NEW.sample_hash, 'created_at', NEW.created_at));
927 + END;
928 +
929 + CREATE TRIGGER sync_vfs_nodes_update AFTER UPDATE ON vfs_nodes
930 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
931 + BEGIN
932 + INSERT INTO sync_changelog (table_name, op, row_id, data)
933 + VALUES ('vfs_nodes', 'UPDATE', CAST(NEW.id AS TEXT),
934 + json_object('id', NEW.id, 'vfs_id', NEW.vfs_id, 'parent_id', NEW.parent_id,
935 + 'name', NEW.name, 'node_type', NEW.node_type,
936 + 'sample_hash', NEW.sample_hash, 'created_at', NEW.created_at));
937 + END;
938 +
939 + CREATE TRIGGER sync_vfs_nodes_delete AFTER DELETE ON vfs_nodes
940 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
941 + BEGIN
942 + INSERT INTO sync_changelog (table_name, op, row_id, data)
943 + VALUES ('vfs_nodes', 'DELETE', CAST(OLD.id AS TEXT), json_object('id', OLD.id));
944 + END;
945 +
946 + -- tags (composite PK: sample_hash + tag — both sensitive)
947 + CREATE TRIGGER sync_tags_insert AFTER INSERT ON tags
948 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
949 + BEGIN
950 + INSERT INTO sync_changelog (table_name, op, row_id, data)
951 + VALUES ('tags', 'INSERT',
952 + hash_row_id((SELECT value FROM sync_state WHERE key = 'row_id_salt'),
953 + NEW.sample_hash || ':' || NEW.tag),
954 + json_object('sample_hash', NEW.sample_hash, 'tag', NEW.tag));
955 + END;
956 +
957 + CREATE TRIGGER sync_tags_delete AFTER DELETE ON tags
958 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
959 + BEGIN
960 + INSERT INTO sync_changelog (table_name, op, row_id, data)
961 + VALUES ('tags', 'DELETE',
962 + hash_row_id((SELECT value FROM sync_state WHERE key = 'row_id_salt'),
963 + OLD.sample_hash || ':' || OLD.tag),
964 + json_object('sample_hash', OLD.sample_hash, 'tag', OLD.tag));
965 + END;
966 +
967 + -- collections (numeric PK)
968 + CREATE TRIGGER sync_collections_insert AFTER INSERT ON collections
969 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
970 + BEGIN
971 + INSERT INTO sync_changelog (table_name, op, row_id, data)
972 + VALUES ('collections', 'INSERT', CAST(NEW.id AS TEXT),
973 + json_object('id', NEW.id, 'name', NEW.name,
974 + 'description', NEW.description, 'created_at', NEW.created_at,
975 + 'filter_json', NEW.filter_json));
976 + END;
977 +
978 + CREATE TRIGGER sync_collections_update AFTER UPDATE ON collections
979 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
980 + BEGIN
981 + INSERT INTO sync_changelog (table_name, op, row_id, data)
982 + VALUES ('collections', 'UPDATE', CAST(NEW.id AS TEXT),
983 + json_object('id', NEW.id, 'name', NEW.name,
984 + 'description', NEW.description, 'created_at', NEW.created_at,
985 + 'filter_json', NEW.filter_json));
986 + END;
987 +
988 + CREATE TRIGGER sync_collections_delete AFTER DELETE ON collections
989 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
990 + BEGIN
991 + INSERT INTO sync_changelog (table_name, op, row_id, data)
992 + VALUES ('collections', 'DELETE', CAST(OLD.id AS TEXT), json_object('id', OLD.id));
993 + END;
994 +
995 + -- collection_members (composite PK: collection_id + sample_hash — hash is sensitive)
996 + CREATE TRIGGER sync_collection_members_insert AFTER INSERT ON collection_members
997 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
998 + BEGIN
999 + INSERT INTO sync_changelog (table_name, op, row_id, data)
1000 + VALUES ('collection_members', 'INSERT',
1001 + hash_row_id((SELECT value FROM sync_state WHERE key = 'row_id_salt'),
1002 + CAST(NEW.collection_id AS TEXT) || ':' || NEW.sample_hash),
1003 + json_object('collection_id', NEW.collection_id, 'sample_hash', NEW.sample_hash,
1004 + 'added_at', NEW.added_at));
1005 + END;
1006 +
1007 + CREATE TRIGGER sync_collection_members_delete AFTER DELETE ON collection_members
1008 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
1009 + BEGIN
1010 + INSERT INTO sync_changelog (table_name, op, row_id, data)
1011 + VALUES ('collection_members', 'DELETE',
1012 + hash_row_id((SELECT value FROM sync_state WHERE key = 'row_id_salt'),
1013 + CAST(OLD.collection_id AS TEXT) || ':' || OLD.sample_hash),
1014 + json_object('collection_id', OLD.collection_id, 'sample_hash', OLD.sample_hash));
1015 + END;
1016 +
1017 + -- smart_folders table was dropped in M015; not recreating its triggers.
1018 +
1019 + -- user_config (key is app-defined closed set; not sensitive)
1020 + CREATE TRIGGER sync_user_config_insert AFTER INSERT ON user_config
1021 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
1022 + AND NEW.key NOT LIKE 'sync_%'
1023 + AND NEW.key != 'loose_files'
1024 + BEGIN
1025 + INSERT INTO sync_changelog (table_name, op, row_id, data)
1026 + VALUES ('user_config', 'INSERT', NEW.key,
1027 + json_object('key', NEW.key, 'value', NEW.value));
1028 + END;
1029 +
1030 + CREATE TRIGGER sync_user_config_update AFTER UPDATE ON user_config
1031 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
1032 + AND NEW.key NOT LIKE 'sync_%'
1033 + AND NEW.key != 'loose_files'
1034 + BEGIN
1035 + INSERT INTO sync_changelog (table_name, op, row_id, data)
1036 + VALUES ('user_config', 'UPDATE', NEW.key,
1037 + json_object('key', NEW.key, 'value', NEW.value));
1038 + END;
1039 +
1040 + CREATE TRIGGER sync_user_config_delete AFTER DELETE ON user_config
1041 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
1042 + AND OLD.key NOT LIKE 'sync_%'
1043 + AND OLD.key != 'loose_files'
1044 + BEGIN
1045 + INSERT INTO sync_changelog (table_name, op, row_id, data)
1046 + VALUES ('user_config', 'DELETE', OLD.key, json_object('key', OLD.key));
1047 + END;
1048 +
1049 + -- edit_history (numeric PK)
1050 + CREATE TRIGGER sync_edit_history_insert AFTER INSERT ON edit_history
1051 + WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') != '1'
1052 + BEGIN
1053 + INSERT INTO sync_changelog (table_name, op, row_id, data)
1054 + VALUES ('edit_history', 'INSERT', CAST(NEW.id AS TEXT),
1055 + json_object('id', NEW.id, 'source_hash', NEW.source_hash,
1056 + 'result_hash', NEW.result_hash, 'operation', NEW.operation,
1057 + 'params_json', NEW.params_json, 'created_at', NEW.created_at));
1058 + END;
1059 + "#;
1060 +
1061 + /// Register `hash_row_id(salt, key) -> TEXT` as a deterministic SQLite
1062 + /// function on the given connection. Used by the M018 sync triggers so the
1063 + /// `sync_changelog.row_id` field never carries cleartext content (tag strings,
1064 + /// raw sample SHA-256s) on the wire. The salt is a per-user random nonce
1065 + /// stored in `sync_state` and never synced; without it, even a global rainbow
1066 + /// table over common tag strings would deanonymise users.
1067 + fn register_hash_row_id(conn: &Connection) -> Result<(), DbError> {
1068 + conn.create_scalar_function(
1069 + "hash_row_id",
1070 + 2,
1071 + FunctionFlags::SQLITE_DETERMINISTIC | FunctionFlags::SQLITE_UTF8,
1072 + |ctx| {
1073 + let salt: String = ctx.get(0)?;
1074 + let key: String = ctx.get(1)?;
1075 + let mut hasher = Sha256::new();
1076 + hasher.update(salt.as_bytes());
1077 + hasher.update(b":");
1078 + hasher.update(key.as_bytes());
1079 + let digest = hasher.finalize();
1080 + let mut hex = String::with_capacity(64);
1081 + for byte in digest {
1082 + use std::fmt::Write;
1083 + let _ = write!(hex, "{byte:02x}");
1084 + }
1085 + Ok(hex)
1086 + },
1087 + )?;
1088 + Ok(())
1089 + }
1090 +
711 1091 impl Database {
712 1092 /// Open (or create) the database at the given path and run migrations.
713 1093 #[instrument(skip_all)]
@@ -719,6 +1099,7 @@ impl Database {
719 1099 PRAGMA busy_timeout=5000;\
720 1100 PRAGMA wal_checkpoint(TRUNCATE);",
721 1101 )?;
1102 + register_hash_row_id(&conn)?;
722 1103 let mut db = Self { conn };
723 1104 db.migrate()?;
724 1105 Ok(db)
@@ -739,6 +1120,7 @@ impl Database {
739 1120 pub fn open_in_memory() -> Result<Self, DbError> {
740 1121 let conn = Connection::open_in_memory()?;
741 1122 conn.execute_batch("PRAGMA foreign_keys=ON;")?;
1123 + register_hash_row_id(&conn)?;
742 1124 let mut db = Self { conn };
743 1125 db.migrate()?;
744 1126 Ok(db)
@@ -773,6 +1155,7 @@ impl Database {
773 1155 MIGRATION_015,
774 1156 MIGRATION_016,
775 1157 MIGRATION_017,
1158 + MIGRATION_018,
776 1159 ];
777 1160
778 1161 for (i, sql) in MIGRATIONS.iter().enumerate() {
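The composite-key backfill in step 2 of MIGRATION_018 splits the old cleartext row_id on its first `:` using `substr` and `instr`. A rough Rust equivalent of that split, written as a sketch with a hypothetical helper name (`split_composite`), may make the edge cases easier to see:

```rust
/// Split a pre-M018 composite row_id of the form "pk1:pk2" on the FIRST ':'.
/// Returns None when there is no separator, which mirrors the
/// `instr(row_id, ':') > 0` guard in the migration's WHERE clause.
/// `split_composite` is a hypothetical name used only for this sketch.
fn split_composite(row_id: &str) -> Option<(&str, &str)> {
    row_id.split_once(':')
}

fn main() {
    // tags rows were written as "sample_hash:tag".
    assert_eq!(split_composite("abc123:drums"), Some(("abc123", "drums")));

    // A tag that itself contains ':' stays intact, because only the first
    // separator is used.
    assert_eq!(
        split_composite("abc123:genre:house"),
        Some(("abc123", "genre:house"))
    );

    // collection_members rows were written as "collection_id:sample_hash".
    assert_eq!(split_composite("7:abc123"), Some(("7", "abc123")));

    // No separator: the migration leaves such rows' `data` untouched.
    assert_eq!(split_composite("abc123"), None);
}
```

Splitting on the first separator is safe here only because the leading component (a sample hash or a numeric collection id) is not expected to contain `:`.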
@@ -938,7 +1321,7 @@ mod tests {
938 1321 .conn()
939 1322 .query_row("PRAGMA user_version", [], |row| row.get(0))
940 1323 .unwrap();
941 - assert_eq!(version, 17);
1324 + assert_eq!(version, 18);
942 1325 }
943 1326
944 1327 #[test]
@@ -949,7 +1332,7 @@ mod tests {
949 1332 .conn()
950 1333 .query_row("PRAGMA user_version", [], |row| row.get(0))
951 1334 .unwrap();
952 - assert_eq!(version, 17);
1335 + assert_eq!(version, 18);
953 1336 }
954 1337
955 1338 /// Open a fresh file-backed DB, close, reopen. The second open re-enters
@@ -968,7 +1351,7 @@ mod tests {
968 1351 .conn()
969 1352 .query_row("PRAGMA user_version", [], |row| row.get(0))
970 1353 .unwrap();
971 - assert_eq!(version, 17);
1354 + assert_eq!(version, 18);
972 1355 }
973 1356
974 1357 /// Simulates the worst-case recovery path: a prior partial migration left
@@ -1001,7 +1384,108 @@ mod tests {
1001 1384 .conn()
1002 1385 .query_row("PRAGMA user_version", [], |row| row.get(0))
1003 1386 .unwrap();
1004 - assert_eq!(version, 17);
1387 + assert_eq!(version, 18);
1388 + }
1389 +
1390 + /// M018 contract: the `sync_changelog.row_id` for sensitive tables must
1391 + /// be a 64-hex SHA-256 (per `hash_row_id`), NOT the cleartext content
1392 + /// fingerprint or tag string. The cleartext key lives only in `data`.
1393 + /// This test is the regression gate for the upload audit fix.
1394 + #[test]
1395 + fn m018_hashes_sensitive_row_ids() {
1396 + let db = Database::open_in_memory().unwrap();
1397 + let conn = db.conn();
1398 +
1399 + // Seed: insert a sample and a tag. Both should fire triggers that
1400 + // write to sync_changelog with a hashed row_id.
1401 + conn.execute(
1402 + "INSERT INTO samples (hash, original_name, file_extension, file_size, \
1403 + import_date, last_modified) VALUES \
1404 + ('abc123', 'kick.wav', 'wav', 100, 0, 0)",
1405 + [],
1406 + )
1407 + .unwrap();
1408 + conn.execute(
1409 + "INSERT INTO tags (sample_hash, tag) VALUES ('abc123', 'drums')",
1410 + [],
1411 + )
1412 + .unwrap();
1413 +
1414 + // samples row_id: 64-hex hash, NOT "abc123".
1415 + let row_id: String = conn
1416 + .query_row(
1417 + "SELECT row_id FROM sync_changelog WHERE table_name = 'samples' AND op = 'INSERT'",
1418 + [],
1419 + |row| row.get(0),
1420 + )
1421 + .unwrap();
1422 + assert_eq!(row_id.len(), 64, "row_id should be SHA-256 hex");
1423 + assert!(row_id.chars().all(|c| c.is_ascii_hexdigit()));
1424 + assert_ne!(row_id, "abc123", "cleartext sample hash must not leak");
1425 +
1426 + // tags row_id: 64-hex hash, NOT "abc123:drums".
1427 + let row_id: String = conn
1428 + .query_row(
1429 + "SELECT row_id FROM sync_changelog WHERE table_name = 'tags' AND op = 'INSERT'",
1430 + [],
1431 + |row| row.get(0),
1432 + )
1433 + .unwrap();
1434 + assert_eq!(row_id.len(), 64);
1435 + assert_ne!(row_id, "abc123:drums", "cleartext tag string must not leak");
1436 +
1437 + // Salted: hash depends on the per-user salt, so two fresh DBs see
1438 + // different row_ids for the same logical key.
1439 + let db2 = Database::open_in_memory().unwrap();
1440 + let conn2 = db2.conn();
1441 + conn2
1442 + .execute(
1443 + "INSERT INTO samples (hash, original_name, file_extension, file_size, \
Lines truncated
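The visible part of `m018_hashes_sensitive_row_ids` checks two things per row: the row_id has the shape of a SHA-256 hex digest, and it differs from the cleartext key. A small sketch of that combined check follows; the helper name `is_opaque_row_id` is hypothetical and not part of the commit.

```rust
/// True when `row_id` looks like a SHA-256 hex digest AND is not the
/// cleartext key it was derived from.
/// `is_opaque_row_id` is a hypothetical helper for illustration.
fn is_opaque_row_id(row_id: &str, cleartext_key: &str) -> bool {
    let has_digest_shape =
        row_id.len() == 64 && row_id.chars().all(|c| c.is_ascii_hexdigit());
    has_digest_shape && row_id != cleartext_key
}

fn main() {
    let hashed = "ab".repeat(32); // 64 hex digits, stands in for a real digest

    // A hashed id passes against the short test fixtures used in the diff.
    assert!(is_opaque_row_id(&hashed, "abc123"));
    assert!(is_opaque_row_id(&hashed, "abc123:drums"));

    // Cleartext fixtures fail the shape check.
    assert!(!is_opaque_row_id("abc123", "abc123"));
    assert!(!is_opaque_row_id("abc123:drums", "abc123:drums"));

    // A real sample hash is itself 64 hex digits, so the shape check alone
    // would not catch a leak; the inequality does.
    assert!(!is_opaque_row_id(&hashed, &hashed));
}
```

The last assertion is the reason the test pairs the length check with `assert_ne!`: production sample hashes are already SHA-256 strings, so only the comparison against the cleartext distinguishes a hashed row_id from a leaked one.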
@@ -49,11 +49,14 @@ pub(crate) fn apply_remote_changes(conn: &Connection, changes: &[ChangeEntry]) -
49 49 }
50 50 }
51 51
52 - // Apply deletes in reverse FK order
52 + // Apply deletes in reverse FK order. Pass `data` (decrypted by synckit)
53 + // so apply_delete can read canonical key fields from JSON — post-M018
54 + // sync_changelog DELETE rows hash the row_id, so the cleartext PK lives
55 + // only in `data`.
53 56 for table in DELETE_ORDER {
54 57 for change in &deletes {
55 58 if change.table == *table {
56 - apply_delete(&tx, table, &change.row_id)?;
59 + apply_delete(&tx, table, &change.row_id, change.data.as_ref())?;
57 60 count += 1;
58 61 }
59 62 }
@@ -145,14 +148,63 @@ pub(crate) fn apply_upsert(
145 148 }
146 149
147 150 /// Apply a delete for a single row, handling composite primary keys.
148 - pub(crate) fn apply_delete(conn: &Connection, table: &str, row_id: &str) -> Result<()> {
151 + ///
152 + /// Reads the canonical primary-key columns from the decrypted `data` JSON
153 + /// when present (the post-M018 path; row_id is then a hash of the key).
154 + /// Falls back to parsing `row_id` for backwards compatibility with rows
155 + /// pushed by pre-M018 clients, which set data=NULL on DELETE and packed
156 + /// the cleartext PK into row_id as `"pk1:pk2"`.
157 + pub(crate) fn apply_delete(
158 + conn: &Connection,
159 + table: &str,
160 + row_id: &str,
161 + data: Option<&serde_json::Value>,
162 + ) -> Result<()> {
149 163 let pks = pk_columns(table);
150 164
165 + // Prefer the `data` JSON: M018+ DELETE triggers always emit it, and it
166 + // carries the canonical key without depending on the wire-side row_id.
167 + if let Some(value) = data {
168 + if let Some(obj) = value.as_object() {
169 + let mut clauses = Vec::with_capacity(pks.len());
170 + let mut params: Vec<String> = Vec::with_capacity(pks.len());
171 + let mut missing = false;
172 + for (i, pk) in pks.iter().enumerate() {
173 + match obj.get(*pk) {
174 + Some(v) => {
175 + let s = match v {
176 + serde_json::Value::String(s) => s.clone(),
177 + serde_json::Value::Number(n) => n.to_string(),
178 + serde_json::Value::Null => {
179 + missing = true;
180 + break;
181 + }
182 + other => other.to_string(),
183 + };
184 + clauses.push(format!("{} = ?{}", pk, i + 1));
185 + params.push(s);
186 + }
187 + None => {
188 + missing = true;
189 + break;
190 + }
191 + }
192 + }
193 + if !missing && !clauses.is_empty() {
194 + let sql = format!("DELETE FROM {} WHERE {}", table, clauses.join(" AND "));
195 + let params_dyn: Vec<&dyn rusqlite::ToSql> =
196 + params.iter().map(|s| s as &dyn rusqlite::ToSql).collect();
197 + conn.execute(&sql, params_dyn.as_slice())?;
198 + return Ok(());
199 + }
200 + }
201 + }
202 +
203 + // Pre-M018 fallback: parse row_id as the literal PK or "pk1:pk2".
151 204 if pks.len() == 1 {
152 205 let sql = format!("DELETE FROM {} WHERE {} = ?1", table, pks[0]);
153 206 conn.execute(&sql, [row_id])?;
154 207 } else if pks.len() == 2 {
155 - // Composite PK: split row_id on first ':'
156 208 let (first, second) = match row_id.find(':') {
157 209 Some(pos) => (&row_id[..pos], &row_id[pos + 1..]),
158 210 None => {
@@ -309,8 +309,11 @@ mod tests {
309 309
310 310 assert_eq!(changelog_count(conn, Some("samples"), Some("INSERT")), 1);
311 311
312 + // Post-M018: row_id is `hash_row_id(salt, "abc123")` so we look up
313 + // by table+op and inspect the canonical hash inside the encrypted-
314 + // at-the-wire `data` field.
312 315 let data: String = conn.query_row(
313 - "SELECT data FROM sync_changelog WHERE table_name = 'samples' AND row_id = 'abc123'",
316 + "SELECT data FROM sync_changelog WHERE table_name = 'samples' AND op = 'INSERT'",
314 317 [],
315 318 |row| row.get(0),
316 319 ).unwrap();
@@ -351,12 +354,15 @@ mod tests {
351 354
352 355 assert_eq!(changelog_count(conn, Some("tags"), Some("INSERT")), 1);
353 356
354 - let row_id: String = conn.query_row(
355 - "SELECT row_id FROM sync_changelog WHERE table_name = 'tags'",
357 + // Post-M018: row_id is hashed; the cleartext key lives in `data`.
358 + let data: String = conn.query_row(
359 + "SELECT data FROM sync_changelog WHERE table_name = 'tags'",
356 360 [],
357 361 |row| row.get(0),
358 362 ).unwrap();
359 - assert_eq!(row_id, "hash1:drums");
363 + let parsed: serde_json::Value = serde_json::from_str(&data).unwrap();
364 + assert_eq!(parsed["sample_hash"], "hash1");
365 + assert_eq!(parsed["tag"], "drums");
360 366 }
361 367
362 368 #[test]
@@ -379,12 +385,14 @@ mod tests {
379 385
380 386 assert_eq!(changelog_count(conn, Some("collection_members"), Some("INSERT")), 1);
381 387
382 - let row_id: String = conn.query_row(
383 - "SELECT row_id FROM sync_changelog WHERE table_name = 'collection_members'",
388 + let data: String = conn.query_row(
389 + "SELECT data FROM sync_changelog WHERE table_name = 'collection_members'",
384 390 [],
385 391 |row| row.get(0),
386 392 ).unwrap();
387 - assert_eq!(row_id, format!("{}:hash2", collection_id));
393 + let parsed: serde_json::Value = serde_json::from_str(&data).unwrap();
394 + assert_eq!(parsed["collection_id"].as_i64().unwrap(), collection_id);
395 + assert_eq!(parsed["sample_hash"], "hash2");
388 396 }
389 397
390 398 // ── Trigger suppression ──
@@ -477,7 +485,7 @@ mod tests {
477 485 let conn = db.conn();
478 486 insert_sample(conn, "del_hash", "delete_me.wav", "wav");
479 487
480 - apply_delete(conn, "samples", "del_hash").unwrap();
488 + apply_delete(conn, "samples", "del_hash", None).unwrap();
481 489
482 490 let count: i64 = conn.query_row(
483 491 "SELECT COUNT(*) FROM samples WHERE hash = 'del_hash'",
@@ -497,7 +505,7 @@ mod tests {
497 505 [],
498 506 ).unwrap();
499 507
500 - apply_delete(conn, "tags", "tag_hash:bass").unwrap();
508 + apply_delete(conn, "tags", "tag_hash:bass", None).unwrap();
501 509
502 510 let count: i64 = conn.query_row(
503 511 "SELECT COUNT(*) FROM tags WHERE sample_hash = 'tag_hash' AND tag = 'bass'",
@@ -1347,7 +1355,7 @@ mod tests {
1347 1355 let conn = db.conn();
1348 1356
1349 1357 // Delete a hash that doesn't exist — should succeed (no-op)
1350 - let result = apply_delete(conn, "samples", "nonexistent_hash");
1358 + let result = apply_delete(conn, "samples", "nonexistent_hash", None);
1351 1359 assert!(result.is_ok());
1352 1360 }
1353 1361 }
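Note on the new fourth argument: the tests above pass `None` to `apply_delete`, which exercises the pre-M018 fallback where the key is recovered by splitting a cleartext `row_id`. The lookup order described in the commit message (canonical key from the decrypted `data` first, `row_id` parsing second) can be sketched roughly as below. This is a minimal illustration, not the code in `resolve.rs`: the helper name `resolve_tag_key` is hypothetical, the decrypted JSON is assumed to be already parsed into a plain `HashMap` (the real code uses `serde_json`), and splitting at the first `:` is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not from the repository): work out which
/// (sample_hash, tag) pair a `tags` DELETE refers to.
///
/// `data` stands in for the decrypted, already-parsed JSON payload.
fn resolve_tag_key(
    row_id: &str,
    data: Option<&HashMap<String, String>>,
) -> Option<(String, String)> {
    // Post-M018 path: the canonical key travels inside `data`,
    // so the (hashed) row_id is never parsed.
    if let Some(map) = data {
        if let (Some(hash), Some(tag)) = (map.get("sample_hash"), map.get("tag")) {
            return Some((hash.clone(), tag.clone()));
        }
    }
    // Pre-M018 fallback: cleartext row_id shaped like "sample_hash:tag".
    // Assumption: split at the first ':' so a tag may itself contain ':'.
    row_id
        .split_once(':')
        .map(|(hash, tag)| (hash.to_string(), tag.to_string()))
}

fn main() {
    // Legacy row already on the server: no `data`, cleartext row_id.
    let legacy = resolve_tag_key("tag_hash:bass", None);
    assert_eq!(legacy, Some(("tag_hash".to_string(), "bass".to_string())));

    // Post-M018 row: opaque row_id, key recovered from `data`.
    let mut data = HashMap::new();
    data.insert("sample_hash".to_string(), "hash1".to_string());
    data.insert("tag".to_string(), "drums".to_string());
    let hashed = resolve_tag_key("9f2c41aa", Some(&data));
    assert_eq!(hashed, Some(("hash1".to_string(), "drums".to_string())));

    // Opaque row_id with no `data`: nothing to resolve.
    assert_eq!(resolve_tag_key("9f2c41aa", None), None);
}
```

Under these assumptions, a hashed `row_id` that arrives without `data` resolves to nothing rather than to a wrong key, since a hex digest contains no `:` to split on.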
M todo.md +10 -3
@@ -44,9 +44,16 @@ Launch shipped 2026-06-01 (see `/Users/max/Code/launchplan_final.md`). Post-laun
44 44
45 45 ## Future enhancements (not blocking)
46 46
47 - - [ ] Move About modal trigger into a Help menu (toolbar Help button is already there but routes only to keyboard-shortcuts overlay; add a sub-action for About).
48 - - [ ] `Cmd+,` for preferences as macOS users will expect (currently only Cmd/Ctrl+I → About → toggle).
49 - - [ ] SyncKit upload contract audit — what exactly gets uploaded vs encrypted; verify E2E boundary; audit `crates/audiofiles-sync/src/service/upload.rs`.
47 + - [x] **About menu + Cmd+, shortcut.** Toolbar Help is now a popup with "Keyboard shortcuts" and "About audiofiles". Browser→app signal via `BrowserState::about_requested` mirrors the MidiAction pattern. `Cmd+,` joins `Cmd+I` as the About toggle.
48 + - [x] **SyncKit upload contract audit + row_id hashing (M018).** Subagent-audited 2026-06-02. Findings: cleartext leaks in `sync_changelog.row_id` (tag strings as `sample_hash:tag`, raw sample SHA-256s as content fingerprints), plus a false privacy claim in `sync_panel.rs:402`. Landed:
49 + - Registered `hash_row_id(salt, key) -> TEXT` SQLite scalar function in `Database::open` / `open_in_memory`, using SHA-256 over a per-user salt + canonical key. Enabled rusqlite `functions` feature.
50 + - M018 migration: generates `row_id_salt` in `sync_state` (32-byte `randomblob` hex, INSERT OR IGNORE), drops and recreates every sync trigger with `hash_row_id` wrapping for sensitive tables (samples, audio_analysis, tags, collection_members). DELETE triggers now emit canonical-key JSON in `data` so pull-side replay doesn't depend on row_id semantics.
51 + - Backfills unpushed `sync_changelog` rows: hashes existing cleartext row_ids; reconstructs canonical PKs into `data` for DELETE rows on composite-PK tables before hashing.
52 + - `resolve::apply_delete` now reads composite PK from decrypted `data` JSON first, falls back to splitting `row_id` for pre-M018 rows already on the server.
53 + - Privacy copy at `sync_panel.rs:402` rewritten to match reality.
54 + - smart_folders triggers from M007 not recreated (table dropped in M015).
55 + - Tests: added `m018_hashes_sensitive_row_ids` (asserts 64-hex hash, no cleartext leak, salt differs across DBs) and `m018_delete_triggers_emit_canonical_key_in_data`. Updated 3 sync trigger tests to read from `data` rather than asserting cleartext `row_id`.
56 + - **Deferred:** server-side blob `hash` is still a content fingerprint per the audit's Risk 5; per-user blob namespace is a server change, out of scope for this client-side fix. Risk 4 (no tag UPDATE trigger) still applies but with hashing in place its impact is reduced to "one extra hashed-row push per rename" rather than "cleartext tag exposure twice".
50 57 - [x] **Database migration safety review** (2026-06-02). Subagent-audited all 17 migrations; per-migration risk grade in session transcript. Landed:
51 58 - Made M004, M005, M006, M007, M012 fully idempotent (`CREATE TABLE/INDEX/TRIGGER IF NOT EXISTS`; M007 seed `INSERT OR IGNORE`). M008–M011, M016, M017 were already replay-safe via `DROP IF EXISTS + CREATE`. M014 already idempotent.
52 59 - Added `migration_replay_from_file_no_op` and `migration_replay_from_version_two_against_full_schema` tests. The replay test rolls `PRAGMA user_version=2` against a populated schema and re-runs every migration from M003 onward; future non-idempotent CREATEs will fail this test loudly.