| 1 |
# audiofiles Database Schema |
| 2 |
|
| 3 |
SQLite schema reference. 19 inline migrations. Migrations are embedded as Rust string constants in `crates/audiofiles-core/src/db.rs` and applied via `PRAGMA user_version` tracking -- not separate SQL files. |
| 4 |
|
| 5 |
## Domain Map |
| 6 |
|
| 7 |
|
| 8 |
|
| 9 |
| Samples | 1 | Content-addressed sample storage and metadata | |
| 10 |
| Analysis | 3 | Audio analysis, waveform data, fingerprints | |
| 11 |
| VFS | 2 | Virtual file system directories and nodes | |
| 12 |
| Organization | 3 | Tags, collections, collection members (smart folders merged into collections.filter_json in M015) | |
| 13 |
| Preferences | 1 | User configuration key-value store | |
| 14 |
| SyncKit | 2 | Cloud sync metadata and local changelog | |
| 15 |
| History | 1 | Destructive edit tracking | |
| 16 |
|
| 17 |
--- |
| 18 |
|
| 19 |
## Samples |
| 20 |
|
| 21 |
### samples |
| 22 |
Content-addressed sample storage. The hash (of file content) is the primary key -- duplicate files are impossible by design. |
| 23 |
|
| 24 |
|
| 25 |
|
| 26 |
| `hash` | TEXT PK | Content hash (SHA-256) | |
| 27 |
| `original_name` | TEXT | Filename at import | |
| 28 |
| `file_extension` | TEXT | e.g., `wav`, `mp3` | |
| 29 |
| `file_size` | INTEGER | Bytes | |
| 30 |
| `import_date` | INTEGER | Unix timestamp | |
| 31 |
| `last_modified` | INTEGER | Unix timestamp | |
| 32 |
| `cloud_only` | INTEGER | 1 when local blob deleted but exists in cloud (migration 008), default 0 | |
| 33 |
| `duration` | REAL | Seconds, available immediately after import (migration 009), nullable | |
| 34 |
| `source_path` | TEXT | Original on-disk path when imported in loose-files mode (migration 013), nullable | |
| 35 |
|
| 36 |
Index: `original_name`. |
| 37 |
|
| 38 |
--- |
| 39 |
|
| 40 |
## Analysis |
| 41 |
|
| 42 |
### audio_analysis |
| 43 |
Spectral, temporal, and ML analysis results. One row per sample. FK to samples via hash. |
| 44 |
|
| 45 |
|
| 46 |
|
| 47 |
| `hash` | TEXT PK FK | -> samples (CASCADE) | |
| 48 |
| `bpm` | REAL | Beats per minute, nullable | |
| 49 |
| `musical_key` | TEXT | Detected key, nullable | |
| 50 |
| `duration` | REAL | Seconds | |
| 51 |
| `sample_rate` | INTEGER | Hz | |
| 52 |
| `channels` | INTEGER | Mono=1, stereo=2 | |
| 53 |
| `peak_db` | REAL | Peak amplitude, nullable | |
| 54 |
| `rms_db` | REAL | RMS amplitude, nullable | |
| 55 |
| `is_loop` | INTEGER | Boolean, nullable | |
| 56 |
| `spectral_centroid` | REAL | Nullable | |
| 57 |
| `onset_strength` | REAL | Nullable | |
| 58 |
| `analyzed_at` | INTEGER | Unix timestamp | |
| 59 |
| `lufs` | REAL | Loudness (migration 003), nullable | |
| 60 |
| `spectral_flatness` | REAL | Migration 003, nullable | |
| 61 |
| `spectral_rolloff` | REAL | Migration 003, nullable | |
| 62 |
| `zero_crossing_rate` | REAL | Migration 003, nullable | |
| 63 |
| `classification` | TEXT | ML category label (migration 003), nullable | |
| 64 |
| `spectral_bandwidth` | REAL | Migration 010, nullable | |
| 65 |
| `centroid_variance` | REAL | Migration 010, nullable | |
| 66 |
| `crest_factor` | REAL | Migration 010, nullable | |
| 67 |
| `attack_time` | REAL | Migration 010, nullable | |
| 68 |
| `classification_confidence` | REAL | ML confidence score (migration 011), nullable | |
| 69 |
|
| 70 |
Indexes: `bpm`, `musical_key`, `duration`, `classification`. |
| 71 |
|
| 72 |
### waveform_data |
| 73 |
Pre-computed waveform visualizations. Migration 004. |
| 74 |
|
| 75 |
|
| 76 |
|
| 77 |
| `hash` | TEXT PK FK | -> samples (CASCADE) | |
| 78 |
| `num_buckets` | INTEGER | Resolution | |
| 79 |
| `peak_data` | BLOB | Binary peak envelope | |
| 80 |
| `sample_rate` | INTEGER | Hz | |
| 81 |
| `duration` | REAL | Seconds | |
| 82 |
| `generated_at` | INTEGER | Unix timestamp | |
| 83 |
|
| 84 |
### fingerprints |
| 85 |
Audio fingerprints for similarity search (VP-tree indexed). Migration 006. |
| 86 |
|
| 87 |
|
| 88 |
|
| 89 |
| `hash` | TEXT PK FK | -> samples (CASCADE) | |
| 90 |
| `envelope` | BLOB | Binary envelope data | |
| 91 |
| `sample_rate` | INTEGER | Hz | |
| 92 |
| `generated_at` | INTEGER | Unix timestamp | |
| 93 |
|
| 94 |
--- |
| 95 |
|
| 96 |
## VFS |
| 97 |
|
| 98 |
### vfs |
| 99 |
Virtual file systems. Users create named VFS instances to organize samples independently of disk layout. |
| 100 |
|
| 101 |
|
| 102 |
|
| 103 |
| `id` | INTEGER PK | | |
| 104 |
| `name` | TEXT | Unique | |
| 105 |
| `created_at` | INTEGER | Unix timestamp | |
| 106 |
| `modified_at` | INTEGER | Unix timestamp | |
| 107 |
| `sync_files` | INTEGER | Per-VFS toggle for syncing audio blobs to cloud (migration 007), default 0 | |
| 108 |
|
| 109 |
### vfs_nodes |
| 110 |
Directory and sample nodes within a VFS. Self-referential tree structure. |
| 111 |
|
| 112 |
|
| 113 |
|
| 114 |
| `id` | INTEGER PK | | |
| 115 |
| `vfs_id` | INTEGER FK | -> vfs (CASCADE) | |
| 116 |
| `parent_id` | INTEGER FK | -> vfs_nodes (CASCADE), nullable (root nodes) | |
| 117 |
| `name` | TEXT | | |
| 118 |
| `node_type` | TEXT | CHECK: `directory` or `sample` | |
| 119 |
| `sample_hash` | TEXT FK | -> samples (CASCADE), nullable (directories) | |
| 120 |
| `created_at` | INTEGER | Unix timestamp | |
| 121 |
|
| 122 |
Unique: `(vfs_id, parent_id, name)`. Indexes: `parent_id`, `vfs_id`, `sample_hash`. |
| 123 |
|
| 124 |
--- |
| 125 |
|
| 126 |
## Organization |
| 127 |
|
| 128 |
### tags |
| 129 |
Flat dot-namespaced tags on samples. Migration 002 replaced the original key-value `tags` table with this simpler model. |
| 130 |
|
| 131 |
|
| 132 |
|
| 133 |
| `sample_hash` | TEXT FK | -> samples (CASCADE) | |
| 134 |
| `tag` | TEXT | e.g., `genre.techno`, `type.kick` | |
| 135 |
|
| 136 |
PK: `(sample_hash, tag)`. Indexes: `sample_hash`, `tag`. |
| 137 |
|
| 138 |
### collections |
| 139 |
Named sample collections (playlists, kits). |
| 140 |
|
| 141 |
|
| 142 |
|
| 143 |
| `id` | INTEGER PK | | |
| 144 |
| `name` | TEXT | Unique | |
| 145 |
| `description` | TEXT | Nullable | |
| 146 |
| `created_at` | INTEGER | Unix timestamp | |
| 147 |
|
| 148 |
### collection_members |
| 149 |
Samples within a collection. Many-to-many. |
| 150 |
|
| 151 |
|
| 152 |
|
| 153 |
| `collection_id` | INTEGER FK | -> collections (CASCADE) | |
| 154 |
| `sample_hash` | TEXT FK | -> samples (CASCADE) | |
| 155 |
| `added_at` | INTEGER | Unix timestamp | |
| 156 |
|
| 157 |
PK: `(collection_id, sample_hash)`. |
| 158 |
|
| 159 |
`collections.filter_json` (added in M015): when non-NULL, the collection is a dynamic / saved search; when NULL, it's a manual collection populated via `collection_members`. The standalone `smart_folders` table from M001 was dropped in M015 and migrated into this column. |
| 160 |
|
| 161 |
--- |
| 162 |
|
| 163 |
## Preferences |
| 164 |
|
| 165 |
### user_config |
| 166 |
Key-value preferences store. Migration 005. |
| 167 |
|
| 168 |
|
| 169 |
|
| 170 |
| `key` | TEXT PK | | |
| 171 |
| `value` | TEXT | | |
| 172 |
|
| 173 |
--- |
| 174 |
|
| 175 |
## SyncKit |
| 176 |
|
| 177 |
### sync_state |
| 178 |
Sync metadata key-value store. Migration 007. |
| 179 |
|
| 180 |
|
| 181 |
|
| 182 |
| `key` | TEXT PK | | |
| 183 |
| `value` | TEXT | | |
| 184 |
|
| 185 |
Seeded keys: `device_id`, `pull_cursor`, `auto_sync_enabled`, `sync_interval_minutes`, `applying_remote`, `last_sync_at`, `initial_snapshot_done`, `row_id_salt` (added in M018; 32 random bytes hex, never synced). |
| 186 |
|
| 187 |
### sync_changelog |
| 188 |
Local change log for push/pull sync. Migration 007. |
| 189 |
|
| 190 |
|
| 191 |
|
| 192 |
| `id` | INTEGER PK | AUTOINCREMENT | |
| 193 |
| `table_name` | TEXT | Source table name | |
| 194 |
| `op` | TEXT | `INSERT`, `UPDATE`, or `DELETE` | |
| 195 |
| `row_id` | TEXT | Opaque per-row identifier. For sensitive tables (samples, audio_analysis, tags, collection_members) M018 sets this to `hash_row_id(row_id_salt, canonical_key)` so the cleartext key never goes on the wire. For numeric-PK tables it remains the literal id. | |
| 196 |
| `timestamp` | TEXT | ISO datetime, default `datetime('now')` | |
| 197 |
| `data` | TEXT | JSON snapshot of row, nullable | |
| 198 |
| `pushed` | INTEGER | Boolean, default 0 | |
| 199 |
|
| 200 |
Index: `pushed`. |
| 201 |
|
| 202 |
--- |
| 203 |
|
| 204 |
## History |
| 205 |
|
| 206 |
### edit_history |
| 207 |
Tracks destructive audio edits (trim, normalize, etc.) for future undo support. Migration 012. |
| 208 |
|
| 209 |
|
| 210 |
|
| 211 |
| `id` | INTEGER PK | AUTOINCREMENT | |
| 212 |
| `source_hash` | TEXT | Hash of original sample | |
| 213 |
| `result_hash` | TEXT | Hash of edited result | |
| 214 |
| `operation` | TEXT | Edit operation name | |
| 215 |
| `params_json` | TEXT | Operation parameters as JSON, nullable | |
| 216 |
| `created_at` | INTEGER | Unix timestamp, default `unixepoch()` | |
| 217 |
|
| 218 |
Indexes: `source_hash`, `result_hash`. |
| 219 |
|
| 220 |
--- |
| 221 |
|
| 222 |
## Design Patterns |
| 223 |
|
| 224 |
- **Content-addressed storage:** `samples.hash` is a content hash -- the same file always produces the same PK, making deduplication automatic |
| 225 |
- **Inline migrations:** All schema DDL is embedded as Rust `const` strings in `db.rs`, applied transactionally via `PRAGMA user_version` -- no external SQL files |
| 226 |
- **VFS abstraction:** Virtual file systems decouple organization from disk layout; one sample can appear in multiple VFS trees via `vfs_nodes.sample_hash` |
| 227 |
- **Self-referential tree:** `vfs_nodes.parent_id` references `vfs_nodes.id` for arbitrary directory nesting |
| 228 |
- **Dot-namespaced tags:** Flat `tag` strings with dot convention (e.g., `genre.techno`) replace the original key-value tag model (migration 002) |
| 229 |
- **Sync guard triggers:** All sync triggers check `applying_remote != '1'` to prevent echo loops |
| 230 |
- **Sync-excluded keys:** `user_config` sync triggers skip keys matching `sync_%` to avoid syncing sync-internal state |
| 231 |
- **Cloud-only samples:** `samples.cloud_only` flag allows local blob eviction while keeping metadata and cloud copy |
| 232 |
- **Hashed row IDs (M018):** sensitive `sync_changelog.row_id` values go through `hash_row_id(row_id_salt, canonical_key)` so the server never sees raw sample hashes or tag strings. DELETE triggers also emit the canonical PK into the encrypted `data` field so pull-side replay doesn't need to parse row_id. |
| 233 |
- **Synced tables:** `samples`, `audio_analysis`, `vfs`, `vfs_nodes`, `tags`, `collections`, `collection_members`, `user_config`, `edit_history` |
| 234 |
|
| 235 |
## Key Indexes |
| 236 |
|
| 237 |
- **Content lookup** on samples (original_name), tags (sample_hash, tag) |
| 238 |
- **Analysis** on audio_analysis (bpm, musical_key, duration, classification) |
| 239 |
- **VFS tree** on vfs_nodes (parent_id, vfs_id, sample_hash) |
| 240 |
- **History** on edit_history (source_hash, result_hash) |
| 241 |
- **Sync** on sync_changelog (pushed) for pending push detection |
| 242 |
|
| 243 |
## Schema Evolution |
| 244 |
|
| 245 |
|
| 246 |
|
| 247 |
| 001 | Initial schema (samples, audio_analysis, vfs, vfs_nodes, tags, collections, collection_members, smart_folders) | |
| 248 |
| 002 | Tags v2: replaced key-value (tag_name, tag_value) with flat dot-namespaced tags | |
| 249 |
| 003 | Extended analysis: lufs, spectral_flatness, spectral_rolloff, zero_crossing_rate, classification | |
| 250 |
| 004 | Waveform data table, additional analysis + sample indexes | |
| 251 |
| 005 | User config key-value store | |
| 252 |
| 006 | Fingerprints table (audio similarity search) | |
| 253 |
| 007 | SyncKit integration (sync_state, sync_changelog, sync triggers on all domain tables, vfs.sync_files) | |
| 254 |
| 008 | Cloud-only samples (cloud_only column, updated sync triggers) | |
| 255 |
| 009 | Duration on samples table (available before analysis) | |
| 256 |
| 010 | Extended spectral features (spectral_bandwidth, centroid_variance, crest_factor, attack_time) | |
| 257 |
| 011 | ML classification confidence score | |
| 258 |
| 012 | Edit history table (destructive edit tracking) | |
| 259 |
| 013 | `samples.source_path` for loose-files mode (sample lives at its original on-disk path) | |
| 260 |
| 014 | Unique partial index on `vfs_nodes(vfs_id, name) WHERE parent_id IS NULL` (prevent duplicate root nodes) | |
| 261 |
| 015 | Collections gain `filter_json`; `smart_folders` table merged into collections and dropped | |
| 262 |
| 016 | Exclude `loose_files` user_config key from sync (security: server can't flip the mode) | |
| 263 |
| 017 | Rename `unsafe_mode` user_config key to `loose_files` (trigger recreation only; row-copy lives in main.rs) | |
| 264 |
| 018 | Hash `sync_changelog.row_id` for sensitive tables; DELETE triggers emit canonical PK in `data`; per-user `row_id_salt` in sync_state | |
| 265 |
|