# audiofiles Database Schema SQLite schema reference. 19 inline migrations. Migrations are embedded as Rust string constants in `crates/audiofiles-core/src/db.rs` and applied via `PRAGMA user_version` tracking -- not separate SQL files. ## Domain Map | Domain | Tables | Purpose | |--------|--------|---------| | Samples | 1 | Content-addressed sample storage and metadata | | Analysis | 3 | Audio analysis, waveform data, fingerprints | | VFS | 2 | Virtual file system directories and nodes | | Organization | 3 | Tags, collections, collection members (smart folders merged into collections.filter_json in M015) | | Preferences | 1 | User configuration key-value store | | SyncKit | 2 | Cloud sync metadata and local changelog | | History | 1 | Destructive edit tracking | --- ## Samples ### samples Content-addressed sample storage. The hash (of file content) is the primary key -- duplicate files are impossible by design. | Column | Type | Notes | |--------|------|-------| | `hash` | TEXT PK | Content hash (SHA-256) | | `original_name` | TEXT | Filename at import | | `file_extension` | TEXT | e.g., `wav`, `mp3` | | `file_size` | INTEGER | Bytes | | `import_date` | INTEGER | Unix timestamp | | `last_modified` | INTEGER | Unix timestamp | | `cloud_only` | INTEGER | 1 when local blob deleted but exists in cloud (migration 008), default 0 | | `duration` | REAL | Seconds, available immediately after import (migration 009), nullable | | `source_path` | TEXT | Original on-disk path when imported in loose-files mode (migration 013), nullable | Index: `original_name`. --- ## Analysis ### audio_analysis Spectral, temporal, and ML analysis results. One row per sample. FK to samples via hash. | Column | Type | Notes | |--------|------|-------| | `hash` | TEXT PK FK | -> samples (CASCADE) | | `bpm` | REAL | Beats per minute, nullable | | `musical_key` | TEXT | Detected key, nullable | | `duration` | REAL | Seconds | | `sample_rate` | INTEGER | Hz | | `channels` | INTEGER | Mono=1, stereo=2 | | `peak_db` | REAL | Peak amplitude, nullable | | `rms_db` | REAL | RMS amplitude, nullable | | `is_loop` | INTEGER | Boolean, nullable | | `spectral_centroid` | REAL | Nullable | | `onset_strength` | REAL | Nullable | | `analyzed_at` | INTEGER | Unix timestamp | | `lufs` | REAL | Loudness (migration 003), nullable | | `spectral_flatness` | REAL | Migration 003, nullable | | `spectral_rolloff` | REAL | Migration 003, nullable | | `zero_crossing_rate` | REAL | Migration 003, nullable | | `classification` | TEXT | ML category label (migration 003), nullable | | `spectral_bandwidth` | REAL | Migration 010, nullable | | `centroid_variance` | REAL | Migration 010, nullable | | `crest_factor` | REAL | Migration 010, nullable | | `attack_time` | REAL | Migration 010, nullable | | `classification_confidence` | REAL | ML confidence score (migration 011), nullable | Indexes: `bpm`, `musical_key`, `duration`, `classification`. ### waveform_data Pre-computed waveform visualizations. Migration 004. | Column | Type | Notes | |--------|------|-------| | `hash` | TEXT PK FK | -> samples (CASCADE) | | `num_buckets` | INTEGER | Resolution | | `peak_data` | BLOB | Binary peak envelope | | `sample_rate` | INTEGER | Hz | | `duration` | REAL | Seconds | | `generated_at` | INTEGER | Unix timestamp | ### fingerprints Audio fingerprints for similarity search (VP-tree indexed). Migration 006. | Column | Type | Notes | |--------|------|-------| | `hash` | TEXT PK FK | -> samples (CASCADE) | | `envelope` | BLOB | Binary envelope data | | `sample_rate` | INTEGER | Hz | | `generated_at` | INTEGER | Unix timestamp | --- ## VFS ### vfs Virtual file systems. Users create named VFS instances to organize samples independently of disk layout. | Column | Type | Notes | |--------|------|-------| | `id` | INTEGER PK | | | `name` | TEXT | Unique | | `created_at` | INTEGER | Unix timestamp | | `modified_at` | INTEGER | Unix timestamp | | `sync_files` | INTEGER | Per-VFS toggle for syncing audio blobs to cloud (migration 007), default 0 | ### vfs_nodes Directory and sample nodes within a VFS. Self-referential tree structure. | Column | Type | Notes | |--------|------|-------| | `id` | INTEGER PK | | | `vfs_id` | INTEGER FK | -> vfs (CASCADE) | | `parent_id` | INTEGER FK | -> vfs_nodes (CASCADE), nullable (root nodes) | | `name` | TEXT | | | `node_type` | TEXT | CHECK: `directory` or `sample` | | `sample_hash` | TEXT FK | -> samples (CASCADE), nullable (directories) | | `created_at` | INTEGER | Unix timestamp | Unique: `(vfs_id, parent_id, name)`. Indexes: `parent_id`, `vfs_id`, `sample_hash`. --- ## Organization ### tags Flat dot-namespaced tags on samples. Migration 002 replaced the original key-value `tags` table with this simpler model. | Column | Type | Notes | |--------|------|-------| | `sample_hash` | TEXT FK | -> samples (CASCADE) | | `tag` | TEXT | e.g., `genre.techno`, `type.kick` | PK: `(sample_hash, tag)`. Indexes: `sample_hash`, `tag`. ### collections Named sample collections (playlists, kits). | Column | Type | Notes | |--------|------|-------| | `id` | INTEGER PK | | | `name` | TEXT | Unique | | `description` | TEXT | Nullable | | `created_at` | INTEGER | Unix timestamp | ### collection_members Samples within a collection. Many-to-many. | Column | Type | Notes | |--------|------|-------| | `collection_id` | INTEGER FK | -> collections (CASCADE) | | `sample_hash` | TEXT FK | -> samples (CASCADE) | | `added_at` | INTEGER | Unix timestamp | PK: `(collection_id, sample_hash)`. `collections.filter_json` (added in M015): when non-NULL, the collection is a dynamic / saved search; when NULL, it's a manual collection populated via `collection_members`. The standalone `smart_folders` table from M001 was dropped in M015 and migrated into this column. --- ## Preferences ### user_config Key-value preferences store. Migration 005. | Column | Type | Notes | |--------|------|-------| | `key` | TEXT PK | | | `value` | TEXT | | --- ## SyncKit ### sync_state Sync metadata key-value store. Migration 007. | Column | Type | Notes | |--------|------|-------| | `key` | TEXT PK | | | `value` | TEXT | | Seeded keys: `device_id`, `pull_cursor`, `auto_sync_enabled`, `sync_interval_minutes`, `applying_remote`, `last_sync_at`, `initial_snapshot_done`, `row_id_salt` (added in M018; 32 random bytes hex, never synced). ### sync_changelog Local change log for push/pull sync. Migration 007. | Column | Type | Notes | |--------|------|-------| | `id` | INTEGER PK | AUTOINCREMENT | | `table_name` | TEXT | Source table name | | `op` | TEXT | `INSERT`, `UPDATE`, or `DELETE` | | `row_id` | TEXT | Opaque per-row identifier. For sensitive tables (samples, audio_analysis, tags, collection_members) M018 sets this to `hash_row_id(row_id_salt, canonical_key)` so the cleartext key never goes on the wire. For numeric-PK tables it remains the literal id. | | `timestamp` | TEXT | ISO datetime, default `datetime('now')` | | `data` | TEXT | JSON snapshot of row, nullable | | `pushed` | INTEGER | Boolean, default 0 | Index: `pushed`. --- ## History ### edit_history Tracks destructive audio edits (trim, normalize, etc.) for future undo support. Migration 012. | Column | Type | Notes | |--------|------|-------| | `id` | INTEGER PK | AUTOINCREMENT | | `source_hash` | TEXT | Hash of original sample | | `result_hash` | TEXT | Hash of edited result | | `operation` | TEXT | Edit operation name | | `params_json` | TEXT | Operation parameters as JSON, nullable | | `created_at` | INTEGER | Unix timestamp, default `unixepoch()` | Indexes: `source_hash`, `result_hash`. --- ## Design Patterns - **Content-addressed storage:** `samples.hash` is a content hash -- the same file always produces the same PK, making deduplication automatic - **Inline migrations:** All schema DDL is embedded as Rust `const` strings in `db.rs`, applied transactionally via `PRAGMA user_version` -- no external SQL files - **VFS abstraction:** Virtual file systems decouple organization from disk layout; one sample can appear in multiple VFS trees via `vfs_nodes.sample_hash` - **Self-referential tree:** `vfs_nodes.parent_id` references `vfs_nodes.id` for arbitrary directory nesting - **Dot-namespaced tags:** Flat `tag` strings with dot convention (e.g., `genre.techno`) replace the original key-value tag model (migration 002) - **Sync guard triggers:** All sync triggers check `applying_remote != '1'` to prevent echo loops - **Sync-excluded keys:** `user_config` sync triggers skip keys matching `sync_%` to avoid syncing sync-internal state - **Cloud-only samples:** `samples.cloud_only` flag allows local blob eviction while keeping metadata and cloud copy - **Hashed row IDs (M018):** sensitive `sync_changelog.row_id` values go through `hash_row_id(row_id_salt, canonical_key)` so the server never sees raw sample hashes or tag strings. DELETE triggers also emit the canonical PK into the encrypted `data` field so pull-side replay doesn't need to parse row_id. - **Synced tables:** `samples`, `audio_analysis`, `vfs`, `vfs_nodes`, `tags`, `collections`, `collection_members`, `user_config`, `edit_history` ## Key Indexes - **Content lookup** on samples (original_name), tags (sample_hash, tag) - **Analysis** on audio_analysis (bpm, musical_key, duration, classification) - **VFS tree** on vfs_nodes (parent_id, vfs_id, sample_hash) - **History** on edit_history (source_hash, result_hash) - **Sync** on sync_changelog (pushed) for pending push detection ## Schema Evolution | Migration | Change | |-----------|--------| | 001 | Initial schema (samples, audio_analysis, vfs, vfs_nodes, tags, collections, collection_members, smart_folders) | | 002 | Tags v2: replaced key-value (tag_name, tag_value) with flat dot-namespaced tags | | 003 | Extended analysis: lufs, spectral_flatness, spectral_rolloff, zero_crossing_rate, classification | | 004 | Waveform data table, additional analysis + sample indexes | | 005 | User config key-value store | | 006 | Fingerprints table (audio similarity search) | | 007 | SyncKit integration (sync_state, sync_changelog, sync triggers on all domain tables, vfs.sync_files) | | 008 | Cloud-only samples (cloud_only column, updated sync triggers) | | 009 | Duration on samples table (available before analysis) | | 010 | Extended spectral features (spectral_bandwidth, centroid_variance, crest_factor, attack_time) | | 011 | ML classification confidence score | | 012 | Edit history table (destructive edit tracking) | | 013 | `samples.source_path` for loose-files mode (sample lives at its original on-disk path) | | 014 | Unique partial index on `vfs_nodes(vfs_id, name) WHERE parent_id IS NULL` (prevent duplicate root nodes) | | 015 | Collections gain `filter_json`; `smart_folders` table merged into collections and dropped | | 016 | Exclude `loose_files` user_config key from sync (security: server can't flip the mode) | | 017 | Rename `unsafe_mode` user_config key to `loose_files` (trigger recreation only; row-copy lives in main.rs) | | 018 | Hash `sync_changelog.row_id` for sensitive tables; DELETE triggers emit canonical PK in `data`; per-user `row_id_salt` in sync_state |