# audiofiles Architecture

## System Overview

audiofiles is a standalone desktop sample manager built with egui. Samples are stored in a content-addressed blob store keyed by SHA-256 hash, and organized through a virtual filesystem (VFS) that maps user-visible directory trees to underlying content hashes. The database (SQLite) tracks metadata, analysis results, tags, collections, and VFS structure. The app provides cpal audio output, system tray integration, and cloud sync via SyncKit.

## Workspace Layout

The project is a 5-crate Rust workspace (plus an optional training binary):

### audiofiles-core

Pure data layer with no UI or async dependencies. Owns the SQLite database schema (versioned migrations), content-addressed `SampleStore`, VFS tree operations, tag management, search/filter, smart folders, audio analysis pipeline (BPM, key, loudness, spectral, classification, loop detection), fingerprinting for duplicate detection, similarity search, instrument zone definitions, and export logic including device profile support. All operations are synchronous. This crate is the single source of truth for business logic.

### audiofiles-browser

Shared egui UI. Contains the `Backend` trait abstraction, `BrowserState` (the central state machine), the `editor` module (top-level draw dispatch), and all UI panels: file list, detail pane, filter sidebar, toolbar, footer, waveform display, import/export workflow screens, instrument panel, sync panel, overlays, and the theme system. Also owns the `PreviewPlayback` shared audio state and the background import worker.

### audiofiles-app

Desktop binary. Launches an eframe window with the browser UI and a cpal audio output stream for preview playback. Adds a system tray icon, drag-and-drop file import, CLI argument import, and optional SyncKit cloud sync (configured via environment variables). Owns a tokio runtime for async sync operations.

### audiofiles-sync

SyncKit integration layer. Wraps the `synckit-client` SDK to provide `SyncManager`, which the standalone app owns and the browser UI reads for status display. Handles OAuth2 PKCE authentication, E2E encryption setup (ChaCha20-Poly1305 + Argon2), session restore from keychain, push/pull changelog sync, and a background scheduler that runs periodic sync cycles on a tokio runtime.

### audiofiles-rhai

Device plugin runtime. Loads TOML manifests (bundled at compile time and user-supplied from `~/.config/audiofiles/plugins/user/`) describing hardware sampler constraints (supported formats, sample rates, bit depths, channel counts, naming rules). Optionally runs sandboxed Rhai scripts for custom export hooks: `validate_sample`, `transform_filename`, `pre_export`, and `post_export`. The plugin registry resolves device profiles for export configuration.

## Backend Trait Abstraction

The `Backend` trait in `audiofiles-browser::backend` defines every operation that `BrowserState` needs from the data layer: VFS CRUD, tag management, search, smart folders, analysis, similarity, store operations, export, device profiles, config, and long-running background operations (import, analysis, export) with event polling.

One implementation exists:

- **DirectBackend** -- wraps `Mutex<Database>` + `SampleStore`, calls core functions directly. Used in the app and tests.

This trait boundary keeps the browser UI decoupled from the data layer.
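
To illustrate the decoupling, here is a minimal sketch of a trait boundary with an in-memory mock. The two methods, `MockBackend`, and `tag_summary` are hypothetical and far smaller than the real `Backend` trait; they only show how UI-side code can depend on the trait rather than on `DirectBackend`.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily trimmed stand-in for the real `Backend` trait.
pub trait Backend {
    fn tags_for(&self, sample_hash: &str) -> Vec<String>;
    fn add_tag(&mut self, sample_hash: &str, tag: &str);
}

/// In-memory backend for UI tests; no database or sample store involved.
#[derive(Default)]
pub struct MockBackend {
    tags: HashMap<String, Vec<String>>,
}

impl Backend for MockBackend {
    fn tags_for(&self, sample_hash: &str) -> Vec<String> {
        self.tags.get(sample_hash).cloned().unwrap_or_default()
    }

    fn add_tag(&mut self, sample_hash: &str, tag: &str) {
        let tags = self.tags.entry(sample_hash.to_string()).or_default();
        // Tags are flat strings; adding the same tag twice is a no-op here.
        if !tags.iter().any(|t| t == tag) {
            tags.push(tag.to_string());
        }
    }
}

/// UI-side code depends only on the trait, never on a concrete backend.
pub fn tag_summary(backend: &dyn Backend, sample_hash: &str) -> String {
    let tags = backend.tags_for(sample_hash);
    if tags.is_empty() {
        "(untagged)".to_string()
    } else {
        tags.join(", ")
    }
}
```

A test can construct `MockBackend::default()`, add a few tags, and check what `tag_summary` returns without touching SQLite or the blob store.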

## Content-Addressed Storage

Every imported audio file is hashed with SHA-256 and stored at `<data_dir>/samples/<hash>.<ext>`. The `SampleStore` handles:

1. **Import**: Stream file through SHA-256 hasher, copy to content-addressed path if not already present, insert metadata row (original name, extension, file size, timestamps) into the `samples` table with `INSERT OR IGNORE` for deduplication.
2. **Lookup**: Given a hash and extension, resolve to a filesystem path. Hash validation (64 lowercase hex characters) prevents directory traversal.
3. **Verification**: Re-hash a stored file and compare against the expected hash to detect corruption.
4. **Removal**: Delete the on-disk blob and the database row (CASCADE handles VFS nodes, tags, analysis, etc.).
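
The lookup step can be sketched as follows. This is an illustration under assumptions, not the actual `SampleStore` code: the function names are hypothetical, and the extension check is an extra precaution that the text above does not describe.

```rust
use std::path::{Path, PathBuf};

/// True only for exactly 64 lowercase hexadecimal characters.
pub fn is_valid_hash(hash: &str) -> bool {
    hash.len() == 64 && hash.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Assumed extra check (not described above): short, ASCII alphanumeric only.
pub fn is_valid_ext(ext: &str) -> bool {
    !ext.is_empty() && ext.len() <= 8 && ext.bytes().all(|b| b.is_ascii_alphanumeric())
}

/// Resolve `<data_dir>/samples/<hash>.<ext>`, refusing anything that could
/// contain path separators or `..` components.
pub fn blob_path(data_dir: &Path, hash: &str, ext: &str) -> Option<PathBuf> {
    if !is_valid_hash(hash) || !is_valid_ext(ext) {
        return None;
    }
    Some(data_dir.join("samples").join(format!("{hash}.{ext}")))
}
```

Because a valid hash can only contain `0-9` and `a-f`, a string such as `../../etc/passwd` is rejected before any path is built.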

The VFS layer maps user-visible paths to content hashes. A single sample blob can appear in multiple VFS locations without duplication. VFS nodes are either directories (no hash) or sample links (referencing a hash in the `samples` table).

## Database Schema

SQLite with 19 versioned migrations:

| Table | Contents |
|-------|----------|
| `samples` | Content-addressed sample metadata (hash PK, original name, extension, size, timestamps) |
| `audio_analysis` | Per-sample analysis results (BPM, key, duration, sample rate, channels, loudness, spectral features, classification) |
| `vfs` | Virtual filesystem roots (name, timestamps, sync_files flag) |
| `vfs_nodes` | Tree nodes (directory or sample link, parent reference, unique name per parent) |
| `tags` | Flat tag strings per sample hash (simplified from original name/value model) |
| `collections` / `collection_members` | Named sample playlists |
| `smart_folders` | Saved search queries (JSON filter per VFS) |
| `waveform_data` | Pre-computed waveform display data (bucketed peaks) |
| `fingerprints` | Audio envelope fingerprints for duplicate detection |
| `user_config` | Key-value user preferences |
| `sync_state` | SyncKit metadata (device ID, cursors, settings) |
| `sync_changelog` | Local change log for push/pull sync, populated by triggers |

Foreign keys with CASCADE ensure referential integrity. Triggers on `samples`, `vfs`, `vfs_nodes`, `tags`, `audio_analysis`, `collections`, `collection_members`, `smart_folders`, and `user_config` record changes to `sync_changelog` for sync, gated by the `applying_remote` flag to avoid echoing pulled changes back.

## Analysis Pipeline

The analysis system in `audiofiles-core::analysis` decodes audio to mono f32 via Symphonia, then runs configurable stages:

- **Basic**: Duration, sample rate, channel count -- always computed.
- **Loudness**: Peak dBFS, RMS dB, integrated LUFS (via bs1770).
- **Spectral**: Centroid, flatness, rolloff, zero-crossing rate, bandwidth, centroid variance (via FFT with realfft).
- **Waveform**: Crest factor (peak/RMS ratio), attack time (seconds to 90% of peak via 1 ms RMS envelope).
- **BPM**: Onset-based tempo estimation using spectral flux and autocorrelation.
- **Key**: Musical key detection from chroma features.
- **MFCC**: Mel-Frequency Cepstral Coefficients computed from existing STFT magnitudes (26-band mel filterbank, log energy, DCT-II, 13 coefficients). Aggregated as mean + variance across frames (26 features total).
- **Classification**: Two-layer ML system mapping 35-feature vectors (9 spectral/waveform + 26 MFCC) to 16 sample categories (kick, snare, hihat, cymbal, percussion, bass, vocal, synth, pad, fx, noise, music, ambience, impact, foley, texture). Layer 1: rule-based broad classifier detects drum vs. non-drum categories. Layer 2: 200-tree Random Forest (4.0 MB, embedded via `include_bytes!`, `OnceLock` lazy init) for drum sub-classification (kick/snare/hihat/cymbal/percussion). Confidence scores from RF vote fraction. 94.4% strict accuracy on 4343 labeled drum samples. Training binary in `audiofiles-train` crate (not built by default).
- **Loop detection**: Identifies whether a sample is a seamless loop.
- **Fingerprinting**: Computes an amplitude envelope fingerprint for near-duplicate detection across the library.
- **Tag suggestion**: Generates tag suggestions from analysis results (classification, BPM range, key, duration bracket) with confidence scores and human-readable reasons.
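
As an illustration of the MFCC aggregation step (13 coefficients per frame collapsed to 26 features), here is a minimal sketch. The function name, the output layout (all means first, then all variances), and the use of population variance are assumptions; the real pipeline may order or normalize the features differently.

```rust
pub const N_MFCC: usize = 13;

/// Collapse per-frame MFCCs into [mean_0..mean_12, var_0..var_12].
/// Returns None when there are no frames to aggregate.
pub fn aggregate_mfcc(frames: &[[f32; N_MFCC]]) -> Option<[f32; 2 * N_MFCC]> {
    if frames.is_empty() {
        return None;
    }
    let n = frames.len() as f32;
    let mut out = [0.0f32; 2 * N_MFCC];
    for c in 0..N_MFCC {
        let mean = frames.iter().map(|f| f[c]).sum::<f32>() / n;
        // Population variance (divide by n); this choice is an assumption.
        let var = frames.iter().map(|f| (f[c] - mean).powi(2)).sum::<f32>() / n;
        out[c] = mean;
        out[N_MFCC + c] = var;
    }
    Some(out)
}
```

The resulting 26 values would be concatenated with the 9 spectral/waveform features to form the 35-element classifier input.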

Analysis runs in a background worker thread using rayon for parallel processing. A configurable analysis cap (`max_analysis_seconds`, default 30s) limits expensive operations (STFT, BPM/key) to the first N seconds of audio while cheap operations (peak/RMS, fingerprint) use the full signal. An `AtomicBool` cancel flag allows interrupting in-flight parallel work.
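
The cap and the cancel flag can be sketched with the standard library alone. This is a simplified illustration: a sequential loop stands in for the rayon-based worker, and `capped`, `peak`, and `analyze_all` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Slice of the signal that expensive stages (STFT, BPM, key) would see.
pub fn capped(samples: &[f32], sample_rate: u32, max_analysis_seconds: u32) -> &[f32] {
    let limit = sample_rate as usize * max_analysis_seconds as usize;
    &samples[..samples.len().min(limit)]
}

/// Cheap stage: absolute peak over the full signal, ignoring the cap.
pub fn peak(samples: &[f32]) -> f32 {
    samples.iter().fold(0.0f32, |m, s| m.max(s.abs()))
}

/// Analyze items in order until done or until the cancel flag is raised.
/// Returns how many items were analyzed.
pub fn analyze_all(items: &[Vec<f32>], sample_rate: u32, cancel: &AtomicBool) -> usize {
    let mut done = 0;
    for samples in items {
        // Check the flag before starting each item.
        if cancel.load(Ordering::Relaxed) {
            break;
        }
        let _expensive_input = capped(samples, sample_rate, 30); // default cap
        let _peak = peak(samples); // full signal
        done += 1;
    }
    done
}
```

In a parallel version each worker would poll the same flag, so raising it once stops all workers at their next check.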

## Export System

Export converts VFS subtrees into standalone file hierarchies on disk. The pipeline:

1. **Collect items**: Walk the VFS subtree, resolving each sample link to its content-addressed blob path and enriching with tags.
2. **Configure**: User selects format (original, WAV, AIFF), sample rate, bit depth, channel configuration, structure (preserve tree or flatten), naming pattern (with tokens like `{name}`, `{bpm}`, `{key}`, `{class}`), metadata sidecar, and destination directory.
3. **Device profiles**: Optionally select a hardware sampler profile (from the Rhai plugin registry) which pre-fills format constraints and may run custom hook scripts during export.
4. **Execute**: Background worker copies or transcodes each file, applying format conversion (via hound + rubato for resampling), channel conversion (mono/stereo), and naming rules. Progress is reported per file. Each output is written atomically via a `write_atomic(dest, |tmp| ...)` helper: the encoder/copier targets `dest.audiofiles_tmp`, then `fs::rename`s into place on success. A killed export therefore never leaves a partially written file under its final name in the user's export directory.
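
A minimal sketch of such a helper is shown below. It follows the description above but is not the project's implementation: the exact temp naming (the destination path with `.audiofiles_tmp` appended) and the cleanup on failure are assumptions.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Assumed temp naming: the destination path with `.audiofiles_tmp` appended.
fn tmp_path_for(dest: &Path) -> PathBuf {
    let mut name: OsString = dest.as_os_str().to_owned();
    name.push(".audiofiles_tmp");
    PathBuf::from(name)
}

/// Run `write` against a temp path, then rename it onto `dest` on success.
pub fn write_atomic<F>(dest: &Path, write: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let tmp = tmp_path_for(dest);
    let result = write(&tmp).and_then(|()| fs::rename(&tmp, dest));
    if result.is_err() {
        // Best-effort cleanup (an assumption); the original error is returned.
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

A caller would pass the encoder as the closure, for example `write_atomic(&dest, |tmp| fs::write(tmp, bytes))`. The temp file lives next to the destination so that the rename stays on one filesystem, which is what lets `fs::rename` replace the target in a single step on common platforms.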

## Instrument Engine

audiofiles includes a polyphonic sampler instrument for keyboard-driven sample playback. The instrument system spans two crates:

- **audiofiles-core::instrument** defines `InstrumentConfig` and `AdsrEnvelope` (attack, decay, sustain, release parameters).
- **audiofiles-browser::instrument** owns the runtime state: `InstrumentPlayback` (voice pool, loaded zones, sample rate), `Voice` (per-voice ADSR tracking, fractional position for pitch interpolation), and `LoadedZone` (decoded buffer with root note, key range, velocity range).

Pitch shifting uses semitone-based ratio calculation: `2^(semitone_offset / 12) * (source_rate / host_rate)`. The fractional sample position advances by this ratio per output sample, with linear interpolation between adjacent source samples.
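
The ratio and the interpolated read can be sketched as below. The function names are hypothetical, the sketch handles a single mono buffer with no envelope, and holding the last sample at the end of the buffer is an assumption.

```rust
/// Amount by which the source position advances per output sample.
pub fn playback_ratio(semitone_offset: f64, source_rate: f64, host_rate: f64) -> f64 {
    2f64.powf(semitone_offset / 12.0) * (source_rate / host_rate)
}

/// Linearly interpolated read at a fractional position; None past the end.
pub fn sample_at(buffer: &[f32], position: f64) -> Option<f32> {
    if position < 0.0 {
        return None;
    }
    let index = position.floor() as usize;
    let frac = (position - index as f64) as f32;
    let current = *buffer.get(index)?;
    // Hold the last sample when there is no right-hand neighbour.
    let next = buffer.get(index + 1).copied().unwrap_or(current);
    Some(current + (next - current) * frac)
}

/// Fill `out` until it is full or the source runs out; returns samples written.
pub fn render(buffer: &[f32], ratio: f64, out: &mut [f32]) -> usize {
    let mut position = 0.0f64;
    let mut written = 0;
    for slot in out.iter_mut() {
        match sample_at(buffer, position) {
            Some(s) => *slot = s,
            None => break,
        }
        position += ratio;
        written += 1;
    }
    written
}
```

For example, a zone played 12 semitones above its root at matching sample rates gives a ratio of 2, so every second source sample is read; an unshifted 44.1 kHz sample on a 48 kHz host gives 44100 / 48000 = 0.91875.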

## Theme System

The browser supports multiple color themes via a 14-slot palette (4 background, 3 foreground, 6 accent, 1 border). Themes are defined as TOML files with `[meta]`, `[background]`, `[foreground]`, `[accent]`, and `[border]` sections. Bundled themes live in `crates/audiofiles-browser/themes/` and are compiled in at build time; users can add custom themes to `<config>/audiofiles/themes/` which override bundled themes by ID.

The active theme is stored in a global `RwLock<ThemeColors>`. Public accessor functions (e.g., `bg_primary()`, `text_secondary()`, `accent_blue()`) read through the lock. Derived colors (row stripes, selection highlights, hover states) are computed by linear interpolation between base palette slots.
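
A rough sketch of the accessor pattern and a derived color follows. It uses `std::sync::RwLock` and shows only two of the 14 slots; the `Rgb` type, the default values, `set_theme`, `bg_secondary`, and the 0.5 blend factor in `row_stripe` are all assumptions made for illustration.

```rust
use std::sync::RwLock;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rgb(pub u8, pub u8, pub u8);

/// Two of the 14 palette slots; the rest are omitted in this sketch.
#[derive(Clone, Copy)]
pub struct ThemeColors {
    pub bg_primary: Rgb,
    pub bg_secondary: Rgb,
}

/// Global active theme with placeholder default values.
static ACTIVE_THEME: RwLock<ThemeColors> = RwLock::new(ThemeColors {
    bg_primary: Rgb(20, 20, 20),
    bg_secondary: Rgb(40, 40, 40),
});

pub fn set_theme(theme: ThemeColors) {
    *ACTIVE_THEME.write().unwrap() = theme;
}

/// Accessors read through the lock and return a copy of the slot.
pub fn bg_primary() -> Rgb {
    ACTIVE_THEME.read().unwrap().bg_primary
}

pub fn bg_secondary() -> Rgb {
    ACTIVE_THEME.read().unwrap().bg_secondary
}

/// Per-channel linear interpolation; `t` is clamped to [0, 1].
pub fn lerp(a: Rgb, b: Rgb, t: f32) -> Rgb {
    let t = t.clamp(0.0, 1.0);
    let mix = |x: u8, y: u8| (x as f32 + (y as f32 - x as f32) * t).round() as u8;
    Rgb(mix(a.0, b.0), mix(a.1, b.1), mix(a.2, b.2))
}

/// Hypothetical derived color: halfway between two background slots.
pub fn row_stripe() -> Rgb {
    lerp(bg_primary(), bg_secondary(), 0.5)
}
```

Because derived colors are computed from the base slots on each call, switching themes with `set_theme` updates them without any extra bookkeeping.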

Sample classification colors are hardcoded (not theme-driven) so that semantic categories maintain consistent visual identity across themes.

## Audio Thread Model

The cpal audio output callback runs on a real-time thread where blocking should be avoided. The design uses `parking_lot::Mutex` with `try_lock`:

- **Preview playback**: The GUI thread decodes audio and writes the buffer into `PreviewPlayback` behind a mutex. The cpal callback calls `try_lock()` -- if the GUI holds the lock (decoding), the callback outputs silence instead of blocking.
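
The pattern can be sketched as follows. To stay dependency-free this uses `std::sync::Mutex::try_lock` as a stand-in for `parking_lot::Mutex`; the `PreviewPlayback` fields and the `fill_output` function are hypothetical, and the buffer is treated as mono.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified shared state (mono buffer plus read position).
#[derive(Default)]
pub struct PreviewPlayback {
    pub buffer: Vec<f32>,
    pub position: usize,
}

/// Body of a hypothetical output callback.
pub fn fill_output(shared: &Mutex<PreviewPlayback>, out: &mut [f32]) {
    // Start from silence so every early exit leaves valid output.
    out.fill(0.0);
    // try_lock never waits: if the lock is unavailable, keep the silence.
    if let Ok(mut state) = shared.try_lock() {
        let start = state.position.min(state.buffer.len());
        let n = out.len().min(state.buffer.len() - start);
        out[..n].copy_from_slice(&state.buffer[start..start + n]);
        state.position = start + n;
    }
}
```

The cost of this design is an occasional silent buffer while the GUI thread holds the lock, which is usually preferable to stalling the audio callback.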

Voice stealing in the instrument engine uses a monotonic age counter to find the oldest voice without sorting.
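
One way to picture the age counter is the sketch below. The `Voice` fields and `VoicePool` type are hypothetical simplifications (the real runtime state lives in `InstrumentPlayback`), and preferring a free voice before stealing is an assumption.

```rust
pub struct Voice {
    pub active: bool,
    pub note: u8,
    /// Value of the pool's counter when this voice was last started.
    pub age: u64,
}

pub struct VoicePool {
    voices: Vec<Voice>,
    next_age: u64,
}

impl VoicePool {
    pub fn new(size: usize) -> Self {
        assert!(size > 0, "voice pool needs at least one voice");
        let voices = (0..size)
            .map(|_| Voice { active: false, note: 0, age: 0 })
            .collect();
        VoicePool { voices, next_age: 0 }
    }

    /// Start a note and return the index of the voice that was used.
    pub fn note_on(&mut self, note: u8) -> usize {
        // Prefer a free voice; otherwise steal the voice with the smallest age.
        let free = self.voices.iter().position(|v| !v.active);
        let index = free.unwrap_or_else(|| {
            self.voices
                .iter()
                .enumerate()
                .min_by_key(|(_, v)| v.age)
                .map(|(i, _)| i)
                .unwrap_or(0)
        });
        self.voices[index] = Voice { active: true, note, age: self.next_age };
        // The counter only ever increases, so a smaller age means an older voice.
        self.next_age += 1;
        index
    }
}
```

Finding the oldest voice is a single linear scan over a small fixed pool, with no allocation and no sorting.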

## Sync Integration

The app optionally connects to MakeNotWork's SyncKit service for cloud sync of sample metadata. `SyncManager` coordinates:

1. **Authentication**: OAuth2 PKCE flow via browser redirect.
2. **Encryption**: E2E encryption with ChaCha20-Poly1305; key derived from user password via Argon2, stored in OS keychain, encrypted copy on server for cross-device setup.
3. **Push/pull**: Local changes are recorded by SQLite triggers into `sync_changelog`. Push sends unpushed entries to the server; pull applies remote changes with `applying_remote` flag set to suppress trigger re-recording. FK-safe ordering ensures parents are created before children and children are deleted before parents.
4. **Scheduler**: Background tokio task runs periodic sync cycles (configurable interval, default 15 minutes).

Synced tables: `vfs`, `samples`, `vfs_nodes`, `audio_analysis`, `tags`, `collections`, `collection_members`, `smart_folders`, `user_config`. Audio file blobs can optionally sync per-VFS (controlled by `vfs.sync_files` flag); metadata always syncs.
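
The FK-safe ordering mentioned in the push/pull step could look roughly like this. It is a sketch under several assumptions: the `Change` and `Op` types are hypothetical, the table list above is taken to be in parent-before-child order, and applying all upserts before all deletes is a guess rather than documented behaviour. Ordering rows within a self-referencing table such as `vfs_nodes` is not handled here.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    Upsert,
    Delete,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Change {
    pub table: &'static str,
    pub op: Op,
}

/// Assumed parent-before-child rank; unknown tables sort last.
fn rank(table: &str) -> usize {
    const ORDER: [&str; 9] = [
        "vfs", "samples", "vfs_nodes", "audio_analysis", "tags",
        "collections", "collection_members", "smart_folders", "user_config",
    ];
    ORDER.iter().position(|t| *t == table).unwrap_or(ORDER.len())
}

/// Upserts first in parent-to-child order, then deletes in child-to-parent order.
pub fn fk_safe_order(mut changes: Vec<Change>) -> Vec<Change> {
    // sort_by_key is stable, so changes with equal keys keep their log order.
    changes.sort_by_key(|c| match c.op {
        Op::Upsert => (0, rank(c.table) as isize),
        Op::Delete => (1, -(rank(c.table) as isize)),
    });
    changes
}
```

With this ordering a new `vfs` row always lands before the `vfs_nodes` rows that reference it, and a deleted `vfs` row is removed only after its nodes.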

## Import Pipeline

Folder import runs in a background thread with its own database connection:

1. **Walk**: Recursively scan the source directory, collecting paths with recognized audio extensions (wav, aiff, mp3, flac, ogg).
2. **Strategy selection**: User chooses flat (all links in current directory), new VFS (preserve directory structure), or merge into existing VFS.
3. **Import loop**: For each file -- hash, copy to store, create VFS node. Duplicates (name conflicts) are counted but not treated as errors. Progress and errors are reported back to the GUI via channel events.
4. **Folder tagging**: After import completes, the user can assign comma-separated tags to each imported top-level folder, applied to all samples within.
5. **Analysis**: Optionally run configurable analysis (loudness, BPM, key, spectral, classification, loop detection, fingerprint, auto-suggest tags) on imported samples.
6. **Tag review**: If auto-suggest was enabled, the user reviews suggested tags with accept/reject per suggestion before committing.

Cancellation is checked between files, keeping the UI responsive during large imports.
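
The extension filter, the per-file cancellation check, and the channel events can be sketched together. All names here are hypothetical, the actual per-file work (hash, copy, VFS node) is reduced to a counter, and matching extensions case-insensitively is an assumption.

```rust
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::Sender;

const AUDIO_EXTENSIONS: [&str; 5] = ["wav", "aiff", "mp3", "flac", "ogg"];

/// Extension check used during the walk (case-insensitive by assumption).
pub fn is_audio_file(path: &Path) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .map(|e| AUDIO_EXTENSIONS.contains(&e.to_ascii_lowercase().as_str()))
        .unwrap_or(false)
}

#[derive(Debug, PartialEq)]
pub enum ImportEvent {
    Progress { done: usize, total: usize },
    Finished { imported: usize, cancelled: bool },
}

/// Worker loop: the cancel flag is checked once per file, never mid-file.
pub fn run_import(files: &[PathBuf], cancel: &AtomicBool, events: &Sender<ImportEvent>) {
    let total = files.len();
    let mut imported = 0;
    let mut cancelled = false;
    for (i, path) in files.iter().enumerate() {
        if cancel.load(Ordering::Relaxed) {
            cancelled = true;
            break;
        }
        if is_audio_file(path) {
            // The real worker would hash, copy to the store, and create a VFS node.
            imported += 1;
        }
        // A closed receiver just means the GUI went away; ignore send errors.
        let _ = events.send(ImportEvent::Progress { done: i + 1, total });
    }
    let _ = events.send(ImportEvent::Finished { imported, cancelled });
}
```

The GUI side would drain the receiver once per frame (for example with `try_iter()`), so it never waits on the worker.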

## Key Design Decisions

- **Content-addressed storage** prevents duplicate blobs regardless of how many VFS locations reference the same sample. SHA-256 provides collision resistance and integrity verification.
- **VFS abstraction** decouples user-visible organization from on-disk storage. Users can create multiple independent directory trees, rename and restructure freely, without moving files.
- **Backend trait** keeps the browser UI decoupled from the data layer, making it testable with mock backends.
- **Synchronous core** keeps the data layer simple and predictable. Async is confined to the sync layer (tokio) and the app.
- **try_lock on audio thread** keeps the audio callback real-time safe with respect to shared state. The cpal callback never blocks on the lock -- it either gets the lock and produces audio, or outputs silence.
- **Strongly-typed IDs** (VfsId, NodeId, SampleHash) prevent accidental mixups at compile time. Integer IDs use a macro-generated newtype; SampleHash wraps a hex string.
- **Device plugin system** with Rhai scripting allows hardware sampler export profiles to be extended by users without recompiling, while keeping the sandbox constrained.
- **Trigger-based sync changelog** captures all local mutations automatically without requiring callers to manually record changes, making sync integration transparent to the rest of the codebase.
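
As an illustration of the strongly-typed ID decision, here is a minimal sketch of a macro-generated newtype. The macro name `int_id`, the `i64` representation, the derived traits, and `SampleHash::parse` are assumptions; only the type names come from the text above.

```rust
/// Hypothetical macro that stamps out an integer ID newtype.
macro_rules! int_id {
    ($name:ident) => {
        #[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
        pub struct $name(pub i64);
    };
}

int_id!(VfsId);
int_id!(NodeId);

/// Hex-string wrapper; construction validates the format once.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct SampleHash(String);

impl SampleHash {
    pub fn parse(hex: &str) -> Option<Self> {
        let ok = hex.len() == 64
            && hex.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
        ok.then(|| SampleHash(hex.to_string()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Swapping the two arguments is a compile error, not a runtime bug.
pub fn describe_node(vfs: VfsId, node: NodeId) -> String {
    format!("vfs {} / node {}", vfs.0, node.0)
}
```

Calling `describe_node(NodeId(42), VfsId(1))` fails to compile with a type mismatch, which is exactly the class of mixup the newtypes are meant to catch.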