# Contributing to audiofiles

Patterns, conventions, and rules for working on the audiofiles codebase.

## Project Structure

```
audiofiles/                            (workspace root)
  Cargo.toml                           # Workspace definition
  crates/
    audiofiles-core/                   # Domain logic, storage, VFS, analysis, database
    audiofiles-browser/                # eframe/egui GUI, Backend trait, state management
    audiofiles-app/                    # App entry point (thin shell)
    audiofiles-sync/                   # SyncKit cloud sync integration
    audiofiles-rhai/                   # Device plugin runtime (TOML manifests + Rhai hooks)
    audiofiles-train/                  # ML classifier training (dev-only binary)
    audiofiles-bench/                  # Performance benchmarks
    audiofiles-rhai/plugins/bundled/   # Bundled device profiles (SP-404, MPC, etc.)
  dist/                                # Build scripts for macOS, Windows, Linux
```

### Crate Boundaries

| Crate | Role | Key constraint |
|-------|------|----------------|
| `audiofiles-core` | All domain logic | **Sync-only** (no async runtime) |
| `audiofiles-browser` | GUI + backend abstraction | Owns egui state, polls workers |
| `audiofiles-app` | Entry point | Thin shell, creates backend + launches GUI |
| `audiofiles-sync` | Cloud sync | Uses tokio + `spawn_blocking` for rusqlite |
| `audiofiles-rhai` | Device plugins | TOML manifests + sandboxed Rhai hooks |
| `audiofiles-train` | ML training | Dev-only, not shipped |
| `audiofiles-bench` | Benchmarks | criterion-based |

**Critical rule:** `audiofiles-core` is entirely synchronous. `rusqlite::Connection` is `Send` but `!Sync` and exposes a blocking API, so all database operations are plain synchronous calls. Long-running operations (import, analysis, export) use dedicated worker threads with channel-based message passing.

## Content-Addressed Storage

Samples are identified by SHA-256 hash: the hash *is* the primary key. There are no UUIDs or auto-increment IDs for samples.

```rust
use std::path::PathBuf;

pub struct SampleStore {
    root: PathBuf, // Root directory of the blob store
}
```

**Import flow:**
1. Stream the file through a SHA-256 hasher
2. Check whether the blob already exists at `samples/{hash}.{ext}`; if so, skip the copy (dedup)
3. `INSERT OR IGNORE INTO samples`, which dedups at the DB level too
4. Return the hash

**Rules:**
- Never store samples by filename or path. The hash is the only identifier.
- The `cloud_only` flag marks samples whose local blobs have been evicted but still exist in cloud sync.
- Hash validation rejects anything that isn't exactly 64 lowercase hex characters.
- Extension validation rejects directory traversal attempts.

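The two validation rules above can be sketched as follows. This is a minimal illustration, not the code in `audiofiles-core`: the function names, the alphanumeric-only extension policy, and the 8-character extension limit are all assumptions made for the example.

```rust
use std::path::{Path, PathBuf};

/// True only for exactly 64 lowercase hex characters (a SHA-256 digest).
fn is_valid_hash(hash: &str) -> bool {
    hash.len() == 64 && hash.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// True for a short, purely ASCII-alphanumeric extension (assumed policy).
/// Allowing only alphanumerics rules out `/`, `\` and `.`, so a value such
/// as "../x" can never point outside the store directory.
fn is_safe_extension(ext: &str) -> bool {
    !ext.is_empty() && ext.len() <= 8 && ext.bytes().all(|b| b.is_ascii_alphanumeric())
}

/// Builds `root/samples/{hash}.{ext}`, or returns None if either part is invalid.
fn blob_path(root: &Path, hash: &str, ext: &str) -> Option<PathBuf> {
    if is_valid_hash(hash) && is_safe_extension(ext) {
        Some(root.join("samples").join(format!("{hash}.{ext}")))
    } else {
        None
    }
}
```

Validating both parts before building the path means no caller can construct a blob path from untrusted input by accident.
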
## Worker Thread Pattern

Long-running operations use dedicated threads with channel-based message passing:

```rust
use std::sync::atomic::AtomicBool;
use std::sync::{mpsc, Arc};
use std::thread::JoinHandle;
use parking_lot::Mutex; // see the Concurrency section

pub struct WorkerHandle {
    cmd_tx: mpsc::Sender<WorkerCommand>,           // Send commands to worker
    event_rx: Mutex<mpsc::Receiver<WorkerEvent>>,  // Receive results
    cancel_flag: Arc<AtomicBool>,                  // Lock-free cancellation
    thread: Option<JoinHandle<()>>,                // Join on drop
}
```

**Command/Event protocol:**
- `WorkerCommand::AnalyzeBatch { samples, config }` → worker processes samples
- `WorkerEvent::Progress { completed, total, current_name }` → UI updates progress bar
- `WorkerEvent::SampleDone { result, suggestions }` → UI stores result
- `WorkerEvent::BatchComplete` → UI finishes operation

**Rules:**
- `mpsc::Receiver` is `Send` but not `Sync`; wrapping it in a `Mutex` gives the handle the `Send + Sync` bounds that egui state requires.
- `Arc<AtomicBool>` provides lock-free cancellation: the worker checks the flag before each sample.
- The `Drop` implementation sends a `Shutdown` command and joins the thread.
- The browser crate calls `try_recv()` each frame to poll for events.

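The pattern above can be sketched end to end with the standard library alone. This is an illustrative reduction, not the real `WorkerHandle`: the `Command`/`Event` enums are cut down, `std::sync::Mutex` stands in for `parking_lot::Mutex`, and the method names `spawn`, `submit`, `cancel`, and `poll` are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread::{self, JoinHandle};

enum Command {
    Batch(Vec<String>),
    Shutdown,
}

#[derive(Debug, PartialEq)]
enum Event {
    Progress { completed: usize, total: usize },
    BatchComplete,
}

struct Worker {
    cmd_tx: mpsc::Sender<Command>,
    event_rx: Mutex<mpsc::Receiver<Event>>,
    cancel_flag: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}

impl Worker {
    fn spawn() -> Self {
        let (cmd_tx, cmd_rx) = mpsc::channel();
        let (event_tx, event_rx) = mpsc::channel();
        let cancel_flag = Arc::new(AtomicBool::new(false));
        let cancel = Arc::clone(&cancel_flag);
        let thread = thread::spawn(move || {
            // The loop ends on `Shutdown` or when the command sender is dropped.
            while let Ok(Command::Batch(items)) = cmd_rx.recv() {
                let total = items.len();
                for (i, _item) in items.iter().enumerate() {
                    // Check the cancellation flag before each item.
                    if cancel.load(Ordering::Relaxed) {
                        break;
                    }
                    // ... the real per-sample work would happen here ...
                    let _ = event_tx.send(Event::Progress { completed: i + 1, total });
                }
                let _ = event_tx.send(Event::BatchComplete);
            }
        });
        Worker { cmd_tx, event_rx: Mutex::new(event_rx), cancel_flag, thread: Some(thread) }
    }

    fn submit(&self, items: Vec<String>) {
        let _ = self.cmd_tx.send(Command::Batch(items));
    }

    fn cancel(&self) {
        self.cancel_flag.store(true, Ordering::Relaxed);
    }

    /// Non-blocking drain of pending events; meant to be called once per UI frame.
    fn poll(&self) -> Vec<Event> {
        let rx = self.event_rx.lock().unwrap();
        let events: Vec<Event> = rx.try_iter().collect();
        events
    }
}

impl Drop for Worker {
    fn drop(&mut self) {
        // Stop any batch in flight, ask the thread to exit, then join it.
        self.cancel_flag.store(true, Ordering::Relaxed);
        let _ = self.cmd_tx.send(Command::Shutdown);
        if let Some(thread) = self.thread.take() {
            let _ = thread.join();
        }
    }
}
```

Because `poll` never blocks, the UI thread can call it every frame regardless of how long a batch takes. In this sketch the cancel flag is never reset, so a cancelled worker skips later batches too; a real handle would likely clear it when a new batch starts.
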
## Backend Trait

`Backend` is a synchronous trait that abstracts all data operations:

```rust
pub trait Backend: Send + Sync {
    fn list_vfs(&self) -> BackendResult<Vec<Vfs>>;
    fn import_file(&self, path: &Path) -> BackendResult<String>;
    fn start_analysis(&self, samples: Vec<(String, String)>, config: AnalysisConfig) -> BackendResult<()>;
    fn poll_events(&self) -> Vec<BackendEvent>;
    // ... ~30 methods covering VFS, tags, search, analysis, export
}
```

`DirectBackend` is the sole implementation, wrapping `Mutex<Database>` + `SampleStore`. This indirection exists to support potential future backends without changing UI code. UI code always calls `backend.method()` and never accesses the database directly.

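To illustrate why the indirection helps, here is a reduced sketch in which UI-side code depends only on a trait object. The two-method trait, `InMemoryBackend`, and `tag_summary` are hypothetical stand-ins invented for this example; the real `Backend` has about 30 methods and `DirectBackend` is backed by SQLite.

```rust
use std::sync::Mutex;

#[derive(Debug)]
enum BackendError {
    Other(String),
}

type BackendResult<T> = Result<T, BackendError>;

/// Cut-down stand-in for the real trait (hypothetical methods).
trait Backend: Send + Sync {
    fn list_tags(&self) -> BackendResult<Vec<String>>;
    fn add_tag(&self, tag: &str) -> BackendResult<()>;
}

/// A second implementation that keeps everything in memory.
struct InMemoryBackend {
    tags: Mutex<Vec<String>>,
}

impl Backend for InMemoryBackend {
    fn list_tags(&self) -> BackendResult<Vec<String>> {
        Ok(self.tags.lock().unwrap().clone())
    }

    fn add_tag(&self, tag: &str) -> BackendResult<()> {
        if tag.is_empty() {
            return Err(BackendError::Other("empty tag".to_string()));
        }
        self.tags.lock().unwrap().push(tag.to_string());
        Ok(())
    }
}

/// UI-side code sees only `&dyn Backend`, never a database handle.
fn tag_summary(backend: &dyn Backend) -> BackendResult<String> {
    let tags = backend.list_tags()?;
    Ok(format!("{} tag(s): {}", tags.len(), tags.join(", ")))
}
```

Since `tag_summary` takes `&dyn Backend`, swapping in a different backend requires no change to the caller.
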
## VFS Abstraction

Users organize samples through virtual file systems (VFS), not by moving files on disk.

- `vfs_nodes` is a self-referential tree (`parent_id` FK to its own table)
- Nodes are either `Directory` or `Sample` (with a `sample_hash` FK)
- A sample can appear in multiple VFS locations without duplication
- Move/rename operations only touch VFS metadata, not the blob store
- `UNIQUE(vfs_id, parent_id, name)` prevents duplicate names in the same directory

**Enriched queries** join VFS nodes with analysis data and sample metadata in a single query, avoiding N+1 patterns.

## Analysis Pipeline

The pipeline runs in a worker thread and processes each sample through these stages:

1. **Decode**: Symphonia decodes any supported audio format to mono f32
2. **Loudness**: Peak dB, RMS dB, LUFS (fast, uses full signal)
3. **Spectral**: STFT → centroid, flatness, rolloff, bandwidth, ZCR, onset strength
4. **MFCC + ML**: Extract MFCCs from magnitude frames, run through embedded neural classifier
5. **BPM**: Tempo detection (skipped for non-rhythmic samples if `smart_skip` is enabled)
6. **Key**: Musical key detection (skipped for non-pitched samples if `smart_skip` is enabled)
7. **Loop**: Loop point detection
8. **Fingerprint**: Peak envelope for near-duplicate detection (VP-tree similarity search)

All results are stored in the `audio_analysis` table (one row per hash). The `smart_skip` feature uses ML classification to skip irrelevant stages (e.g., no BPM detection for ambient textures).

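As a concrete example of a stage, the peak and RMS parts of the loudness step can be sketched with the standard dBFS definitions (peak = 20·log10(max |x|), RMS = 10·log10(mean x²)). This is an assumption about what "Peak dB" and "RMS dB" mean here, not the crate's actual code; the function names are hypothetical and LUFS is left out because it needs K-weighting and gating.

```rust
/// Peak level in dBFS over a mono f32 signal: 20 * log10(max |x|).
/// Silence (or an empty slice) yields negative infinity.
fn peak_db(samples: &[f32]) -> f32 {
    let peak = samples.iter().fold(0.0_f32, |max, s| max.max(s.abs()));
    20.0 * peak.log10()
}

/// RMS level in dBFS: 10 * log10(mean(x^2)), which equals 20 * log10(rms).
/// The sum is accumulated in f64 to limit rounding error on long signals.
fn rms_db(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return f32::NEG_INFINITY;
    }
    let sum_sq: f64 = samples.iter().map(|s| f64::from(*s) * f64::from(*s)).sum();
    let mean_sq = sum_sq / samples.len() as f64;
    (10.0 * mean_sq.log10()) as f32
}
```

For a full-scale sine wave the RMS level sits about 3.01 dB below the peak, which is a quick sanity check for any implementation of this stage.
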
### Adding a New Analysis Stage

1. Add the computation in `crates/audiofiles-core/src/analysis/`
2. Add column(s) to `audio_analysis` table via inline migration in `db.rs`
3. Wire into the pipeline in `analysis/mod.rs`
4. Add to `AnalysisResult` struct
5. Expose in the enriched VFS query if needed for the UI

## Unsafe FFI

Platform-specific drag-and-drop requires FFI:
- **macOS:** `drag_out/macos.rs`: objc2 message sends, libdispatch async to main thread
- **Windows:** `drag_out/windows.rs`: COM/OLE `DoDragDrop`

**Rules:**
- Every `unsafe` block MUST have a `// SAFETY:` comment explaining the invariant.
- macOS FFI uses `MainThreadMarker` to guarantee AppKit calls happen on the main thread.
- `RcBlock` prevents use-after-free in async dispatch.
- The `DRAG_ACTIVE` atomic flag prevents concurrent drag sessions.

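One way an atomic flag can enforce "one drag session at a time" is a compare-and-swap paired with an RAII guard. The sketch below is an assumption about the shape of such a guard, not the code in `drag_out/`; only the `DRAG_ACTIVE` name comes from the rules above, and `DragSession` is hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set while a drag session is in progress.
static DRAG_ACTIVE: AtomicBool = AtomicBool::new(false);

/// Guard representing an active session; clears the flag when dropped.
struct DragSession;

impl DragSession {
    /// Atomically claims the flag. Returns None if a session is already active.
    fn try_begin() -> Option<DragSession> {
        DRAG_ACTIVE
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| DragSession)
    }
}

impl Drop for DragSession {
    fn drop(&mut self) {
        // Release the flag even if the drag code returns early.
        DRAG_ACTIVE.store(false, Ordering::Release);
    }
}
```

The platform FFI call would run while the guard is alive, so an early return or error path cannot leave the flag stuck at `true`.
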
## Error Handling

Three error types, one per major crate:

```rust
// audiofiles-core
pub enum CoreError {
    Db(rusqlite::Error),
    Io { path: PathBuf, source: std::io::Error },
    SampleNotFound(String),
    VfsNotFound(VfsId),
    Analysis(AnalysisError),
    // ...
}

// audiofiles-sync
pub enum SyncError {
    Db(rusqlite::Error),
    Client(String),
    Auth(String),
    Io(std::io::Error),
}

// audiofiles-browser
pub enum BackendError {
    Core(CoreError),
    Other(String),
}
```

Use typed variants with context. Use `?` for propagation. `From` impls enable automatic conversion between error types.

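A cut-down sketch of how these three rules fit together: a variant that carries context, a `From` impl, and `?` doing the conversion. The enums are reduced to the variants the example needs, and `read_blob`, `find_sample`, and `sample_name` are hypothetical helpers.

```rust
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug)]
enum CoreError {
    Io { path: PathBuf, source: io::Error },
    SampleNotFound(String),
}

#[derive(Debug)]
enum BackendError {
    Core(CoreError),
}

impl From<CoreError> for BackendError {
    fn from(err: CoreError) -> Self {
        BackendError::Core(err)
    }
}

/// Attaches the offending path to the I/O error instead of losing it.
fn read_blob(path: &Path) -> Result<Vec<u8>, CoreError> {
    std::fs::read(path).map_err(|source| CoreError::Io { path: path.to_path_buf(), source })
}

/// Hypothetical lookup that knows a single sample.
fn find_sample(hash: &str) -> Result<String, CoreError> {
    if hash == "known" {
        Ok("kick.wav".to_string())
    } else {
        Err(CoreError::SampleNotFound(hash.to_string()))
    }
}

/// `?` converts `CoreError` into `BackendError` through the `From` impl.
fn sample_name(hash: &str) -> Result<String, BackendError> {
    let name = find_sample(hash)?;
    Ok(name)
}
```

The caller of `sample_name` can still match on `BackendError::Core(CoreError::SampleNotFound(_))`, so no information is flattened into a string along the way.
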
## Database

### Inline Migrations

Migrations are `const` strings in `crates/audiofiles-core/src/db.rs`, applied sequentially on database open:

```rust
const MIGRATION_001: &str = r#"
CREATE TABLE samples (
    hash TEXT PRIMARY KEY,
    original_name TEXT NOT NULL,
    ...
);
"#;
```

Production uses a file-backed SQLite database. Tests use `:memory:`.

When adding a migration, make it **replay-safe**: every `CREATE TABLE / INDEX / TRIGGER` should be `IF NOT EXISTS` (or preceded by `DROP TRIGGER IF EXISTS` for triggers whose body changes), and any seed insert should be `INSERT OR IGNORE`. The `migration_replay_from_version_two_against_full_schema` test in `db.rs` rolls `user_version` back to 2 and re-runs every migration from M003 onward against a populated schema; non-idempotent CREATEs fail it. M001 (initial schema) and M002 (`DROP TABLE tags; ALTER tags_v2 RENAME TO tags`) are inherently one-shot and excluded from the replay test.

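The version-driven ordering and the replay-safety rule can be illustrated without a database. Everything below is a hypothetical sketch: the SQL strings are placeholders rather than the real migrations, `pending` only models "apply whatever comes after `user_version`", and `is_replay_safe` is a crude textual check, not the replay test described above.

```rust
/// Placeholder migrations; index 0 is M001, index 1 is M002, and so on.
const MIGRATIONS: &[&str] = &[
    "CREATE TABLE samples (hash TEXT PRIMARY KEY, original_name TEXT NOT NULL);",
    "CREATE TABLE tags (name TEXT PRIMARY KEY);",
    "CREATE TABLE IF NOT EXISTS kits (id INTEGER PRIMARY KEY); \
     INSERT OR IGNORE INTO kits (id) VALUES (1);",
];

/// Yields `(version_after_applying, sql)` for each migration not yet applied.
fn pending(user_version: usize) -> impl Iterator<Item = (usize, &'static str)> {
    MIGRATIONS
        .iter()
        .copied()
        .enumerate()
        .skip(user_version)
        .map(|(index, sql)| (index + 1, sql))
}

/// Crude lint: every CREATE TABLE/INDEX/TRIGGER must be followed by
/// IF NOT EXISTS, and every INSERT by OR IGNORE.
fn is_replay_safe(sql: &str) -> bool {
    let upper = sql.to_uppercase();
    let creates_ok = ["CREATE TABLE", "CREATE INDEX", "CREATE TRIGGER"].iter().all(|&kw| {
        upper
            .match_indices(kw)
            .all(|(pos, _)| upper[pos + kw.len()..].trim_start().starts_with("IF NOT EXISTS"))
    });
    let inserts_ok = upper
        .match_indices("INSERT")
        .all(|(pos, _)| upper[pos + "INSERT".len()..].trim_start().starts_with("OR IGNORE"));
    creates_ok && inserts_ok
}
```

A textual check like this misses cases such as `CREATE UNIQUE INDEX` and would wrongly flag `AFTER INSERT ON` inside a trigger, which is why replaying migrations against a populated schema is the stronger test.
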
The connection registers a custom `hash_row_id(salt, key)` SQLite function on open (rusqlite `functions` feature). It's used by the M018 sync triggers; if you write a migration that creates new sync triggers, prefer it for any row_id that would otherwise leak user content.

### Sync Changelog Triggers

Every synced table has triggers that insert into `sync_changelog` on INSERT/UPDATE/DELETE. A `sync_state` row (`applying_remote = '1'`) suppresses triggers during pull operations to prevent recursion.

Per migration M018 (2026-06-02), `sync_changelog.row_id` is hashed via `hash_row_id(row_id_salt, canonical_key)` for sensitive tables (samples, audio_analysis, tags, collection_members) so the server never sees raw sample hashes or tag strings. The salt is generated per device, stored in `sync_state`, and never synced. DELETE triggers also emit the canonical PK in the encrypted `data` field, which `resolve::apply_delete` reads to reconstruct WHERE clauses without parsing the (now-opaque) row_id. When adding a new synced table, follow the same pattern: wrap row_id in `hash_row_id(...)` if it carries user content, and emit the canonical PK into `data` for DELETE.

### rusqlite + async

`rusqlite::Connection` is `Send` but `!Sync`, and every call on it blocks. In the sync crate (which uses tokio), all database operations go through `tokio::task::spawn_blocking` so they never stall the async runtime. In core (sync-only), no async runtime is needed.

## SyncKit Integration

Cloud sync is optional. The `SyncManager` coordinates push/pull:

- **Tables synced** (in FK-safe order): `vfs`, `samples`, `collections`, `vfs_nodes`, `audio_analysis`, `tags`, `collection_members`, `user_config`, `edit_history` (smart_folders merged into `collections.filter_json` in M015 and the standalone table dropped)
- **Delete order** is reversed (children first)
- **Column whitelist:** `table_columns()` restricts which columns sync to prevent schema drift
- **Blob sync:** Sample files sync to cloud storage for VFS entries with `sync_files = true`
- **`cloud_only` flag:** Marks samples whose local blobs have been evicted

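The ordering rule can be sketched as a single list walked in two directions. The table names are taken from the list above; the constant and function names are hypothetical.

```rust
/// Synced tables in FK-safe order: parents come before their children.
const SYNC_TABLES: [&str; 9] = [
    "vfs",
    "samples",
    "collections",
    "vfs_nodes",
    "audio_analysis",
    "tags",
    "collection_members",
    "user_config",
    "edit_history",
];

/// Order for applying inserts and updates (parents first).
fn upsert_order() -> impl Iterator<Item = &'static str> {
    SYNC_TABLES.iter().copied()
}

/// Order for applying deletes (children first): the same list reversed.
fn delete_order() -> impl Iterator<Item = &'static str> {
    SYNC_TABLES.iter().rev().copied()
}
```

Deriving the delete order from the same list means a newly synced table only needs to be placed once, after every table it references.
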
## Device Plugins

TOML manifests in `plugins/devices/` define hardware constraints. Optional Rhai scripts in `hooks/` run sandboxed.

### Hook Style

Optional Rhai hooks follow the cross-project Rhai style guide. Run `_meta/scripts/lint-rhai.sh` to check formatting. Key points: 4-space indent, `snake_case` functions, `UPPER_CASE` constants, header comment block.

### Manifest Contract

```toml
[device]
name = "SP-404 MKII"
manufacturer = "Roland"

[audio]
formats = ["wav"]
sample_rates = [44100, 48000]
bit_depths = [16, 24]
channels = "both"

[naming]
case = "upper"
max_length = 12

[hooks]
validate_sample = "hooks/validate.rhai"
transform_filename = "hooks/filename.rhai"
```

### Hook Functions

| Hook | Input | Returns | Purpose |
|------|-------|---------|---------|
| `validate_sample` | `info` (sample metadata) | `bool` | Accept/reject sample for device |
| `transform_filename` | `name`, `ctx` | `String` | Rename for device conventions |
| `pre_export` | `ctx` | none | Run before export batch |
| `post_export` | `ctx` | none | Run after export batch |

## Concurrency

- `parking_lot::Mutex` everywhere (not `std::sync::Mutex`): no poisoning, shorter lock API.
- `#[instrument(skip_all)]` (from `tracing`) on all significant functions.
- Worker threads for long-running operations (never block the UI thread).
- The egui render loop polls `backend.poll_events()` each frame for worker results.

## Testing

- **Core tests:** In-file `#[cfg(test)]` modules with `test_helpers::insert_fake_sample` for fixtures
- **Sync tests:** Unit tests in sync crate modules
- **No GUI tests:** Immediate-mode UI is tested manually
- Test databases use in-memory SQLite (`:memory:`) with migrations applied via `Database::open`

## Building and Distribution

Summary of platform-specific builds:

| Platform | Method |
|----------|--------|
| macOS arm64 | Native cargo, signed + notarized DMG |
| Windows x86_64 | `cargo-xwin` cross-compile, MSI + EXE |
| Linux aarch64 | Native cargo on Astra, AppImage + .deb |
| Linux x86_64 | Cross-compile on Astra, AppImage + .deb |

Every release must have all 7 artifacts before uploading.