max / audiofiles

Add CONTRIBUTING.md

Extract coding patterns (content-addressed storage, worker threads, Backend trait, VFS abstraction, analysis pipeline, unsafe FFI rules) from CLAUDE.md into a human-readable contributor guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author: Max J. <87768334+MaxJMath@users.noreply.github.com> · 2026-04-15 22:44 UTC
Commit: af6c246f313c315de137c20796bb4fcea1870814
Parent: c246523
1 file changed, +274 insertions, -0 deletions
@@ -0,0 +1,274 @@
1 + # Contributing to audiofiles
2 +
3 + Patterns, conventions, and rules for working on the audiofiles codebase.
4 +
5 + ## Project Structure
6 +
7 + ```
8 + audiofiles/ (workspace root)
9 + Cargo.toml # Workspace definition
10 + crates/
11 + audiofiles-core/ # Domain logic, storage, VFS, analysis, database
12 + audiofiles-browser/ # eframe/egui GUI, Backend trait, state management
13 + audiofiles-app/ # App entry point (thin shell)
14 + audiofiles-sync/ # SyncKit cloud sync integration
15 + audiofiles-rhai/ # Device plugin runtime (TOML manifests + Rhai hooks)
16 + audiofiles-train/ # ML classifier training (dev-only binary)
17 + audiofiles-bench/ # Performance benchmarks
18 + plugins/devices/ # Bundled device profiles (SP-404, MPC, etc.)
19 + dist/ # Build scripts for macOS, Windows, Linux
20 + ```
21 +
22 + ### Crate Boundaries
23 +
24 + | Crate | Role | Key constraint |
25 + |-------|------|----------------|
26 + | `audiofiles-core` | All domain logic | **Sync-only** (no async runtime) |
27 + | `audiofiles-browser` | GUI + backend abstraction | Owns egui state, polls workers |
28 + | `audiofiles-app` | Entry point | Thin shell, creates backend + launches GUI |
29 + | `audiofiles-sync` | Cloud sync | Uses tokio + `spawn_blocking` for rusqlite |
30 + | `audiofiles-rhai` | Device plugins | TOML manifests + sandboxed Rhai hooks |
31 + | `audiofiles-train` | ML training | Dev-only, not shipped |
32 + | `audiofiles-bench` | Benchmarks | criterion-based |
33 +
34 + **Critical rule:** `audiofiles-core` is entirely synchronous. `rusqlite::Connection` is `Send` but `!Sync`, so a connection cannot be shared between threads without a lock, and core performs all database operations synchronously. Long-running operations (import, analysis, export) use dedicated worker threads with channel-based message passing.
35 +
36 + ## Content-Addressed Storage
37 +
38 + Samples are identified by SHA-256 hash — the hash IS the primary key. There are no UUIDs or auto-increment IDs for samples.
39 +
40 + ```rust
41 + pub struct SampleStore {
42 + root: PathBuf,
43 + }
44 + ```
45 +
46 + **Import flow:**
47 + 1. Stream file through SHA-256 hasher
48 + 2. Check if blob already exists at `samples/{hash}.{ext}` — if so, skip copy (dedup)
49 + 3. `INSERT OR IGNORE INTO samples` — dedup at DB level too
50 + 4. Return the hash
51 +
52 + **Rules:**
53 + - Never store samples by filename or path. The hash is the only identifier.
54 + - The `cloud_only` flag marks samples whose local blobs have been evicted but still exist in cloud sync.
55 + - Hash validation rejects anything that isn't exactly 64 lowercase hex characters.
56 + - Extension validation rejects directory traversal attempts.
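
The two validation rules above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `is_valid_hash`, `is_safe_extension`, and `blob_path` are hypothetical names, and the extension policy (ASCII alphanumeric, at most 8 characters) is an assumption chosen so that separators and `..` can never reach the path.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: true only for exactly 64 lowercase hex characters.
fn is_valid_hash(hash: &str) -> bool {
    hash.len() == 64
        && hash
            .bytes()
            .all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Hypothetical helper. Assumed policy: short ASCII-alphanumeric extensions
/// only, which rules out `/`, `\`, and `.` and therefore directory traversal.
fn is_safe_extension(ext: &str) -> bool {
    !ext.is_empty() && ext.len() <= 8 && ext.bytes().all(|b| b.is_ascii_alphanumeric())
}

/// Builds `{root}/samples/{hash}.{ext}`, or `None` if either part is invalid.
fn blob_path(root: &Path, hash: &str, ext: &str) -> Option<PathBuf> {
    if is_valid_hash(hash) && is_safe_extension(ext) {
        Some(root.join("samples").join(format!("{}.{}", hash, ext)))
    } else {
        None
    }
}
```

Validating before joining means a malformed hash or extension never becomes part of a filesystem path.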
57 +
58 + ## Worker Thread Pattern
59 +
60 + Long-running operations use dedicated threads with channel-based message passing:
61 +
62 + ```rust
63 + pub struct WorkerHandle {
64 + cmd_tx: mpsc::Sender<WorkerCommand>, // Send commands to worker
65 + event_rx: Mutex<mpsc::Receiver<WorkerEvent>>, // Receive results
66 + cancel_flag: Arc<AtomicBool>, // Lock-free cancellation
67 + thread: Option<JoinHandle<()>>, // Join on drop
68 + }
69 + ```
70 +
71 + **Command/Event protocol:**
72 + - `WorkerCommand::AnalyzeBatch { samples, config }` → worker processes samples
73 + - `WorkerEvent::Progress { completed, total, current_name }` → UI updates progress bar
74 + - `WorkerEvent::SampleDone { result, suggestions }` → UI stores result
75 + - `WorkerEvent::BatchComplete` → UI finishes operation
76 +
77 + **Rules:**
78 + - `mpsc::Receiver` is `Send` but not `Sync`; wrapping it in a `Mutex` makes `WorkerHandle` satisfy the `Send + Sync` requirements for egui state.
79 + - `Arc<AtomicBool>` for lock-free cancellation — worker checks before each sample.
80 + - `Drop` implementation sends `Shutdown` command and joins the thread.
81 + - The browser crate calls `try_recv()` each frame to poll for events.
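
A dependency-free sketch of this pattern follows. It is an illustration under assumptions, not the project's implementation: the command and event enums are trimmed to a few variants, `spawn`, `send`, `poll`, and `cancel` are hypothetical method names, and `std::sync::Mutex` stands in for `parking_lot::Mutex` to keep the example self-contained.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

enum WorkerCommand {
    AnalyzeBatch { samples: Vec<String> },
    Shutdown,
}

#[derive(Debug, PartialEq)]
enum WorkerEvent {
    Progress { completed: usize, total: usize },
    BatchComplete,
}

struct WorkerHandle {
    cmd_tx: Sender<WorkerCommand>,
    event_rx: Mutex<Receiver<WorkerEvent>>,
    cancel_flag: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}

impl WorkerHandle {
    fn spawn() -> Self {
        let (cmd_tx, cmd_rx) = mpsc::channel();
        let (event_tx, event_rx) = mpsc::channel();
        let cancel_flag = Arc::new(AtomicBool::new(false));
        let cancel = Arc::clone(&cancel_flag);
        let thread = thread::spawn(move || {
            // Block until a command arrives; exit if the handle is gone.
            while let Ok(cmd) = cmd_rx.recv() {
                match cmd {
                    WorkerCommand::Shutdown => break,
                    WorkerCommand::AnalyzeBatch { samples } => {
                        let total = samples.len();
                        for (i, _sample) in samples.iter().enumerate() {
                            // Check the cancellation flag before each sample.
                            if cancel.load(Ordering::Relaxed) {
                                break;
                            }
                            // Real analysis work would happen here.
                            let _ = event_tx.send(WorkerEvent::Progress {
                                completed: i + 1,
                                total,
                            });
                        }
                        let _ = event_tx.send(WorkerEvent::BatchComplete);
                    }
                }
            }
        });
        Self {
            cmd_tx,
            event_rx: Mutex::new(event_rx),
            cancel_flag,
            thread: Some(thread),
        }
    }

    fn send(&self, cmd: WorkerCommand) {
        let _ = self.cmd_tx.send(cmd);
    }

    /// Non-blocking drain, suitable for calling once per UI frame.
    fn poll(&self) -> Vec<WorkerEvent> {
        let rx = self.event_rx.lock().unwrap();
        let events: Vec<WorkerEvent> = rx.try_iter().collect();
        events
    }

    #[allow(dead_code)]
    fn cancel(&self) {
        self.cancel_flag.store(true, Ordering::Relaxed);
    }
}

impl Drop for WorkerHandle {
    fn drop(&mut self) {
        // Ask the worker to stop, then wait for it to finish.
        let _ = self.cmd_tx.send(WorkerCommand::Shutdown);
        if let Some(thread) = self.thread.take() {
            let _ = thread.join();
        }
    }
}
```

Because `poll` uses `try_iter`, the caller never blocks; a frame with no pending events simply gets an empty vector.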
82 +
83 + ## Backend Trait
84 +
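85 + `Backend` is a trait that abstracts all data operations. Its methods are synchronous; long-running work is started with a call such as `start_analysis`, and its results are collected through `poll_events`: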
86 +
87 + ```rust
88 + pub trait Backend: Send + Sync {
89 + fn list_vfs(&self) -> BackendResult<Vec<Vfs>>;
90 + fn import_file(&self, path: &Path) -> BackendResult<String>;
91 + fn start_analysis(&self, samples: Vec<(String, String)>, config: AnalysisConfig) -> BackendResult<()>;
92 + fn poll_events(&self) -> Vec<BackendEvent>;
93 + // ... ~30 methods covering VFS, tags, search, analysis, export
94 + }
95 + ```
96 +
97 + `DirectBackend` is the sole implementation, wrapping `Mutex<Database>` + `SampleStore`. This indirection exists to support potential future backends without changing UI code. UI code always calls `backend.method()`, never accesses the database directly.
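
To illustrate why the indirection helps, here is a cut-down sketch in which "UI" code is written against the trait and exercised with an in-memory stand-in. Everything here is hypothetical: the two-method trait, `FakeBackend`, `import_all`, and the `String` error type are simplifications for the example, not the real `Backend` API.

```rust
use std::sync::Mutex;

// Simplified result type for the sketch; the real one uses `BackendError`.
type BackendResult<T> = Result<T, String>;

/// Cut-down stand-in for the real trait.
trait Backend: Send + Sync {
    fn import_file(&self, path: &str) -> BackendResult<String>;
    fn sample_count(&self) -> BackendResult<usize>;
}

/// Hypothetical in-memory implementation, e.g. for exercising UI logic.
struct FakeBackend {
    imported: Mutex<Vec<String>>,
}

impl Backend for FakeBackend {
    fn import_file(&self, path: &str) -> BackendResult<String> {
        if path.is_empty() {
            return Err("empty path".to_string());
        }
        let mut imported = self.imported.lock().unwrap();
        imported.push(path.to_string());
        // A real backend would return the SHA-256 hash of the file.
        Ok(format!("fake-hash-{}", imported.len()))
    }

    fn sample_count(&self) -> BackendResult<usize> {
        Ok(self.imported.lock().unwrap().len())
    }
}

/// "UI" code: depends only on the trait, never on a database handle.
fn import_all(backend: &dyn Backend, paths: &[&str]) -> BackendResult<usize> {
    for path in paths {
        backend.import_file(path)?;
    }
    backend.sample_count()
}
```

Since `import_all` takes `&dyn Backend`, swapping in another implementation requires no change to the calling code.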
98 +
99 + ## VFS Abstraction
100 +
101 + Users organize samples through virtual file systems (VFS), not by moving files on disk.
102 +
103 + - `vfs_nodes` is a self-referential tree (`parent_id` FK to own table)
104 + - Nodes are either `Directory` or `Sample` (with a `sample_hash` FK)
105 + - A sample can appear in multiple VFS locations without duplication
106 + - Move/rename operations only touch VFS metadata, not the blob store
107 + - `UNIQUE(vfs_id, parent_id, name)` prevents duplicate names in the same directory
108 +
109 + **Enriched queries** join VFS nodes with analysis data and sample metadata in a single query, avoiding N+1 patterns.
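
The VFS rules can be modelled in memory to make them concrete. The sketch below is an assumption-laden stand-in for the `vfs_nodes` table, not the real schema or query layer: `VfsTable` and its methods are hypothetical, and root-level nodes use `parent_id: None`.

```rust
#[derive(Debug, Clone)]
enum NodeKind {
    Directory,
    Sample { sample_hash: String },
}

#[derive(Debug, Clone)]
struct VfsNode {
    id: u32,
    vfs_id: u32,
    parent_id: Option<u32>,
    name: String,
    kind: NodeKind,
}

#[derive(Default)]
struct VfsTable {
    nodes: Vec<VfsNode>,
}

impl VfsTable {
    /// Mirrors the intent of UNIQUE(vfs_id, parent_id, name),
    /// optionally ignoring one node (used when moving it).
    fn name_taken(&self, vfs_id: u32, parent: Option<u32>, name: &str, ignore: Option<u32>) -> bool {
        self.nodes.iter().any(|n| {
            Some(n.id) != ignore && n.vfs_id == vfs_id && n.parent_id == parent && n.name == name
        })
    }

    fn insert(&mut self, vfs_id: u32, parent: Option<u32>, name: &str, kind: NodeKind) -> Result<u32, String> {
        if self.name_taken(vfs_id, parent, name, None) {
            return Err(format!("duplicate name: {}", name));
        }
        let id = self.nodes.len() as u32 + 1;
        self.nodes.push(VfsNode { id, vfs_id, parent_id: parent, name: name.to_string(), kind });
        Ok(id)
    }

    /// A move only rewrites `parent_id`; no blob is touched.
    fn move_node(&mut self, id: u32, new_parent: Option<u32>) -> Result<(), String> {
        let (vfs_id, name) = match self.nodes.iter().find(|n| n.id == id) {
            Some(n) => (n.vfs_id, n.name.clone()),
            None => return Err("no such node".to_string()),
        };
        if self.name_taken(vfs_id, new_parent, &name, Some(id)) {
            return Err(format!("duplicate name: {}", name));
        }
        if let Some(node) = self.nodes.iter_mut().find(|n| n.id == id) {
            node.parent_id = new_parent;
        }
        Ok(())
    }

    /// Number of VFS locations that reference the same sample hash.
    fn locations_of(&self, hash: &str) -> usize {
        self.nodes
            .iter()
            .filter(|n| matches!(&n.kind, NodeKind::Sample { sample_hash } if sample_hash.as_str() == hash))
            .count()
    }
}
```

Two nodes pointing at one hash cost two metadata rows but only one blob, which is the point of separating the VFS from the store.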
110 +
111 + ## Analysis Pipeline
112 +
113 + The pipeline runs in a worker thread and processes each sample through these stages:
114 +
115 + 1. **Decode** — Symphonia decodes any audio format to mono f32
116 + 2. **Loudness** — Peak dB, RMS dB, LUFS (fast, uses full signal)
117 + 3. **Spectral** — STFT → centroid, flatness, rolloff, bandwidth, ZCR, onset strength
118 + 4. **MFCC + ML** — Extract MFCCs from magnitude frames, run through embedded neural classifier
119 + 5. **BPM** — Tempo detection (skipped for non-rhythmic samples if `smart_skip` enabled)
120 + 6. **Key** — Musical key detection (skipped for non-pitched samples if `smart_skip` enabled)
121 + 7. **Loop** — Loop point detection
122 + 8. **Fingerprint** — Peak envelope for near-duplicate detection (VP-tree similarity search)
123 +
124 + All results are stored in the `audio_analysis` table (one row per hash). The `smart_skip` feature uses the ML classification from stage 4 to skip irrelevant later stages (e.g., no BPM detection for ambient textures).
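
The stage selection can be pictured as a filter over the stage list. This is a sketch under assumptions: `Stage`, `Classification` (with its `rhythmic` and `pitched` fields), and `stages_to_run` are hypothetical names, and the real pipeline may express the classifier's verdict differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    Decode,
    Loudness,
    Spectral,
    MfccMl,
    Bpm,
    Key,
    Loop,
    Fingerprint,
}

/// Hypothetical summary of what the embedded classifier decided.
struct Classification {
    rhythmic: bool,
    pitched: bool,
}

const ALL_STAGES: [Stage; 8] = [
    Stage::Decode,
    Stage::Loudness,
    Stage::Spectral,
    Stage::MfccMl,
    Stage::Bpm,
    Stage::Key,
    Stage::Loop,
    Stage::Fingerprint,
];

/// Returns the stages to run, in pipeline order.
/// With `smart_skip` off, every stage runs regardless of classification.
fn stages_to_run(class: &Classification, smart_skip: bool) -> Vec<Stage> {
    ALL_STAGES
        .iter()
        .copied()
        .filter(|stage| match stage {
            Stage::Bpm => !smart_skip || class.rhythmic,
            Stage::Key => !smart_skip || class.pitched,
            _ => true,
        })
        .collect()
}
```

For an ambient texture classified as pitched but not rhythmic, `stages_to_run` with `smart_skip` enabled drops only the BPM stage.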
125 +
126 + ### Adding a New Analysis Stage
127 +
128 + 1. Add the computation in `crates/audiofiles-core/src/analysis/`
129 + 2. Add column(s) to `audio_analysis` table via inline migration in `db.rs`
130 + 3. Wire into the pipeline in `analysis/mod.rs`
131 + 4. Add to `AnalysisResult` struct
132 + 5. Expose in the enriched VFS query if needed for the UI
133 +
134 + ## Unsafe FFI
135 +
136 + Platform-specific drag-and-drop requires FFI:
137 + - **macOS:** `drag_out/macos.rs` — objc2 message sends, libdispatch async to main thread
138 + - **Windows:** `drag_out/windows.rs` — COM/OLE `DoDragDrop`
139 +
140 + **Rules:**
141 + - Every `unsafe` block MUST have a `// SAFETY:` comment explaining the invariant.
142 + - macOS FFI uses `MainThreadMarker` to guarantee AppKit calls happen on the main thread.
143 + - `RcBlock` prevents use-after-free in async dispatch.
144 + - The `DRAG_ACTIVE` atomic flag prevents concurrent drag sessions.
145 +
146 + ## Error Handling
147 +
148 + Three error types, one per major crate:
149 +
150 + ```rust
151 + // audiofiles-core
152 + pub enum CoreError {
153 + Db(rusqlite::Error),
154 + Io { path: PathBuf, source: std::io::Error },
155 + SampleNotFound(String),
156 + VfsNotFound(VfsId),
157 + Analysis(AnalysisError),
158 + // ...
159 + }
160 +
161 + // audiofiles-sync
162 + pub enum SyncError {
163 + Db(rusqlite::Error),
164 + Client(String),
165 + Auth(String),
166 + Io(std::io::Error),
167 + }
168 +
169 + // audiofiles-browser
170 + pub enum BackendError {
171 + Core(CoreError),
172 + Other(String),
173 + }
174 + ```
175 +
176 + Use typed variants with context. Use `?` for propagation. `From` impls enable automatic conversion between error types.
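
A short sketch of these conventions, using trimmed copies of the enums above. `read_blob` and `load_for_ui` are hypothetical functions for illustration. Note that `Io` carries a path, so it is built with `map_err` rather than a blanket `From<std::io::Error>`, which could not supply that context.

```rust
use std::path::{Path, PathBuf};

#[allow(dead_code)]
#[derive(Debug)]
enum CoreError {
    Io { path: PathBuf, source: std::io::Error },
    SampleNotFound(String),
}

#[allow(dead_code)]
#[derive(Debug)]
enum BackendError {
    Core(CoreError),
    Other(String),
}

// Lets `?` convert a CoreError into a BackendError automatically.
impl From<CoreError> for BackendError {
    fn from(err: CoreError) -> Self {
        BackendError::Core(err)
    }
}

/// Core-level function: attaches the offending path to the I/O error.
fn read_blob(path: &Path) -> Result<Vec<u8>, CoreError> {
    std::fs::read(path).map_err(|source| CoreError::Io {
        path: path.to_path_buf(),
        source,
    })
}

/// Backend-level function: `?` performs CoreError -> BackendError via `From`.
fn load_for_ui(path: &Path) -> Result<usize, BackendError> {
    let bytes = read_blob(path)?;
    Ok(bytes.len())
}
```

The caller of `load_for_ui` can still match down to the original `std::io::Error` and the path that caused it.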
177 +
178 + ## Database
179 +
180 + ### Inline Migrations
181 +
182 + Migrations are `const` strings in `crates/audiofiles-core/src/db.rs`, applied sequentially on database open:
183 +
184 + ```rust
185 + const MIGRATION_001: &str = r#"
186 + CREATE TABLE samples (
187 + hash TEXT PRIMARY KEY,
188 + original_name TEXT NOT NULL,
189 + ...
190 + );
191 + "#;
192 + ```
193 +
194 + Production uses a file-backed SQLite database. Tests use `:memory:`.
195 +
196 + ### Sync Changelog Triggers
197 +
198 + Every synced table has triggers that insert into `sync_changelog` on INSERT/UPDATE/DELETE. A `sync_state` row (`applying_remote = '1'`) suppresses triggers during pull operations to prevent recursion.
199 +
200 + ### rusqlite + async
201 +
202 + `rusqlite::Connection` is `Send` but `!Sync`, and its calls block. In the sync crate (which uses tokio), all database operations go through `tokio::task::spawn_blocking` so they do not stall the async executor. In core (sync-only), no async runtime is needed.
203 +
204 + ## SyncKit Integration
205 +
206 + Cloud sync is optional. The `SyncManager` coordinates push/pull:
207 +
208 + - **Tables synced** (in FK-safe order): `vfs`, `samples`, `collections`, `vfs_nodes`, `audio_analysis`, `tags`, `collection_members`, `smart_folders`
209 + - **Delete order** is reversed (children first)
210 + - **Column whitelist:** `table_columns()` restricts which columns sync to prevent schema drift
211 + - **Blob sync:** Sample files sync to cloud storage for VFS entries with `sync_files = true`
212 + - **`cloud_only` flag:** Marks samples whose local blobs have been evicted
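
The ordering rule can be stated in a few lines. The table list is copied from above; `SYNC_TABLES`, `upsert_order`, and `delete_order` are hypothetical names for this sketch, not identifiers from the sync crate.

```rust
/// Tables in FK-safe order: parents before the rows that reference them.
const SYNC_TABLES: [&str; 8] = [
    "vfs",
    "samples",
    "collections",
    "vfs_nodes",
    "audio_analysis",
    "tags",
    "collection_members",
    "smart_folders",
];

/// Inserts and updates are applied parents-first.
fn upsert_order() -> Vec<&'static str> {
    SYNC_TABLES.to_vec()
}

/// Deletes are applied children-first, i.e. the same list reversed.
fn delete_order() -> Vec<&'static str> {
    SYNC_TABLES.iter().rev().copied().collect()
}
```

Deriving the delete order from the single list avoids the two orders drifting apart when a table is added.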
213 +
214 + ## Device Plugins
215 +
216 + TOML manifests in `plugins/devices/` define hardware constraints. Optional Rhai scripts in `hooks/` run sandboxed.
217 +
218 + ### Manifest Contract
219 +
220 + ```toml
221 + [device]
222 + name = "SP-404 MKII"
223 + manufacturer = "Roland"
224 +
225 + [audio]
226 + formats = ["wav"]
227 + sample_rates = [44100, 48000]
228 + bit_depths = [16, 24]
229 + channels = "both"
230 +
231 + [naming]
232 + case = "upper"
233 + max_length = 12
234 +
235 + [hooks]
236 + validate_sample = "hooks/validate.rhai"
237 + transform_filename = "hooks/filename.rhai"
238 + ```
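
As a rough picture of what the `[audio]` and `[naming]` sections constrain, the sketch below hardcodes the manifest values rather than parsing TOML. All types and functions are hypothetical, `channels` is omitted, and applying `max_length` to the file stem (without extension) is an assumption, not documented behavior.

```rust
struct AudioConstraints {
    formats: Vec<String>,
    sample_rates: Vec<u32>,
    bit_depths: Vec<u8>,
}

struct NamingRules {
    upper: bool,
    max_length: usize,
}

impl AudioConstraints {
    /// True if the device accepts this format, sample rate, and bit depth.
    fn accepts(&self, format: &str, sample_rate: u32, bit_depth: u8) -> bool {
        self.formats.iter().any(|f| f.eq_ignore_ascii_case(format))
            && self.sample_rates.contains(&sample_rate)
            && self.bit_depths.contains(&bit_depth)
    }
}

impl NamingRules {
    /// Applies case conversion, then truncates to `max_length` characters.
    /// Assumption: the limit applies to the stem, not the extension.
    fn apply(&self, stem: &str) -> String {
        let cased = if self.upper {
            stem.to_uppercase()
        } else {
            stem.to_string()
        };
        cased.chars().take(self.max_length).collect()
    }
}

/// Values copied by hand from the example manifest above.
fn sp404_mk2() -> (AudioConstraints, NamingRules) {
    (
        AudioConstraints {
            formats: vec!["wav".to_string()],
            sample_rates: vec![44100, 48000],
            bit_depths: vec![16, 24],
        },
        NamingRules { upper: true, max_length: 12 },
    )
}
```

Under these assumptions a 96 kHz WAV is rejected, and the stem `kick_drum_heavy_01` becomes `KICK_DRUM_HE`.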
239 +
240 + ### Hook Functions
241 +
242 + | Hook | Input | Returns | Purpose |
243 + |------|-------|---------|---------|
244 + | `validate_sample` | `info` (sample metadata) | `bool` | Accept/reject sample for device |
245 + | `transform_filename` | `name`, `ctx` | `String` | Rename for device conventions |
246 + | `pre_export` | `ctx` | — | Run before export batch |
247 + | `post_export` | `ctx` | — | Run after export batch |
248 +
249 + ## Concurrency
250 +
251 + - `parking_lot::Mutex` everywhere (not `std::sync::Mutex`) — no poisoning, shorter lock API.
252 + - `#[instrument(skip_all)]` (the `tracing` attribute) on all significant functions, so spans are recorded without capturing arguments.
253 + - Worker threads for long-running operations (never block the UI thread).
254 + - The egui render loop polls `backend.poll_events()` each frame for worker results.
255 +
256 + ## Testing
257 +
258 + - **Core tests:** In-file `#[cfg(test)]` modules with `test_helpers::insert_fake_sample` for fixtures
259 + - **Sync tests:** Unit tests in sync crate modules
260 + - **No GUI tests:** Immediate-mode UI is tested manually
261 + - Test databases use in-memory SQLite (`:memory:`) with migrations applied via `Database::open`
262 +
263 + ## Building and Distribution
264 +
265 + See `docs/distribution.md` for platform-specific build instructions. Summary:
266 +
267 + | Platform | Method |
268 + |----------|--------|
269 + | macOS arm64 | Native cargo, signed + notarized DMG |
270 + | Windows x86_64 | `cargo-xwin` cross-compile, MSI + EXE |
271 + | Linux aarch64 | Native cargo on Astra, AppImage + .deb |
272 + | Linux x86_64 | Cross-compile on Astra, AppImage + .deb |
273 +
274 + Every release must have all 7 artifacts (one DMG, one MSI, one EXE, and an AppImage plus a .deb for each of the two Linux architectures) before uploading.