max / audiofiles
1 file changed, +274 insertions, -0 deletions
| @@ -0,0 +1,274 @@ | |||
| 1 | + | # Contributing to audiofiles | |
| 2 | + | ||
| 3 | + | Patterns, conventions, and rules for working on the audiofiles codebase. | |
| 4 | + | ||
| 5 | + | ## Project Structure | |
| 6 | + | ||
| 7 | + | ``` | |
| 8 | + | audiofiles/ (workspace root) | |
| 9 | + | Cargo.toml # Workspace definition | |
| 10 | + | crates/ | |
| 11 | + | audiofiles-core/ # Domain logic, storage, VFS, analysis, database | |
| 12 | + | audiofiles-browser/ # eframe/egui GUI, Backend trait, state management | |
| 13 | + | audiofiles-app/ # App entry point (thin shell) | |
| 14 | + | audiofiles-sync/ # SyncKit cloud sync integration | |
| 15 | + | audiofiles-rhai/ # Device plugin runtime (TOML manifests + Rhai hooks) | |
| 16 | + | audiofiles-train/ # ML classifier training (dev-only binary) | |
| 17 | + | audiofiles-bench/ # Performance benchmarks | |
| 18 | + | plugins/devices/ # Bundled device profiles (SP-404, MPC, etc.) | |
| 19 | + | dist/ # Build scripts for macOS, Windows, Linux | |
| 20 | + | ``` | |
| 21 | + | ||
| 22 | + | ### Crate Boundaries | |
| 23 | + | ||
| 24 | + | | Crate | Role | Key constraint | | |
| 25 | + | |-------|------|----------------| | |
| 26 | + | | `audiofiles-core` | All domain logic | **Sync-only** (no async runtime) | | |
| 27 | + | | `audiofiles-browser` | GUI + backend abstraction | Owns egui state, polls workers | | |
| 28 | + | | `audiofiles-app` | Entry point | Thin shell, creates backend + launches GUI | | |
| 29 | + | | `audiofiles-sync` | Cloud sync | Uses tokio + `spawn_blocking` for rusqlite | | |
| 30 | + | | `audiofiles-rhai` | Device plugins | TOML manifests + sandboxed Rhai hooks | | |
| 31 | + | | `audiofiles-train` | ML training | Dev-only, not shipped | | |
| 32 | + | | `audiofiles-bench` | Benchmarks | criterion-based | | |
| 33 | + | ||
| 34 | + | **Critical rule:** `audiofiles-core` is entirely synchronous. `rusqlite::Connection` is `Send` but `!Sync` and its API is blocking, so all database operations are plain synchronous calls made from one thread at a time (or behind a mutex). Long-running operations (import, analysis, export) run on dedicated worker threads with channel-based message passing. | |
| 35 | + | ||
| 36 | + | ## Content-Addressed Storage | |
| 37 | + | ||
| 38 | + | Samples are identified by SHA-256 hash — the hash IS the primary key. There are no UUIDs or auto-increment IDs for samples. | |
| 39 | + | ||
| 40 | + | ```rust | |
| 41 | + | pub struct SampleStore { | |
| 42 | + | root: PathBuf, | |
| 43 | + | } | |
| 44 | + | ``` | |
| 45 | + | ||
| 46 | + | **Import flow:** | |
| 47 | + | 1. Stream file through SHA-256 hasher | |
| 48 | + | 2. Check if blob already exists at `samples/{hash}.{ext}` — if so, skip copy (dedup) | |
| 49 | + | 3. `INSERT OR IGNORE INTO samples` — dedup at DB level too | |
| 50 | + | 4. Return the hash | |
| 51 | + | ||
| 52 | + | **Rules:** | |
| 53 | + | - Never store samples by filename or path. The hash is the only identifier. | |
| 54 | + | - The `cloud_only` flag marks samples whose local blobs have been evicted but still exist in cloud sync. | |
| 55 | + | - Hash validation rejects anything that isn't exactly 64 lowercase hex characters. | |
| 56 | + | - Extension validation rejects directory traversal attempts. | |
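The two validation rules above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names, the 16-character extension limit, and the assumption that `samples/` sits directly under the store root are all hypothetical.

```rust
use std::path::{Path, PathBuf};

/// True only for exactly 64 lowercase hex characters (a hex-encoded SHA-256 digest).
pub fn is_valid_hash(hash: &str) -> bool {
    hash.len() == 64 && hash.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Hypothetical extension rule: short and ASCII-alphanumeric only, so "..",
/// '/', '\\' and other traversal characters can never reach the blob path.
pub fn is_valid_extension(ext: &str) -> bool {
    !ext.is_empty() && ext.len() <= 16 && ext.bytes().all(|b| b.is_ascii_alphanumeric())
}

/// Builds `<root>/samples/<hash>.<ext>`, or `None` if either part is invalid.
pub fn blob_path(root: &Path, hash: &str, ext: &str) -> Option<PathBuf> {
    if is_valid_hash(hash) && is_valid_extension(ext) {
        Some(root.join("samples").join(format!("{hash}.{ext}")))
    } else {
        None
    }
}
```

Validating both parts before building the path means a malformed hash or extension never touches the filesystem.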
| 57 | + | ||
| 58 | + | ## Worker Thread Pattern | |
| 59 | + | ||
| 60 | + | Long-running operations use dedicated threads with channel-based message passing: | |
| 61 | + | ||
| 62 | + | ```rust | |
| 63 | + | pub struct WorkerHandle { | |
| 64 | + | cmd_tx: mpsc::Sender<WorkerCommand>, // Send commands to worker | |
| 65 | + | event_rx: Mutex<mpsc::Receiver<WorkerEvent>>, // Receive results | |
| 66 | + | cancel_flag: Arc<AtomicBool>, // Lock-free cancellation | |
| 67 | + | thread: Option<JoinHandle<()>>, // Join on drop | |
| 68 | + | } | |
| 69 | + | ``` | |
| 70 | + | ||
| 71 | + | **Command/Event protocol:** | |
| 72 | + | - `WorkerCommand::AnalyzeBatch { samples, config }` → worker processes samples | |
| 73 | + | - `WorkerEvent::Progress { completed, total, current_name }` → UI updates progress bar | |
| 74 | + | - `WorkerEvent::SampleDone { result, suggestions }` → UI stores result | |
| 75 | + | - `WorkerEvent::BatchComplete` → UI finishes operation | |
| 76 | + | ||
| 77 | + | **Rules:** | |
| 78 | + | - `mpsc::Receiver` is `Send` but not `Sync`; wrapping it in a `Mutex` makes `WorkerHandle` `Send + Sync`, which the egui state requires. | |
| 79 | + | - `Arc<AtomicBool>` for lock-free cancellation — worker checks before each sample. | |
| 80 | + | - `Drop` implementation sends `Shutdown` command and joins the thread. | |
| 81 | + | - The browser crate calls `try_recv()` each frame to poll for events. | |
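A reduced sketch of this pattern, assuming a simplified command and event set (the real enums carry more data) and using `std::sync::Mutex` only to keep the example dependency-free. The method names `spawn`, `analyze`, `cancel`, and `poll_events` are illustrative, not the crate's API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread::{self, JoinHandle};

pub enum WorkerCommand {
    AnalyzeBatch { samples: Vec<String> },
    Shutdown,
}

#[derive(Debug, PartialEq)]
pub enum WorkerEvent {
    Progress { completed: usize, total: usize },
    BatchComplete,
}

pub struct WorkerHandle {
    cmd_tx: mpsc::Sender<WorkerCommand>,
    event_rx: Mutex<mpsc::Receiver<WorkerEvent>>,
    cancel_flag: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}

impl WorkerHandle {
    pub fn spawn() -> Self {
        let (cmd_tx, cmd_rx) = mpsc::channel();
        let (event_tx, event_rx) = mpsc::channel();
        let cancel_flag = Arc::new(AtomicBool::new(false));
        let cancel = Arc::clone(&cancel_flag);
        let thread = thread::spawn(move || {
            // Block until a command arrives; exit when the sender is gone.
            while let Ok(cmd) = cmd_rx.recv() {
                match cmd {
                    WorkerCommand::Shutdown => break,
                    WorkerCommand::AnalyzeBatch { samples } => {
                        let total = samples.len();
                        for (i, _hash) in samples.iter().enumerate() {
                            // Check the cancellation flag before each sample.
                            if cancel.load(Ordering::Relaxed) {
                                break;
                            }
                            // Real per-sample analysis would run here.
                            let _ = event_tx.send(WorkerEvent::Progress { completed: i + 1, total });
                        }
                        let _ = event_tx.send(WorkerEvent::BatchComplete);
                    }
                }
            }
        });
        Self { cmd_tx, event_rx: Mutex::new(event_rx), cancel_flag, thread: Some(thread) }
    }

    pub fn analyze(&self, samples: Vec<String>) {
        self.cancel_flag.store(false, Ordering::Relaxed);
        let _ = self.cmd_tx.send(WorkerCommand::AnalyzeBatch { samples });
    }

    pub fn cancel(&self) {
        self.cancel_flag.store(true, Ordering::Relaxed);
    }

    /// Non-blocking drain of pending events, suitable for a per-frame call.
    pub fn poll_events(&self) -> Vec<WorkerEvent> {
        let rx = self.event_rx.lock().unwrap();
        let events: Vec<WorkerEvent> = rx.try_iter().collect();
        events
    }
}

impl Drop for WorkerHandle {
    fn drop(&mut self) {
        // Stop any batch in flight, ask the worker to exit, then join it.
        self.cancel_flag.store(true, Ordering::Relaxed);
        let _ = self.cmd_tx.send(WorkerCommand::Shutdown);
        if let Some(thread) = self.thread.take() {
            let _ = thread.join();
        }
    }
}
```

`try_iter()` never blocks, so a UI frame that calls `poll_events()` returns immediately even when the worker has produced nothing.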
| 82 | + | ||
| 83 | + | ## Backend Trait | |
| 84 | + | ||
| 85 | + | `Backend` is a `Send + Sync` trait with synchronous methods that abstracts all data operations; results from background work arrive through `poll_events()`: | |
| 86 | + | ||
| 87 | + | ```rust | |
| 88 | + | pub trait Backend: Send + Sync { | |
| 89 | + | fn list_vfs(&self) -> BackendResult<Vec<Vfs>>; | |
| 90 | + | fn import_file(&self, path: &Path) -> BackendResult<String>; | |
| 91 | + | fn start_analysis(&self, samples: Vec<(String, String)>, config: AnalysisConfig) -> BackendResult<()>; | |
| 92 | + | fn poll_events(&self) -> Vec<BackendEvent>; | |
| 93 | + | // ... ~30 methods covering VFS, tags, search, analysis, export | |
| 94 | + | } | |
| 95 | + | ``` | |
| 96 | + | ||
| 97 | + | `DirectBackend` is the sole implementation, wrapping `Mutex<Database>` + `SampleStore`. This indirection exists to support potential future backends without changing UI code. UI code always calls `backend.method()`, never accesses the database directly. | |
| 98 | + | ||
| 99 | + | ## VFS Abstraction | |
| 100 | + | ||
| 101 | + | Users organize samples through virtual file systems (VFS), not by moving files on disk. | |
| 102 | + | ||
| 103 | + | - `vfs_nodes` is a self-referential tree (`parent_id` FK to own table) | |
| 104 | + | - Nodes are either `Directory` or `Sample` (with a `sample_hash` FK) | |
| 105 | + | - A sample can appear in multiple VFS locations without duplication | |
| 106 | + | - Move/rename operations only touch VFS metadata, not the blob store | |
| 107 | + | - `UNIQUE(vfs_id, parent_id, name)` prevents duplicate names in the same directory | |
| 108 | + | ||
| 109 | + | **Enriched queries** join VFS nodes with analysis data and sample metadata in a single query, avoiding N+1 patterns. | |
| 110 | + | ||
| 111 | + | ## Analysis Pipeline | |
| 112 | + | ||
| 113 | + | The pipeline runs in a worker thread and processes each sample through these stages: | |
| 114 | + | ||
| 115 | + | 1. **Decode** — Symphonia decodes any audio format to mono f32 | |
| 116 | + | 2. **Loudness** — Peak dB, RMS dB, LUFS (fast, uses full signal) | |
| 117 | + | 3. **Spectral** — STFT → centroid, flatness, rolloff, bandwidth, ZCR, onset strength | |
| 118 | + | 4. **MFCC + ML** — Extract MFCCs from magnitude frames, run through embedded neural classifier | |
| 119 | + | 5. **BPM** — Tempo detection (skipped for non-rhythmic samples if `smart_skip` enabled) | |
| 120 | + | 6. **Key** — Musical key detection (skipped for non-pitched samples if `smart_skip` enabled) | |
| 121 | + | 7. **Loop** — Loop point detection | |
| 122 | + | 8. **Fingerprint** — Peak envelope for near-duplicate detection (VP-tree similarity search) | |
| 123 | + | ||
| 124 | + | All results are stored in the `audio_analysis` table (one row per hash). The `smart_skip` feature uses the ML classification from stage 4 to skip irrelevant stages (e.g., no BPM detection for ambient textures). | |
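As an illustration of what a small stage looks like, the peak and RMS measurements from the Loudness stage can be computed directly from the decoded mono `f32` buffer. This sketch assumes levels are expressed in dB relative to a full scale of 1.0; the function names are hypothetical and LUFS is omitted because it needs perceptual weighting and gating.

```rust
/// Peak level in dB relative to full scale (1.0). Silence yields negative infinity.
pub fn peak_db(samples: &[f32]) -> f32 {
    let peak = samples.iter().fold(0.0f32, |max, s| max.max(s.abs()));
    20.0 * peak.log10()
}

/// RMS level in dB relative to full scale. Accumulates in f64 to limit
/// rounding error on long buffers.
pub fn rms_db(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return f32::NEG_INFINITY;
    }
    let sum_sq: f64 = samples.iter().map(|s| f64::from(*s) * f64::from(*s)).sum();
    let mean_sq = sum_sq / samples.len() as f64;
    // 10 * log10(mean square) equals 20 * log10(rms).
    (10.0 * mean_sq.log10()) as f32
}
```

Both functions are pure and operate on a borrowed slice, which keeps a stage like this easy to unit-test in an in-file `#[cfg(test)]` module.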
| 125 | + | ||
| 126 | + | ### Adding a New Analysis Stage | |
| 127 | + | ||
| 128 | + | 1. Add the computation in `crates/audiofiles-core/src/analysis/` | |
| 129 | + | 2. Add column(s) to `audio_analysis` table via inline migration in `db.rs` | |
| 130 | + | 3. Wire into the pipeline in `analysis/mod.rs` | |
| 131 | + | 4. Add to `AnalysisResult` struct | |
| 132 | + | 5. Expose in the enriched VFS query if needed for the UI | |
| 133 | + | ||
| 134 | + | ## Unsafe FFI | |
| 135 | + | ||
| 136 | + | Platform-specific drag-and-drop requires FFI: | |
| 137 | + | - **macOS:** `drag_out/macos.rs` — objc2 message sends, libdispatch async to main thread | |
| 138 | + | - **Windows:** `drag_out/windows.rs` — COM/OLE `DoDragDrop` | |
| 139 | + | ||
| 140 | + | **Rules:** | |
| 141 | + | - Every `unsafe` block MUST have a `// SAFETY:` comment explaining the invariant. | |
| 142 | + | - macOS FFI uses `MainThreadMarker` to guarantee AppKit calls happen on the main thread. | |
| 143 | + | - `RcBlock` prevents use-after-free in async dispatch. | |
| 144 | + | - The `DRAG_ACTIVE` atomic flag prevents concurrent drag sessions. | |
| 145 | + | ||
| 146 | + | ## Error Handling | |
| 147 | + | ||
| 148 | + | Three error types, one per major crate: | |
| 149 | + | ||
| 150 | + | ```rust | |
| 151 | + | // audiofiles-core | |
| 152 | + | pub enum CoreError { | |
| 153 | + | Db(rusqlite::Error), | |
| 154 | + | Io { path: PathBuf, source: std::io::Error }, | |
| 155 | + | SampleNotFound(String), | |
| 156 | + | VfsNotFound(VfsId), | |
| 157 | + | Analysis(AnalysisError), | |
| 158 | + | // ... | |
| 159 | + | } | |
| 160 | + | ||
| 161 | + | // audiofiles-sync | |
| 162 | + | pub enum SyncError { | |
| 163 | + | Db(rusqlite::Error), | |
| 164 | + | Client(String), | |
| 165 | + | Auth(String), | |
| 166 | + | Io(std::io::Error), | |
| 167 | + | } | |
| 168 | + | ||
| 169 | + | // audiofiles-browser | |
| 170 | + | pub enum BackendError { | |
| 171 | + | Core(CoreError), | |
| 172 | + | Other(String), | |
| 173 | + | } | |
| 174 | + | ``` | |
| 175 | + | ||
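| 176 | + | Use typed variants that carry context (such as the `path` in `CoreError::Io`). Use `?` for propagation. `From` impls enable automatic conversion between error types, for example from `CoreError` into `BackendError::Core`. | |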
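A reduced sketch of how these rules combine, using only two `CoreError` variants and standard-library types. The helper functions `read_blob`, `find_sample`, and `sample_name` are hypothetical and exist only to show context-carrying variants and `?` conversion through `From`.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug)]
pub enum CoreError {
    Io { path: PathBuf, source: std::io::Error },
    SampleNotFound(String),
}

#[derive(Debug)]
pub enum BackendError {
    Core(CoreError),
    Other(String),
}

// This impl is what lets `?` turn a CoreError into a BackendError.
impl From<CoreError> for BackendError {
    fn from(err: CoreError) -> Self {
        BackendError::Core(err)
    }
}

/// Attaches the offending path to the I/O error instead of losing it.
pub fn read_blob(path: &Path) -> Result<Vec<u8>, CoreError> {
    std::fs::read(path).map_err(|source| CoreError::Io { path: path.to_path_buf(), source })
}

/// Stand-in for a database lookup over (hash, name) pairs.
fn find_sample(known: &[(&str, &str)], hash: &str) -> Result<String, CoreError> {
    known
        .iter()
        .find(|(h, _)| *h == hash)
        .map(|(_, name)| name.to_string())
        .ok_or_else(|| CoreError::SampleNotFound(hash.to_owned()))
}

/// Backend-level function: `?` converts CoreError via the From impl above.
pub fn sample_name(known: &[(&str, &str)], hash: &str) -> Result<String, BackendError> {
    let name = find_sample(known, hash)?;
    Ok(name)
}
```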
| 177 | + | ||
| 178 | + | ## Database | |
| 179 | + | ||
| 180 | + | ### Inline Migrations | |
| 181 | + | ||
| 182 | + | Migrations are `const` strings in `crates/audiofiles-core/src/db.rs`, applied sequentially on database open: | |
| 183 | + | ||
| 184 | + | ```rust | |
| 185 | + | const MIGRATION_001: &str = r#" | |
| 186 | + | CREATE TABLE samples ( | |
| 187 | + | hash TEXT PRIMARY KEY, | |
| 188 | + | original_name TEXT NOT NULL, | |
| 189 | + | ... | |
| 190 | + | ); | |
| 191 | + | "#; | |
| 192 | + | ``` | |
| 193 | + | ||
| 194 | + | Production uses a file-backed SQLite database. Tests use `:memory:`. | |
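Sequential application can be sketched without a database by passing the executor in as a closure. How the real code records the applied version is not stated here, so `current_version` and the `apply_pending` helper are assumptions made purely for illustration.

```rust
/// Runs every migration whose index is >= `current_version`, in order, and
/// returns the new version. `execute` stands in for a call that runs one SQL
/// batch against the open connection.
pub fn apply_pending<E>(
    migrations: &[&str],
    current_version: usize,
    mut execute: impl FnMut(&str) -> Result<(), E>,
) -> Result<usize, E> {
    for &sql in migrations.iter().skip(current_version) {
        // Stop at the first failure so later migrations never run on a
        // partially migrated schema.
        execute(sql)?;
    }
    Ok(migrations.len().max(current_version))
}
```

Because the executor is injected, the same function can be driven by a file-backed database in production and by `:memory:` in tests.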
| 195 | + | ||
| 196 | + | ### Sync Changelog Triggers | |
| 197 | + | ||
| 198 | + | Every synced table has triggers that insert into `sync_changelog` on INSERT/UPDATE/DELETE. During pull operations, a `sync_state` row (`applying_remote = '1'`) suppresses these triggers so that remotely applied changes are not logged again and pushed back. | |
| 199 | + | ||
| 200 | + | ### rusqlite + async | |
| 201 | + | ||
| 202 | + | `rusqlite::Connection` is `Send` but `!Sync`, and every call on it blocks. In the sync crate (which uses tokio), all database operations go through `tokio::task::spawn_blocking` so they never stall the async executor. In core (sync-only), no async runtime is needed. | |
| 203 | + | ||
| 204 | + | ## SyncKit Integration | |
| 205 | + | ||
| 206 | + | Cloud sync is optional. The `SyncManager` coordinates push/pull: | |
| 207 | + | ||
| 208 | + | - **Tables synced** (in FK-safe order): `vfs`, `samples`, `collections`, `vfs_nodes`, `audio_analysis`, `tags`, `collection_members`, `smart_folders` | |
| 209 | + | - **Delete order** is reversed (children first) | |
| 210 | + | - **Column whitelist:** `table_columns()` restricts which columns sync to prevent schema drift | |
| 211 | + | - **Blob sync:** Sample files sync to cloud storage for VFS entries with `sync_files = true` | |
| 212 | + | - **`cloud_only` flag:** Marks samples whose local blobs have been evicted | |
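The ordering rule can be expressed as one list traversed in two directions. The table names below are copied from the list above; the constant and function names are hypothetical.

```rust
/// Tables in FK-safe order: parents come before the tables that reference them.
pub const SYNC_TABLES: [&str; 8] = [
    "vfs",
    "samples",
    "collections",
    "vfs_nodes",
    "audio_analysis",
    "tags",
    "collection_members",
    "smart_folders",
];

/// Inserts and updates are applied parents-first.
pub fn upsert_order() -> impl Iterator<Item = &'static str> {
    SYNC_TABLES.iter().copied()
}

/// Deletes are applied children-first, i.e. the same list reversed.
pub fn delete_order() -> impl Iterator<Item = &'static str> {
    SYNC_TABLES.iter().rev().copied()
}
```

Deriving the delete order from the same constant keeps the two orders from drifting apart when a table is added.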
| 213 | + | ||
| 214 | + | ## Device Plugins | |
| 215 | + | ||
| 216 | + | TOML manifests in `plugins/devices/` define hardware constraints. Optional Rhai scripts in `hooks/` run sandboxed. | |
| 217 | + | ||
| 218 | + | ### Manifest Contract | |
| 219 | + | ||
| 220 | + | ```toml | |
| 221 | + | [device] | |
| 222 | + | name = "SP-404 MKII" | |
| 223 | + | manufacturer = "Roland" | |
| 224 | + | ||
| 225 | + | [audio] | |
| 226 | + | formats = ["wav"] | |
| 227 | + | sample_rates = [44100, 48000] | |
| 228 | + | bit_depths = [16, 24] | |
| 229 | + | channels = "both" | |
| 230 | + | ||
| 231 | + | [naming] | |
| 232 | + | case = "upper" | |
| 233 | + | max_length = 12 | |
| 234 | + | ||
| 235 | + | [hooks] | |
| 236 | + | validate_sample = "hooks/validate.rhai" | |
| 237 | + | transform_filename = "hooks/filename.rhai" | |
| 238 | + | ``` | |
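To make the `[naming]` section concrete, here is one way the two constraints could be applied to a file stem. This is an assumption-laden sketch: the `NamingRules` struct, the `apply_naming` function, and the choice to truncate after casing are illustrative, and only the `"upper"` value appears in the manifest above.

```rust
/// Mirrors the `[naming]` table of a device manifest (hypothetical struct).
pub struct NamingRules {
    pub case: String,
    pub max_length: usize,
}

/// Applies case conversion, then truncates to `max_length` characters.
/// Any `case` value other than "upper" leaves the name unchanged here.
pub fn apply_naming(stem: &str, rules: &NamingRules) -> String {
    let cased = match rules.case.as_str() {
        "upper" => stem.to_uppercase(),
        _ => stem.to_owned(),
    };
    // Truncate by characters, not bytes, so multi-byte names never split mid-character.
    cased.chars().take(rules.max_length).collect()
}
```

For example, with `case = "upper"` and `max_length = 12`, the stem `kick_drum_long_name` becomes `KICK_DRUM_LO`.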
| 239 | + | ||
| 240 | + | ### Hook Functions | |
| 241 | + | ||
| 242 | + | | Hook | Input | Returns | Purpose | | |
| 243 | + | |------|-------|---------|---------| | |
| 244 | + | | `validate_sample` | `info` (sample metadata) | `bool` | Accept/reject sample for device | | |
| 245 | + | | `transform_filename` | `name`, `ctx` | `String` | Rename for device conventions | | |
| 246 | + | | `pre_export` | `ctx` | — | Run before export batch | | |
| 247 | + | | `post_export` | `ctx` | — | Run after export batch | | |
| 248 | + | ||
| 249 | + | ## Concurrency | |
| 250 | + | ||
| 251 | + | - `parking_lot::Mutex` everywhere (not `std::sync::Mutex`): no lock poisoning, and `lock()` returns the guard directly instead of a `Result`. | |
| 252 | + | - `#[instrument(skip_all)]` on all significant functions. | |
| 253 | + | - Worker threads for long-running operations (never block the UI thread). | |
| 254 | + | - The egui render loop polls `backend.poll_events()` each frame for worker results. | |
| 255 | + | ||
| 256 | + | ## Testing | |
| 257 | + | ||
| 258 | + | - **Core tests:** In-file `#[cfg(test)]` modules with `test_helpers::insert_fake_sample` for fixtures | |
| 259 | + | - **Sync tests:** Unit tests in sync crate modules | |
| 260 | + | - **No GUI tests:** Immediate-mode UI is tested manually | |
| 261 | + | - Test databases use in-memory SQLite (`:memory:`) with migrations applied via `Database::open` | |
| 262 | + | ||
| 263 | + | ## Building and Distribution | |
| 264 | + | ||
| 265 | + | See `docs/distribution.md` for platform-specific build instructions. Summary: | |
| 266 | + | ||
| 267 | + | | Platform | Method | | |
| 268 | + | |----------|--------| | |
| 269 | + | | macOS arm64 | Native cargo, signed + notarized DMG | | |
| 270 | + | | Windows x86_64 | `cargo-xwin` cross-compile, MSI + EXE | | |
| 271 | + | | Linux aarch64 | Native cargo on Astra, AppImage + .deb | | |
| 272 | + | | Linux x86_64 | Cross-compile on Astra, AppImage + .deb | | |
| 273 | + | ||
| 274 | + | Every release must have all 7 artifacts (one DMG, MSI + EXE, and AppImage + .deb for each of the two Linux architectures) before uploading. | |