# Contributing to audiofiles

Patterns, conventions, and rules for working on the audiofiles codebase.

## Project Structure

```
audiofiles/ (workspace root)
  Cargo.toml                        # Workspace definition
  crates/
    audiofiles-core/                # Domain logic, storage, VFS, analysis, database
    audiofiles-browser/             # eframe/egui GUI, Backend trait, state management
    audiofiles-app/                 # App entry point (thin shell)
    audiofiles-sync/                # SyncKit cloud sync integration
    audiofiles-rhai/                # Device plugin runtime (TOML manifests + Rhai hooks)
    audiofiles-train/               # ML classifier training (dev-only binary)
    audiofiles-bench/               # Performance benchmarks
  audiofiles-rhai/plugins/bundled/  # Bundled device profiles (SP-404, MPC, etc.)
  dist/                             # Build scripts for macOS, Windows, Linux
```

### Crate Boundaries

| Crate | Role | Key constraint |
|-------|------|----------------|
| `audiofiles-core` | All domain logic | **Sync-only** (no async runtime) |
| `audiofiles-browser` | GUI + backend abstraction | Owns egui state, polls workers |
| `audiofiles-app` | Entry point | Thin shell, creates backend + launches GUI |
| `audiofiles-sync` | Cloud sync | Uses tokio + `spawn_blocking` for rusqlite |
| `audiofiles-rhai` | Device plugins | TOML manifests + sandboxed Rhai hooks |
| `audiofiles-train` | ML training | Dev-only, not shipped |
| `audiofiles-bench` | Benchmarks | criterion-based |

**Critical rule:** `audiofiles-core` is entirely synchronous. `rusqlite::Connection` is `Send` but `!Sync`, so a connection cannot be shared by reference across threads or async tasks, and all database operations are synchronous. Long-running operations (import, analysis, export) use dedicated worker threads with channel-based message passing.

## Content-Addressed Storage

Samples are identified by SHA-256 hash: the hash IS the primary key. There are no UUIDs or auto-increment IDs for samples.

```rust
pub struct SampleStore {
    root: PathBuf,
}
```

**Import flow:**

1. Stream file through SHA-256 hasher
2. Check if blob already exists at `samples/{hash}.{ext}`; if so, skip copy (dedup)
3. `INSERT OR IGNORE INTO samples` (dedup at DB level too)
4. Return the hash

**Rules:**

- Never store samples by filename or path. The hash is the only identifier.
- The `cloud_only` flag marks samples whose local blobs have been evicted but still exist in cloud sync.
- Hash validation rejects anything that isn't exactly 64 lowercase hex characters.
- Extension validation rejects directory traversal attempts.

## Worker Thread Pattern

Long-running operations use dedicated threads with channel-based message passing:

```rust
pub struct WorkerHandle {
    cmd_tx: mpsc::Sender<WorkerCommand>,          // Send commands to worker
    event_rx: Mutex<mpsc::Receiver<WorkerEvent>>, // Receive results
    cancel_flag: Arc<AtomicBool>,                 // Lock-free cancellation
    thread: Option<JoinHandle<()>>,               // Join on drop
}
```

**Command/Event protocol:**

- `WorkerCommand::AnalyzeBatch { samples, config }` → worker processes samples
- `WorkerEvent::Progress { completed, total, current_name }` → UI updates progress bar
- `WorkerEvent::SampleDone { result, suggestions }` → UI stores result
- `WorkerEvent::BatchComplete` → UI finishes operation

**Rules:**

- `Mutex<Receiver>` satisfies `Send + Sync` requirements for egui state.
- `Arc<AtomicBool>` for lock-free cancellation: the worker checks it before each sample.
- `Drop` implementation sends `Shutdown` command and joins the thread.
- The browser crate calls `try_recv()` each frame to poll for events.

## Backend Trait

`Backend` is a synchronous trait that abstracts all data operations:

```rust
pub trait Backend: Send + Sync {
    fn list_vfs(&self) -> BackendResult<Vec<...>>;
    fn import_file(&self, path: &Path) -> BackendResult<...>;
    fn start_analysis(&self, samples: Vec<(String, String)>, config: AnalysisConfig) -> BackendResult<()>;
    fn poll_events(&self) -> Vec<...>;
    // ... ~30 methods covering VFS, tags, search, analysis, export
}
```

`DirectBackend` is the sole implementation, wrapping `Mutex<...>` + `SampleStore`. This indirection exists to support potential future backends without changing UI code. UI code always calls `backend.method()` and never accesses the database directly.
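The worker handle and the per-frame `poll_events` call can be sketched together as follows. This is a minimal illustration using only the standard library, not the project's actual code: the trimmed `WorkerCommand`/`WorkerEvent` variants and the `spawn`, `analyze`, and `cancel` methods are assumptions, and `std::sync::Mutex` stands in for `parking_lot::Mutex` to keep the example dependency-free.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical, trimmed-down protocol types for illustration only.
pub enum WorkerCommand {
    AnalyzeBatch { samples: Vec<String> },
    Shutdown,
}

#[derive(Debug, PartialEq)]
pub enum WorkerEvent {
    Progress { completed: usize, total: usize },
    BatchComplete,
}

pub struct WorkerHandle {
    cmd_tx: Sender<WorkerCommand>,
    event_rx: Mutex<Receiver<WorkerEvent>>,
    cancel_flag: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}

impl WorkerHandle {
    pub fn spawn() -> Self {
        let (cmd_tx, cmd_rx) = mpsc::channel::<WorkerCommand>();
        let (event_tx, event_rx) = mpsc::channel::<WorkerEvent>();
        let cancel_flag = Arc::new(AtomicBool::new(false));
        let cancel = Arc::clone(&cancel_flag);
        let thread = thread::spawn(move || {
            // Block on commands until Shutdown arrives or the sender is dropped.
            while let Ok(cmd) = cmd_rx.recv() {
                match cmd {
                    WorkerCommand::Shutdown => break,
                    WorkerCommand::AnalyzeBatch { samples } => {
                        let total = samples.len();
                        for (i, _sample) in samples.iter().enumerate() {
                            // Check the cancel flag before each sample.
                            if cancel.load(Ordering::Relaxed) {
                                break;
                            }
                            // Real analysis would run here.
                            let _ = event_tx.send(WorkerEvent::Progress { completed: i + 1, total });
                        }
                        // Sent whether the batch finished or was cancelled.
                        let _ = event_tx.send(WorkerEvent::BatchComplete);
                    }
                }
            }
        });
        Self { cmd_tx, event_rx: Mutex::new(event_rx), cancel_flag, thread: Some(thread) }
    }

    pub fn analyze(&self, samples: Vec<String>) {
        let _ = self.cmd_tx.send(WorkerCommand::AnalyzeBatch { samples });
    }

    pub fn cancel(&self) {
        self.cancel_flag.store(true, Ordering::Relaxed);
    }

    /// Non-blocking drain, intended to be called once per UI frame.
    pub fn poll_events(&self) -> Vec<WorkerEvent> {
        let rx = self.event_rx.lock().unwrap();
        let mut events = Vec::new();
        while let Ok(event) = rx.try_recv() {
            events.push(event);
        }
        events
    }
}

impl Drop for WorkerHandle {
    fn drop(&mut self) {
        // Ask the worker to stop, then wait for it to exit.
        let _ = self.cmd_tx.send(WorkerCommand::Shutdown);
        if let Some(thread) = self.thread.take() {
            let _ = thread.join();
        }
    }
}
```

Because `poll_events` only uses `try_recv`, a frame that finds no pending events returns an empty `Vec` immediately and never blocks the render loop.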
## VFS Abstraction

Users organize samples through virtual file systems (VFS), not by moving files on disk.

- `vfs_nodes` is a self-referential tree (`parent_id` FK to own table)
- Nodes are either `Directory` or `Sample` (with a `sample_hash` FK)
- A sample can appear in multiple VFS locations without duplication
- Move/rename operations only touch VFS metadata, not the blob store
- `UNIQUE(vfs_id, parent_id, name)` prevents duplicate names in the same directory

**Enriched queries** join VFS nodes with analysis data and sample metadata in a single query, avoiding N+1 patterns.

## Analysis Pipeline

The pipeline runs in a worker thread and processes each sample through these stages:

1. **Decode**: Symphonia decodes any audio format to mono f32
2. **Loudness**: Peak dB, RMS dB, LUFS (fast, uses full signal)
3. **Spectral**: STFT → centroid, flatness, rolloff, bandwidth, ZCR, onset strength
4. **MFCC + ML**: Extract MFCCs from magnitude frames, run through embedded neural classifier
5. **BPM**: Tempo detection (skipped for non-rhythmic samples if `smart_skip` enabled)
6. **Key**: Musical key detection (skipped for non-pitched samples if `smart_skip` enabled)
7. **Loop**: Loop point detection
8. **Fingerprint**: Peak envelope for near-duplicate detection (VP-tree similarity search)

All results are stored in the `audio_analysis` table (one row per hash). The `smart_skip` feature uses ML classification to skip irrelevant stages (e.g., no BPM detection for ambient textures).

### Adding a New Analysis Stage

1. Add the computation in `crates/audiofiles-core/src/analysis/`
2. Add column(s) to `audio_analysis` table via inline migration in `db.rs`
3. Wire into the pipeline in `analysis/mod.rs`
4. Add to `AnalysisResult` struct
5. Expose in the enriched VFS query if needed for the UI

## Unsafe FFI

Platform-specific drag-and-drop requires FFI:

- **macOS:** `drag_out/macos.rs`: objc2 message sends, libdispatch async to main thread
- **Windows:** `drag_out/windows.rs`: COM/OLE `DoDragDrop`

**Rules:**

- Every `unsafe` block MUST have a `// SAFETY:` comment explaining the invariant.
- macOS FFI uses `MainThreadMarker` to guarantee AppKit calls happen on the main thread.
- `RcBlock` prevents use-after-free in async dispatch.
- The `DRAG_ACTIVE` atomic flag prevents concurrent drag sessions.

## Error Handling

Three error types, one per major crate:

```rust
// audiofiles-core
pub enum CoreError {
    Db(rusqlite::Error),
    Io { path: PathBuf, source: std::io::Error },
    SampleNotFound(String),
    VfsNotFound(VfsId),
    Analysis(AnalysisError),
    // ...
}

// audiofiles-sync
pub enum SyncError {
    Db(rusqlite::Error),
    Client(String),
    Auth(String),
    Io(std::io::Error),
}

// audiofiles-browser
pub enum BackendError {
    Core(CoreError),
    Other(String),
}
```

Use typed variants with context. Use `?` for propagation. `From` impls enable automatic conversion between error types.

## Database

### Inline Migrations

Migrations are `const` strings in `crates/audiofiles-core/src/db.rs`, applied sequentially on database open:

```rust
const MIGRATION_001: &str = r#"
    CREATE TABLE samples (
        hash TEXT PRIMARY KEY,
        original_name TEXT NOT NULL,
        ...
    );
"#;
```

Production uses a file-backed SQLite database. Tests use `:memory:`.

When adding a migration, make it **replay-safe**: every `CREATE TABLE / INDEX / TRIGGER` should be `IF NOT EXISTS` (or preceded by `DROP IF EXISTS` for triggers whose body changes), and any seed insert should be `INSERT OR IGNORE`. The `migration_replay_from_version_two_against_full_schema` test in `db.rs` rolls `user_version` back to 2 and re-runs every migration from M003 onward against a populated schema, so non-idempotent CREATEs fail it.
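To make the replay-safe rule concrete, here is a hedged sketch of a migration written in that style, together with a crude line-based check. The table, index, and trigger names are hypothetical, and `non_replay_safe_lines` is an illustrative helper, not something that exists in `db.rs`; the real guard is the replay test named above.

```rust
// Hypothetical migration in the replay-safe style: running it twice is harmless.
const MIGRATION_EXAMPLE: &str = r#"
CREATE TABLE IF NOT EXISTS example_items (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_example_items_name ON example_items(name);
DROP TRIGGER IF EXISTS trg_example_items_touch;
CREATE TRIGGER trg_example_items_touch AFTER UPDATE ON example_items
BEGIN
    SELECT 1;
END;
INSERT OR IGNORE INTO example_items (id, name) VALUES (1, 'default');
"#;

/// Returns the lines that look non-idempotent. This is a rough, line-based
/// heuristic for illustration; it does not parse SQL.
fn non_replay_safe_lines(sql: &str) -> Vec<String> {
    let mut dropped_triggers: Vec<String> = Vec::new();
    let mut offenders = Vec::new();
    for line in sql.lines() {
        let upper = line.trim().to_uppercase();
        if let Some(rest) = upper.strip_prefix("DROP TRIGGER IF EXISTS ") {
            // Remember triggers that are dropped before being recreated.
            dropped_triggers.push(rest.trim_end_matches(';').trim().to_string());
        } else if let Some(rest) = upper.strip_prefix("CREATE TRIGGER ") {
            let name = rest.split_whitespace().next().unwrap_or("");
            let guarded = rest.starts_with("IF NOT EXISTS")
                || dropped_triggers.iter().any(|t| t == name);
            if !guarded {
                offenders.push(line.trim().to_string());
            }
        } else if upper.starts_with("CREATE TABLE ")
            || upper.starts_with("CREATE INDEX ")
            || upper.starts_with("CREATE UNIQUE INDEX ")
        {
            if !upper.contains("IF NOT EXISTS") {
                offenders.push(line.trim().to_string());
            }
        } else if upper.starts_with("INSERT INTO ") {
            // Seed inserts should be INSERT OR IGNORE.
            offenders.push(line.trim().to_string());
        }
    }
    offenders
}
```

A check like this only catches the obvious cases; re-running migrations against a populated schema remains the authoritative test.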
M001 (initial schema) and M002 (`DROP TABLE tags; ALTER tags_v2 RENAME TO tags`) are inherently one-shot and excluded from the replay test.

The connection registers a custom `hash_row_id(salt, key)` SQLite function on open (rusqlite `functions` feature). It's used by the M018 sync triggers; if you write a migration that creates new sync triggers, prefer it for any row_id that would otherwise leak user content.

### Sync Changelog Triggers

Every synced table has triggers that insert into `sync_changelog` on INSERT/UPDATE/DELETE. A `sync_state` row (`applying_remote = '1'`) suppresses triggers during pull operations to prevent recursion.

Per migration M018 (2026-06-02), `sync_changelog.row_id` is hashed via `hash_row_id(row_id_salt, canonical_key)` for sensitive tables (samples, audio_analysis, tags, collection_members) so the server never sees raw sample hashes or tag strings. The salt is generated per device, stored in `sync_state`, and never synced. DELETE triggers also emit the canonical PK in the encrypted `data` field, which `resolve::apply_delete` reads to reconstruct WHERE clauses without parsing the (now-opaque) row_id.

When adding a new synced table, follow the same pattern: wrap row_id in `hash_row_id(...)` if it carries user content, and emit the canonical PK into `data` for DELETE.

### rusqlite + async

`rusqlite::Connection` is `Send` but `!Sync`, so it cannot be shared by reference across async tasks. In the sync crate (which uses tokio), all database operations go through `tokio::task::spawn_blocking`. In core (sync-only), no async runtime is needed.

## SyncKit Integration

Cloud sync is optional.
The `SyncManager` coordinates push/pull:

- **Tables synced** (in FK-safe order): `vfs`, `samples`, `collections`, `vfs_nodes`, `audio_analysis`, `tags`, `collection_members`, `user_config`, `edit_history` (smart_folders merged into `collections.filter_json` in M015 and the standalone table dropped)
- **Delete order** is reversed (children first)
- **Column whitelist:** `table_columns()` restricts which columns sync to prevent schema drift
- **Blob sync:** Sample files sync to cloud storage for VFS entries with `sync_files = true`
- **`cloud_only` flag:** Marks samples whose local blobs have been evicted

## Device Plugins

TOML manifests in `plugins/devices/` define hardware constraints. Optional Rhai scripts in `hooks/` run sandboxed.

### Hook Style

Optional Rhai hooks follow the cross-project Rhai style guide. Run `_meta/scripts/lint-rhai.sh` to check formatting. Key points: 4-space indent, `snake_case` functions, `UPPER_CASE` constants, header comment block.

### Manifest Contract

```toml
[device]
name = "SP-404 MKII"
manufacturer = "Roland"

[audio]
formats = ["wav"]
sample_rates = [44100, 48000]
bit_depths = [16, 24]
channels = "both"

[naming]
case = "upper"
max_length = 12

[hooks]
validate_sample = "hooks/validate.rhai"
transform_filename = "hooks/filename.rhai"
```

### Hook Functions

| Hook | Input | Returns | Purpose |
|------|-------|---------|---------|
| `validate_sample` | `info` (sample metadata) | `bool` | Accept/reject sample for device |
| `transform_filename` | `name`, `ctx` | `String` | Rename for device conventions |
| `pre_export` | `ctx` | none | Run before export batch |
| `post_export` | `ctx` | none | Run after export batch |

## Concurrency

- `parking_lot::Mutex` everywhere (not `std::sync::Mutex`): no poisoning, shorter lock API.
- `#[instrument(skip_all)]` on all significant functions.
- Worker threads for long-running operations (never block the UI thread).
- The egui render loop polls `backend.poll_events()` each frame for worker results.
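Returning to the table ordering listed under SyncKit Integration, the relationship between the FK-safe order and the reversed delete order can be sketched as below. The constant and function names (`SYNC_TABLES`, `upsert_order`, `delete_order`) are assumptions for illustration; only the table list itself comes from this document.

```rust
/// Synced tables in FK-safe order (parents before children),
/// copied from the list in the SyncKit Integration section.
pub const SYNC_TABLES: [&str; 9] = [
    "vfs",
    "samples",
    "collections",
    "vfs_nodes",
    "audio_analysis",
    "tags",
    "collection_members",
    "user_config",
    "edit_history",
];

/// Order for applying inserts and updates: parents first, so that
/// foreign keys always point at rows that already exist.
pub fn upsert_order() -> impl Iterator<Item = &'static str> {
    SYNC_TABLES.iter().copied()
}

/// Order for applying deletes: children first, so that no row is
/// removed while another row still references it.
pub fn delete_order() -> impl Iterator<Item = &'static str> {
    SYNC_TABLES.iter().rev().copied()
}
```

Deriving the delete order from the same constant means a newly synced table only has to be placed once, in its FK-safe position.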
## Testing

- **Core tests:** In-file `#[cfg(test)]` modules with `test_helpers::insert_fake_sample` for fixtures
- **Sync tests:** Unit tests in sync crate modules
- **No GUI tests:** Immediate-mode UI is tested manually
- Test databases use in-memory SQLite (`:memory:`) with migrations applied via `Database::open`

## Building and Distribution

Summary of platform-specific builds:

| Platform | Method |
|----------|--------|
| macOS arm64 | Native cargo, signed + notarized DMG |
| Windows x86_64 | `cargo-xwin` cross-compile, MSI + EXE |
| Linux aarch64 | Native cargo on Astra, AppImage + .deb |
| Linux x86_64 | Cross-compile on Astra, AppImage + .deb |

Every release must have all 7 artifacts before uploading.