# Contributing to audiofiles

Patterns, conventions, and rules for working on the audiofiles codebase.

## Project Structure

```
audiofiles/                            (workspace root)
  Cargo.toml                           # Workspace definition
  crates/
    audiofiles-core/                   # Domain logic, storage, VFS, analysis, database
    audiofiles-browser/                # eframe/egui GUI, Backend trait, state management
    audiofiles-app/                    # App entry point (thin shell)
    audiofiles-sync/                   # SyncKit cloud sync integration
    audiofiles-rhai/                   # Device plugin runtime (TOML manifests + Rhai hooks)
    audiofiles-train/                  # ML classifier training (dev-only binary)
    audiofiles-bench/                  # Performance benchmarks
    audiofiles-rhai/plugins/bundled/   # Bundled device profiles (SP-404, MPC, etc.)
  dist/                                # Build scripts for macOS, Windows, Linux
```

### Crate Boundaries

| Crate | Responsibility | Notes |
|-------|----------------|-------|
| `audiofiles-core` | All domain logic | **Sync-only** (no async runtime) |
| `audiofiles-browser` | GUI + backend abstraction | Owns egui state, polls workers |
| `audiofiles-app` | Entry point | Thin shell, creates backend + launches GUI |
| `audiofiles-sync` | Cloud sync | Uses tokio + `spawn_blocking` for rusqlite |
| `audiofiles-rhai` | Device plugins | TOML manifests + sandboxed Rhai hooks |
| `audiofiles-train` | ML training | Dev-only, not shipped |
| `audiofiles-bench` | Benchmarks | criterion-based |

**Critical rule:** `audiofiles-core` is entirely synchronous. rusqlite's API is blocking, and `rusqlite::Connection` is `Send` but not `Sync`. All database operations therefore run synchronously on whichever thread owns (or has locked) the connection. Long-running operations (import, analysis, export) use dedicated worker threads with channel-based message passing.

## Content-Addressed Storage

Samples are identified by their SHA-256 hash: the hash **is** the primary key. Samples have no UUIDs or auto-increment IDs.

```rust
use std::path::PathBuf;

pub struct SampleStore {
    root: PathBuf,
}
```

**Import flow:**
1. Stream the file through a SHA-256 hasher.
2. Check whether the blob already exists at `samples/{hash}.{ext}`. If it does, skip the copy (dedup).
3. `INSERT OR IGNORE INTO samples` to dedup at the DB level too.
4. Return the hash.

**Rules:**
- Never store samples by filename or path. The hash is the only identifier.
- The `cloud_only` flag marks samples whose local blobs have been evicted but still exist in cloud sync.
- Hash validation rejects anything that isn't exactly 64 lowercase hex characters.
- Extension validation rejects directory traversal attempts.
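
The two validation rules can be sketched with `std` alone. This is a minimal illustration, not the real `SampleStore` API. `is_valid_hash`, `is_valid_extension`, `blob_path`, and `has_blob` are hypothetical names. Accepting only ASCII-alphanumeric extensions is an assumed policy. Hashing is omitted because SHA-256 needs an external crate.

```rust
use std::path::PathBuf;

/// True only for exactly 64 lowercase hex characters.
fn is_valid_hash(hash: &str) -> bool {
    hash.len() == 64 && hash.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Assumed policy: a non-empty, ASCII-alphanumeric extension. This rules out
/// `.`, `/`, and `\`, so `../` style traversal can never be spliced in.
fn is_valid_extension(ext: &str) -> bool {
    !ext.is_empty() && ext.bytes().all(|b| b.is_ascii_alphanumeric())
}

pub struct SampleStore {
    root: PathBuf,
}

impl SampleStore {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }

    /// Builds `<root>/samples/{hash}.{ext}`, or `None` if either part is invalid.
    pub fn blob_path(&self, hash: &str, ext: &str) -> Option<PathBuf> {
        if !is_valid_hash(hash) || !is_valid_extension(ext) {
            return None;
        }
        Some(self.root.join("samples").join(format!("{hash}.{ext}")))
    }

    /// Dedup check from step 2 of the import flow: skip the copy if the blob exists.
    pub fn has_blob(&self, hash: &str, ext: &str) -> bool {
        self.blob_path(hash, ext).map_or(false, |p| p.exists())
    }
}
```

Returning `None` instead of sanitizing keeps the rule simple: a malformed hash or extension never reaches the filesystem.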

## Worker Thread Pattern

Long-running operations use dedicated threads with channel-based message passing:

```rust
use std::sync::atomic::AtomicBool;
use std::sync::{mpsc, Arc};
use std::thread::JoinHandle;

use parking_lot::Mutex; // project convention: parking_lot everywhere

pub struct WorkerHandle {
    cmd_tx: mpsc::Sender<WorkerCommand>,          // Send commands to the worker
    event_rx: Mutex<mpsc::Receiver<WorkerEvent>>, // Receive results
    cancel_flag: Arc<AtomicBool>,                 // Lock-free cancellation
    thread: Option<JoinHandle<()>>,               // Joined on drop
}
```

**Command/Event protocol:**
- `WorkerCommand::AnalyzeBatch { samples, config }` → worker processes samples
- `WorkerEvent::Progress { completed, total, current_name }` → UI updates progress bar
- `WorkerEvent::SampleDone { result, suggestions }` → UI stores result
- `WorkerEvent::BatchComplete` → UI finishes operation

**Rules:**
- `mpsc::Receiver` is `Send` but not `Sync`. Wrapping it in a `Mutex` makes the field `Sync`, which egui state requires.
- `Arc<AtomicBool>` gives lock-free cancellation: the worker checks the flag before each sample.
- The `Drop` implementation sends a `Shutdown` command and joins the thread.
- The browser crate calls `try_recv()` each frame to poll for events.
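
Below is a self-contained sketch of the pattern. It uses only `std`, so `std::sync::Mutex` stands in for `parking_lot::Mutex`. The protocol is deliberately reduced and hypothetical:
- A batch is just a list of sample names.
- `start_batch`, `cancel`, and `poll` are illustrative method names.
- No real analysis happens.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical, reduced protocol (real variants carry config, results, suggestions).
enum WorkerCommand {
    AnalyzeBatch { samples: Vec<String> },
    Shutdown,
}

#[derive(Debug, PartialEq)]
enum WorkerEvent {
    Progress { completed: usize, total: usize, current_name: String },
    SampleDone { name: String },
    BatchComplete,
}

struct WorkerHandle {
    cmd_tx: mpsc::Sender<WorkerCommand>,
    event_rx: Mutex<mpsc::Receiver<WorkerEvent>>, // project code uses parking_lot::Mutex
    cancel_flag: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}

impl WorkerHandle {
    fn spawn() -> Self {
        let (cmd_tx, cmd_rx) = mpsc::channel::<WorkerCommand>();
        let (event_tx, event_rx) = mpsc::channel::<WorkerEvent>();
        let cancel_flag = Arc::new(AtomicBool::new(false));
        let cancel = Arc::clone(&cancel_flag);

        let thread = thread::spawn(move || {
            // `recv` also fails once every Sender is gone, which ends the loop.
            while let Ok(cmd) = cmd_rx.recv() {
                let samples = match cmd {
                    WorkerCommand::Shutdown => break,
                    WorkerCommand::AnalyzeBatch { samples } => samples,
                };
                let total = samples.len();
                for (completed, name) in samples.into_iter().enumerate() {
                    if cancel.load(Ordering::Relaxed) {
                        break; // cancellation is checked before each sample
                    }
                    let _ = event_tx.send(WorkerEvent::Progress { completed, total, current_name: name.clone() });
                    // ... real analysis of `name` would happen here ...
                    let _ = event_tx.send(WorkerEvent::SampleDone { name });
                }
                let _ = event_tx.send(WorkerEvent::BatchComplete);
            }
        });

        Self { cmd_tx, event_rx: Mutex::new(event_rx), cancel_flag, thread: Some(thread) }
    }

    fn start_batch(&self, samples: Vec<String>) {
        self.cancel_flag.store(false, Ordering::Relaxed);
        let _ = self.cmd_tx.send(WorkerCommand::AnalyzeBatch { samples });
    }

    fn cancel(&self) {
        self.cancel_flag.store(true, Ordering::Relaxed);
    }

    /// Non-blocking drain of everything that has arrived, as the UI does once per frame.
    fn poll(&self) -> Vec<WorkerEvent> {
        self.event_rx.lock().unwrap().try_iter().collect()
    }
}

impl Drop for WorkerHandle {
    fn drop(&mut self) {
        let _ = self.cmd_tx.send(WorkerCommand::Shutdown);
        if let Some(handle) = self.thread.take() {
            let _ = handle.join();
        }
    }
}
```

`poll` never blocks, so a frame that finds no events costs only a lock and an empty `try_iter`.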

## Backend Trait

`Backend` is a synchronous `Send + Sync` trait that abstracts all data operations. Long-running work such as `start_analysis` is handed off to a worker, and its results come back through `poll_events()`:

```rust
pub trait Backend: Send + Sync {
    fn list_vfs(&self) -> BackendResult<Vec<Vfs>>;
    fn import_file(&self, path: &Path) -> BackendResult<String>;
    fn start_analysis(&self, samples: Vec<(String, String)>, config: AnalysisConfig) -> BackendResult<()>;
    fn poll_events(&self) -> Vec<BackendEvent>;
    // ... ~30 methods covering VFS, tags, search, analysis, export
}
```

`DirectBackend` is the sole implementation, wrapping `Mutex<Database>` + `SampleStore`. The indirection lets future backends be added without changing UI code. UI code always calls `backend.method()` and never accesses the database directly.
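
To show what the indirection buys, here is a hedged sketch. It uses a two-method slice of the trait and a hypothetical in-memory `FakeBackend`. None of the following is the real API:
- the reduced trait,
- the `String` error type,
- `BackendEvent::ImportDone`.

UI-side code takes `&dyn Backend`, so it compiles against any implementation.

```rust
use std::path::Path;
use std::sync::Mutex;

type BackendResult<T> = Result<T, String>; // hypothetical; the real alias wraps BackendError

#[derive(Debug, PartialEq)]
enum BackendEvent {
    ImportDone(String),
}

// Reduced, hypothetical slice of the real trait.
trait Backend: Send + Sync {
    fn import_file(&self, path: &Path) -> BackendResult<String>;
    fn poll_events(&self) -> Vec<BackendEvent>;
}

/// Hypothetical in-memory stand-in (not part of the codebase).
struct FakeBackend {
    pending: Mutex<Vec<BackendEvent>>,
}

impl Backend for FakeBackend {
    fn import_file(&self, path: &Path) -> BackendResult<String> {
        let name = path.file_name().and_then(|n| n.to_str()).ok_or("no file name")?;
        let id = format!("fake-{name}");
        self.pending.lock().unwrap().push(BackendEvent::ImportDone(id.clone()));
        Ok(id)
    }

    /// Hands back everything queued since the last call.
    fn poll_events(&self) -> Vec<BackendEvent> {
        std::mem::take(&mut *self.pending.lock().unwrap())
    }
}

/// UI-side code sees only `&dyn Backend`, never a database handle.
fn on_drop_file(backend: &dyn Backend, path: &Path) -> Option<String> {
    backend.import_file(path).ok()
}
```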

## VFS Abstraction

Users organize samples through virtual file systems (VFS), not by moving files on disk.

- `vfs_nodes` is a self-referential tree (`parent_id` FK to its own table)
- Nodes are either `Directory` or `Sample` (with a `sample_hash` FK)
- A sample can appear in multiple VFS locations without duplication
- Move/rename operations only touch VFS metadata, not the blob store
- `UNIQUE(vfs_id, parent_id, name)` prevents duplicate names in the same directory

**Enriched queries** join VFS nodes with analysis data and sample metadata in a single query, avoiding N+1 patterns.
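
An in-memory model of the tree rules, for intuition only. Node IDs, `Vfs`, `insert`, and `move_node` are hypothetical. The `HashSet` plays the role of `UNIQUE(vfs_id, parent_id, name)` within a single VFS. Cycle checks and directory-type checks are omitted.

One caveat applies if root nodes store `parent_id` as `NULL`. SQLite's `UNIQUE` treats those `NULL`s as distinct, so this model is stricter at the root than the table would be.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
enum NodeKind {
    Directory,
    Sample { sample_hash: String }, // points at the blob; never copies it
}

#[derive(Debug)]
struct Node {
    parent_id: Option<u64>,
    name: String,
    kind: NodeKind,
}

#[derive(Default)]
struct Vfs {
    nodes: HashMap<u64, Node>,
    names: HashSet<(Option<u64>, String)>, // (parent_id, name) pairs in use
    next_id: u64,
}

impl Vfs {
    fn insert(&mut self, parent_id: Option<u64>, name: &str, kind: NodeKind) -> Result<u64, String> {
        if !self.names.insert((parent_id, name.to_string())) {
            return Err(format!("duplicate name {name:?} in directory"));
        }
        self.next_id += 1;
        self.nodes.insert(self.next_id, Node { parent_id, name: name.to_string(), kind });
        Ok(self.next_id)
    }

    /// A move rewrites `parent_id` only; the sample blob is never touched.
    fn move_node(&mut self, id: u64, new_parent: Option<u64>) -> Result<(), String> {
        let node = self.nodes.get_mut(&id).ok_or("no such node")?;
        if node.parent_id == new_parent {
            return Ok(());
        }
        let new_key = (new_parent, node.name.clone());
        if self.names.contains(&new_key) {
            return Err(format!("duplicate name {:?} in target directory", node.name));
        }
        self.names.remove(&(node.parent_id, node.name.clone()));
        self.names.insert(new_key);
        node.parent_id = new_parent;
        Ok(())
    }
}
```

Two `Sample` nodes carrying the same `sample_hash` is how one blob shows up in several places.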

## Analysis Pipeline

The pipeline runs in a worker thread and processes each sample through these stages:

1. **Decode:** Symphonia decodes any supported audio format to mono f32
2. **Loudness:** Peak dB, RMS dB, LUFS (fast, uses the full signal)
3. **Spectral:** STFT → centroid, flatness, rolloff, bandwidth, ZCR, onset strength
4. **MFCC + ML:** Extract MFCCs from magnitude frames and run them through the embedded neural classifier
5. **BPM:** Tempo detection (skipped for non-rhythmic samples if `smart_skip` is enabled)
6. **Key:** Musical key detection (skipped for non-pitched samples if `smart_skip` is enabled)
7. **Loop:** Loop point detection
8. **Fingerprint:** Peak envelope for near-duplicate detection (VP-tree similarity search)

All results are stored in the `audio_analysis` table (one row per hash). The `smart_skip` feature uses the ML classification to skip irrelevant stages (e.g., no BPM detection for ambient textures).
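
The gating for stages 5 and 6 can be sketched as a pure function. Two things here are assumptions for illustration: the `SampleClass` labels, and which labels count as rhythmic or pitched. The real classifier's classes are not listed in this guide.

```rust
/// Hypothetical coarse classes standing in for the classifier's real labels.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SampleClass {
    Drum,    // rhythmic, treated as unpitched
    Loop,    // rhythmic and pitched
    Melodic, // pitched one-shot
    Texture, // ambient: neither rhythmic nor pitched
}

impl SampleClass {
    fn is_rhythmic(self) -> bool {
        matches!(self, SampleClass::Drum | SampleClass::Loop)
    }
    fn is_pitched(self) -> bool {
        matches!(self, SampleClass::Loop | SampleClass::Melodic)
    }
}

/// Which optional stages to run for one sample.
#[derive(Debug, PartialEq)]
struct StagePlan {
    bpm: bool,
    key: bool,
}

/// With `smart_skip` off, every stage runs; with it on, BPM and key are gated.
fn plan_optional_stages(class: SampleClass, smart_skip: bool) -> StagePlan {
    if !smart_skip {
        return StagePlan { bpm: true, key: true };
    }
    StagePlan { bpm: class.is_rhythmic(), key: class.is_pitched() }
}
```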

### Adding a New Analysis Stage

1. Add the computation in `crates/audiofiles-core/src/analysis/`.
2. Add column(s) to the `audio_analysis` table via an inline migration in `db.rs`.
3. Wire the stage into the pipeline in `analysis/mod.rs`.
4. Add the new field(s) to the `AnalysisResult` struct.
5. Expose the field(s) in the enriched VFS query if the UI needs them.

## Unsafe FFI

Platform-specific drag-and-drop requires FFI:
- **macOS:** `drag_out/macos.rs` (objc2 message sends; libdispatch for async dispatch to the main thread)
- **Windows:** `drag_out/windows.rs` (COM/OLE `DoDragDrop`)

**Rules:**
- Every `unsafe` block MUST have a `// SAFETY:` comment explaining the invariant.
- macOS FFI uses `MainThreadMarker` to guarantee AppKit calls happen on the main thread.
- `RcBlock` prevents use-after-free in async dispatch.
- The `DRAG_ACTIVE` atomic flag prevents concurrent drag sessions.
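
One way an atomic flag such as `DRAG_ACTIVE` can reject a second concurrent drag is a compare-and-swap plus an RAII guard. This is a sketch, and the `DragSession` guard is hypothetical. On macOS the session ends asynchronously, so real code may instead clear the flag from the drag-end callback.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-wide flag: true while a drag-out session is in progress.
static DRAG_ACTIVE: AtomicBool = AtomicBool::new(false);

/// Hypothetical guard; clears the flag when dropped, even on early return.
struct DragSession;

impl DragSession {
    /// Atomically flips false -> true; returns `None` if a drag is already active.
    fn try_begin() -> Option<DragSession> {
        DRAG_ACTIVE
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| DragSession)
    }
}

impl Drop for DragSession {
    fn drop(&mut self) {
        DRAG_ACTIVE.store(false, Ordering::Release);
    }
}
```

A single `compare_exchange` closes the check-then-set race that separate `load` and `store` calls would leave open.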

## Error Handling

Three error types, one per major crate:

```rust
// audiofiles-core
pub enum CoreError {
    Db(rusqlite::Error),
    Io { path: PathBuf, source: std::io::Error },
    SampleNotFound(String),
    VfsNotFound(VfsId),
    Analysis(AnalysisError),
    // ...
}

// audiofiles-sync
pub enum SyncError {
    Db(rusqlite::Error),
    Client(String),
    Auth(String),
    Io(std::io::Error),
}

// audiofiles-browser
pub enum BackendError {
    Core(CoreError),
    Other(String),
}
```

Use typed variants with context. Use `?` for propagation. `From` impls enable automatic conversion between error types.
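
The `From` + `?` mechanics are reduced to std types below so the snippet compiles on its own. The trimmed `CoreError`/`BackendError` and the `read_blob`/`load_preview` functions are hypothetical. The real enums carry more variants, including `rusqlite::Error`.

```rust
use std::fmt;
use std::io;
use std::path::{Path, PathBuf};

// Trimmed-down stand-ins for the real enums.
#[derive(Debug)]
enum CoreError {
    Io { path: PathBuf, source: io::Error },
    SampleNotFound(String),
}

#[derive(Debug)]
enum BackendError {
    Core(CoreError),
    Other(String),
}

impl fmt::Display for CoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CoreError::Io { path, source } => write!(f, "I/O error at {}: {source}", path.display()),
            CoreError::SampleNotFound(hash) => write!(f, "sample not found: {hash}"),
        }
    }
}

impl std::error::Error for CoreError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            CoreError::Io { source, .. } => Some(source),
            CoreError::SampleNotFound(_) => None,
        }
    }
}

// This impl is what lets `?` turn a CoreError into a BackendError.
impl From<CoreError> for BackendError {
    fn from(e: CoreError) -> Self {
        BackendError::Core(e)
    }
}

/// Core side: attach the path as context, since io::Error alone does not carry it.
fn read_blob(path: &Path) -> Result<Vec<u8>, CoreError> {
    std::fs::read(path).map_err(|source| CoreError::Io { path: path.to_path_buf(), source })
}

/// Browser side: `?` converts via `From<CoreError>`.
fn load_preview(path: &Path) -> Result<usize, BackendError> {
    let bytes = read_blob(path)?;
    Ok(bytes.len())
}
```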

## Database

### Inline Migrations

Migrations are `const` strings in `crates/audiofiles-core/src/db.rs`, applied sequentially on database open:

```rust
const MIGRATION_001: &str = r#"
CREATE TABLE samples (
    hash TEXT PRIMARY KEY,
    original_name TEXT NOT NULL,
    ...
);
"#;
```

Production uses a file-backed SQLite database. Tests use `:memory:`.

When adding a migration, make it **replay-safe**:
- Every `CREATE TABLE / INDEX / TRIGGER` should use `IF NOT EXISTS`. For triggers whose body changes, precede the `CREATE` with `DROP TRIGGER IF EXISTS` instead.
- Any seed insert should be `INSERT OR IGNORE`.

The `migration_replay_from_version_two_against_full_schema` test in `db.rs` rolls `user_version` back to 2 and re-runs every migration from M003 onward against a populated schema. Non-idempotent CREATEs fail it. M001 (initial schema) and M002 (`DROP TABLE tags; ALTER TABLE tags_v2 RENAME TO tags`) are inherently one-shot and are excluded from the replay test.
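
The sequencing rule can be modeled without SQLite. This is a hedged sketch:
- `FakeDb` stands in for the connection and its `PRAGMA user_version`.
- `migrate` is a hypothetical name.
- The `example_notes` table is illustrative only.

A rusqlite-backed version would presumably run each string with `Connection::execute_batch` where this one merely records it.

```rust
/// Hypothetical stand-in for the connection: just enough state to show sequencing.
struct FakeDb {
    user_version: usize,
    executed: Vec<&'static str>,
}

// Replay-safe shape for a new migration (schema names are illustrative only).
const MIGRATION_EXAMPLE: &str = r#"
CREATE TABLE IF NOT EXISTS example_notes (
    hash TEXT PRIMARY KEY,
    note TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_example_notes_note ON example_notes(note);
"#;

/// Applies every migration numbered above `user_version` (1-based), in order,
/// bumping `user_version` after each so a crash mid-way resumes correctly.
fn migrate(db: &mut FakeDb, migrations: &[&'static str]) {
    for (i, sql) in migrations.iter().copied().enumerate().skip(db.user_version) {
        db.executed.push(sql); // a real version would execute `sql` here
        db.user_version = i + 1;
    }
}
```

Rolling `user_version` back, as the replay test does, re-runs the later migrations. That is exactly why they must tolerate objects that already exist.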

The connection registers a custom `hash_row_id(salt, key)` SQLite function on open (rusqlite `functions` feature). The M018 sync triggers use it. If you write a migration that creates new sync triggers, prefer it for any `row_id` that would otherwise leak user content.

### Sync Changelog Triggers

Every synced table has triggers that insert into `sync_changelog` on INSERT/UPDATE/DELETE. A `sync_state` row (`applying_remote = '1'`) suppresses these triggers during pull operations to prevent recursion.

Migration M018 (2026-06-02) hashes `sync_changelog.row_id` via `hash_row_id(row_id_salt, canonical_key)` for the sensitive tables (samples, audio_analysis, tags, collection_members). As a result, the server never sees raw sample hashes or tag strings. The salt is generated per device, stored in `sync_state`, and never synced.

DELETE triggers also emit the canonical PK in the encrypted `data` field. `resolve::apply_delete` reads it to reconstruct WHERE clauses without parsing the (now-opaque) row_id.

When adding a new synced table, follow the same pattern:
- Wrap row_id in `hash_row_id(...)` if it carries user content.
- Emit the canonical PK into `data` for DELETE.

### rusqlite + async

rusqlite calls block the calling thread, and `rusqlite::Connection` is `Send` but not `Sync`. In the sync crate (which uses tokio), all database operations therefore go through `tokio::task::spawn_blocking` so they never stall the async executor. Core is sync-only and needs no async runtime.

## SyncKit Integration

Cloud sync is optional. The `SyncManager` coordinates push/pull:

- **Tables synced** (in FK-safe order): `vfs`, `samples`, `collections`, `vfs_nodes`, `audio_analysis`, `tags`, `collection_members`, `user_config`, `edit_history`. (M015 merged `smart_folders` into `collections.filter_json` and dropped the standalone table.)
- **Delete order** is reversed (children first).
- **Column whitelist:** `table_columns()` restricts which columns sync, to prevent schema drift.
- **Blob sync:** Sample files sync to cloud storage for VFS entries with `sync_files = true`.
- **`cloud_only` flag:** Marks samples whose local blobs have been evicted.
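
The ordering rules and the column whitelist are sketched below as plain functions. `SYNC_TABLES` copies the list above. These parts are assumptions, not the real `SyncManager` code:
- the `table_columns` signature,
- its column list,
- `filter_columns`,
- the `local_only_column` name.

```rust
/// Push order from the list above: parents before children, so FK checks pass on apply.
const SYNC_TABLES: [&str; 9] = [
    "vfs", "samples", "collections", "vfs_nodes", "audio_analysis",
    "tags", "collection_members", "user_config", "edit_history",
];

/// Deletes run children-first: the push order reversed.
fn delete_order() -> Vec<&'static str> {
    SYNC_TABLES.iter().rev().copied().collect()
}

// Illustrative whitelist; the real per-table column lists differ.
const SAMPLES_COLUMNS: &[&str] = &["hash", "original_name"];

/// Hypothetical signature in the spirit of `table_columns()`.
fn table_columns(table: &str) -> Option<&'static [&'static str]> {
    match table {
        "samples" => Some(SAMPLES_COLUMNS),
        _ => None,
    }
}

/// Keeps only whitelisted columns; unknown tables sync nothing.
fn filter_columns<'a>(table: &str, columns: &[&'a str]) -> Vec<&'a str> {
    let allowed = table_columns(table).unwrap_or(&[]);
    columns
        .iter()
        .copied()
        .filter(|c| allowed.iter().any(|a| a == c))
        .collect()
}
```

A whitelist fails closed: a column added locally by a newer schema is simply not pushed until it is listed.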

## Device Plugins

TOML manifests in `plugins/devices/` define hardware constraints. Optional Rhai scripts in `hooks/` run sandboxed.

### Hook Style

Optional Rhai hooks follow the cross-project Rhai style guide. Run `_meta/scripts/lint-rhai.sh` to check formatting. Key points: 4-space indent, `snake_case` functions, `UPPER_CASE` constants, and a header comment block.

### Manifest Contract

```toml
[device]
name = "SP-404 MKII"
manufacturer = "Roland"

[audio]
formats = ["wav"]
sample_rates = [44100, 48000]
bit_depths = [16, 24]
channels = "both"

[naming]
case = "upper"
max_length = 12

[hooks]
validate_sample = "hooks/validate.rhai"
transform_filename = "hooks/filename.rhai"
```
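
For intuition about what the manifest constrains, here is a hypothetical Rust mirror of the `[audio]` and `[naming]` tables, with the built-in checks a device profile implies. Several pieces are assumptions:
- `DeviceProfile` and `SampleInfo`.
- The case-insensitive format match.
- The naming fallback used when no `transform_filename` hook exists.
- Applying `max_length` to the stem without the extension.

TOML parsing is left out so the snippet needs no dependencies, and `channels` is not modeled.

```rust
/// Hypothetical mirror of the `[audio]` and `[naming]` tables.
struct DeviceProfile {
    formats: Vec<&'static str>,
    sample_rates: Vec<u32>,
    bit_depths: Vec<u16>,
    uppercase_names: bool, // `case = "upper"`
    max_length: usize,     // `max_length = 12`
}

struct SampleInfo<'a> {
    format: &'a str,
    sample_rate: u32,
    bit_depth: u16,
}

impl DeviceProfile {
    /// Built-in constraint check that a `validate_sample` hook could further refine.
    fn accepts(&self, s: &SampleInfo) -> bool {
        self.formats.iter().any(|f| f.eq_ignore_ascii_case(s.format))
            && self.sample_rates.contains(&s.sample_rate)
            && self.bit_depths.contains(&s.bit_depth)
    }

    /// Assumed fallback naming: apply case, then truncate to `max_length` characters.
    fn device_name(&self, stem: &str) -> String {
        let mut out = if self.uppercase_names { stem.to_uppercase() } else { stem.to_string() };
        // Find the byte offset of the first character past the limit (char-boundary safe).
        let cut = out.char_indices().nth(self.max_length).map(|(i, _)| i);
        if let Some(i) = cut {
            out.truncate(i);
        }
        out
    }
}
```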

### Hook Functions

| Hook | Arguments | Returns | Purpose |
|------|-----------|---------|---------|
| `validate_sample` | `info` (sample metadata) | `bool` | Accept/reject sample for device |
| `transform_filename` | `name`, `ctx` | `String` | Rename for device conventions |
| `pre_export` | `ctx` | (none) | Run before export batch |
| `post_export` | `ctx` | (none) | Run after export batch |

## Concurrency

- `parking_lot::Mutex` everywhere (not `std::sync::Mutex`): no poisoning, and a simpler lock API (`lock()` returns the guard directly rather than a `Result`).
- `#[instrument(skip_all)]` (from `tracing`) on all significant functions.
- Worker threads for long-running operations (never block the UI thread).
- The egui render loop polls `backend.poll_events()` each frame for worker results.

## Testing

- **Core tests:** In-file `#[cfg(test)]` modules with `test_helpers::insert_fake_sample` for fixtures
- **Sync tests:** Unit tests in sync crate modules
- **No GUI tests:** Immediate-mode UI is tested manually
- Test databases use in-memory SQLite (`:memory:`) with migrations applied via `Database::open`

## Building and Distribution

Summary of platform-specific builds:

| Platform | Build and artifacts |
|----------|---------------------|
| macOS arm64 | Native cargo, signed + notarized DMG |
| Windows x86_64 | `cargo-xwin` cross-compile, MSI + EXE |
| Linux aarch64 | Native cargo on Astra, AppImage + .deb |
| Linux x86_64 | Cross-compile on Astra, AppImage + .deb |

Every release must have all 7 artifacts (1 for macOS, 2 for Windows, 2 per Linux architecture) before uploading.