# audiofiles -- ML Classification System

Two-layer system that classifies audio samples into 16 categories. Layer 1 uses rule-based heuristics for broad classification. Layer 2 uses a 200-tree Random Forest for fine-grained drum sub-classification.

## Architecture

```
Audio file
  ↓ decode (Symphonia → mono f32)
  ↓ feature extraction (9 scalar + 26 MFCC = 35 features)
  ↓ Layer 1: classify_broad()            ← rule-based heuristics
  ├─ Drum → Layer 2: predict_layer2()    ← Random Forest (200 trees)
  │    └─ Kick / Snare / HiHat / Cymbal / Percussion
  └─ Non-drum → return directly
       └─ Bass / Vocal / Synth / Pad / Noise / Music / Ambience / Impact / Foley / Texture / Misc
```

### Layer 1: Rule-Based Broad Classifier

`classify_broad()` in `crates/audiofiles-core/src/analysis/classify.rs`

Routes each sample into a broad category using spectral and waveform features:

| Rule | Condition | Category |
|------|-----------|----------|
| Noise | flatness > 0.7 | Noise |
| Drum | duration < 2.0 s AND (attack < 0.05 s OR crest > 2.5) | Drum → Layer 2 |
| Bass | centroid < 400 Hz AND flatness < 0.15 | Bass |
| Ambience | duration > 5.0 s AND low centroid_variance AND 0.15 < flatness < 0.5 | Ambience |
| Impact | crest > 10.0 AND attack < 0.005 s | Impact |
| Texture | duration > 2.0 s AND centroid_variance > 500,000 | Texture |

Rules are evaluated in priority order, so the first matching rule determines the category. Confidence values range from 0.75 to 0.95 depending on how strongly the sample matches.
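The rule table can be read as an if/else chain in which the first matching rule wins. The sketch below assumes that reading and is not the real `classify_broad()`: `BroadFeatures`, `Broad`, `classify_broad_sketch`, and the `LOW_CENTROID_VARIANCE` cutoff are hypothetical (the table gives no number for "low"), and confidence scoring is left out.

```rust
// Minimal sketch of the Layer 1 rule table, NOT the real classify_broad().
// All names and the LOW_CENTROID_VARIANCE cutoff are illustrative assumptions.

/// Subset of the 35-feature vector that the Layer 1 rules read.
#[derive(Debug, Clone, Copy)]
struct BroadFeatures {
    duration: f32,          // seconds
    centroid: f32,          // Hz
    flatness: f32,          // 0.0 (tone) .. 1.0 (noise)
    centroid_variance: f32, // variance of per-frame centroids
    crest_factor: f32,      // peak / RMS, linear
    attack_time: f32,       // seconds to 90% of peak
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Broad {
    Noise,
    Drum, // would be handed to Layer 2
    Bass,
    Ambience,
    Impact,
    Texture,
}

// ASSUMPTION: the README only says "low centroid_variance"; this value is a placeholder.
const LOW_CENTROID_VARIANCE: f32 = 50_000.0;

/// Checks the rules top to bottom; the first match wins.
/// Returns None when no rule fires (the README does not describe that path).
fn classify_broad_sketch(f: &BroadFeatures) -> Option<Broad> {
    if f.flatness > 0.7 {
        Some(Broad::Noise)
    } else if f.duration < 2.0 && (f.attack_time < 0.05 || f.crest_factor > 2.5) {
        Some(Broad::Drum)
    } else if f.centroid < 400.0 && f.flatness < 0.15 {
        Some(Broad::Bass)
    } else if f.duration > 5.0
        && f.centroid_variance < LOW_CENTROID_VARIANCE
        && f.flatness > 0.15
        && f.flatness < 0.5
    {
        Some(Broad::Ambience)
    } else if f.crest_factor > 10.0 && f.attack_time < 0.005 {
        Some(Broad::Impact)
    } else if f.duration > 2.0 && f.centroid_variance > 500_000.0 {
        Some(Broad::Texture)
    } else {
        None
    }
}

fn main() {
    // Short, low, tonal hit: it also satisfies the Bass rule, but Drum is checked first.
    let kick_like = BroadFeatures {
        duration: 0.4,
        centroid: 120.0,
        flatness: 0.05,
        centroid_variance: 1_000.0,
        crest_factor: 6.0,
        attack_time: 0.002,
    };
    println!("{:?}", classify_broad_sketch(&kick_like)); // prints "Some(Drum)"
}
```

The example sample also meets the Bass condition and is routed to Drum only because that rule is checked earlier, which is why the evaluation order matters.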
### Layer 2: Random Forest Drum Classifier

`predict_layer2()` in `crates/audiofiles-core/src/analysis/classify.rs`

- **Model**: 200 decision trees, majority vote
- **Classes**: Kick (0), Snare (1), HiHat (2), Cymbal (3), Percussion (4)
- **Confidence**: fraction of trees voting for the winning class (e.g., 0.85 = 170 of 200 trees agreed)
- **Fallback**: if the model file contains no trees, reverts to `classify_full()` (16-class rule-based)

### Graceful Degradation

If `layer2_drum.json` contains an empty trees vector, the system falls back to `classify_full()`, a comprehensive 16-class rule-based classifier covering all categories. An empty model therefore lowers accuracy instead of crashing the app. Malformed JSON is a separate case: the loader shown under **Model Loading** panics via `expect`.

---

## Feature Vector

35 features total: 9 scalar + 13 MFCC means + 13 MFCC variances.

### Scalar Features (indices 0--8)

| Index | Feature | Source | Description |
|-------|---------|--------|-------------|
| 0 | duration | basic.rs | Total length in seconds |
| 1 | centroid | spectral.rs | Spectral center of mass in Hz |
| 2 | flatness | spectral.rs | Geometric mean / arithmetic mean of magnitudes: 0.0 (pure tone) to 1.0 (white noise) |
| 3 | zcr | spectral.rs | Zero-crossing rate (fraction of sign changes per sample) |
| 4 | onset_strength | spectral.rs | Sum of positive spectral flux across STFT frames |
| 5 | bandwidth | spectral.rs | Spectral standard deviation around the centroid in Hz |
| 6 | centroid_variance | spectral.rs | Variance of per-frame centroids (high = evolving spectrum) |
| 7 | crest_factor | basic.rs | Peak / RMS ratio, linear (not dB); values above 8 suggest impacts |
| 8 | attack_time | basic.rs | Time to reach 90% of peak amplitude in seconds |

### MFCC Features (indices 9--34)

| Indices | Feature | Description |
|---------|---------|-------------|
| 9--21 | MFCC means | Mean of the first 13 MFCCs across all STFT frames |
| 22--34 | MFCC variances | Variance of the first 13 MFCCs across all STFT frames |

MFCC computation: 26-filter mel filterbank applied to each STFT magnitude frame, log of each filter's energy, DCT-II, then keep
first 13 coefficients.

### STFT Parameters

- FFT size: 2048 points with a Hann window
- Hop size: 512 samples

---

## Training Pipeline

Binary: `crates/audiofiles-train/src/main.rs` (not built by default).

### Data

- Source: `~/Git/Drums/test_data/` with one subdirectory per class
- Classes: `kick/`, `snare/`, `hihat/`, `cymbal/`, `clap/`, `tom/`, `percussion/`
- Class mapping: kick→0, snare→1, hihat→2, cymbal→3, clap/tom/percussion→4
- Dataset: 4,343 labeled drum samples

### Algorithm

- **200 decision trees**, each trained on a bootstrap sample (drawn at random, with replacement, from the training set)
- **Max depth**: 25 levels per tree
- **Min leaf**: at least 3 samples per leaf node
- **Features per split**: sqrt(35) ≈ 6 features sampled at random for each split decision
- **Split criterion**: Gini impurity
- **Parallelism**: trees trained in parallel via rayon

### Evaluation

- **5-fold stratified cross-validation** (each fold preserves the overall class distribution)
- **94.4% strict accuracy** on 4,343 samples
- Per-class precision, recall, and F1 computed across all folds

### Output

- Model file: `crates/audiofiles-core/models/layer2_drum.json` (4.0 MB)
- Format: JSON containing the 200 trees plus class metadata
- Each tree node is either a `Split { feature, threshold, left, right }` or a `Leaf { class }`

---

## Model Loading

The model is embedded at compile time and deserialized lazily on first use:

```rust
use std::sync::OnceLock;

/// Raw JSON of the trained forest, embedded into the binary at compile time.
/// The path is relative to this source file (assumed to be classify.rs).
static LAYER2_MODEL_BYTES: &[u8] = include_bytes!("../../models/layer2_drum.json");

/// Parsed model; `RandomForestModel` must implement `serde::Deserialize`.
static LAYER2_MODEL: OnceLock<RandomForestModel> = OnceLock::new();

fn layer2_model() -> &'static RandomForestModel {
    LAYER2_MODEL.get_or_init(|| {
        // Runs at most once, even if several threads race on the first call.
        serde_json::from_slice(LAYER2_MODEL_BYTES)
            .expect("embedded Layer 2 model is invalid JSON")
    })
}
```

- `include_bytes!` embeds `layer2_drum.json` into the binary, so no model file is needed at runtime
- `OnceLock` ensures deserialization happens exactly once, even under concurrent first calls
- After initialization, every call returns the cached `&'static` reference without re-parsing

---

## Database Integration

Classification results are stored in the `audio_analysis` table:

| Column | Type | Description |
|--------|------|-------------|
| classification | TEXT | `SampleClass` as a lowercase string (e.g., "kick") |
| classification_confidence | REAL | 0.0--1.0; RF vote fraction for drums, heuristic confidence for non-drums |

---

## Retraining

To retrain the model with new or updated training data:

1. Organize labeled samples in `~/Git/Drums/test_data/{class}/`
2. Run `cargo run -p audiofiles-train`
3. The binary prints cross-validation metrics and writes `layer2_drum.json`
4. Rebuild audiofiles so that `include_bytes!` embeds the updated model

---

## Key Files

| What | Where |
|------|-------|
| Two-layer classifier | `crates/audiofiles-core/src/analysis/classify.rs` |
| Spectral features | `crates/audiofiles-core/src/analysis/spectral.rs` |
| MFCC computation | `crates/audiofiles-core/src/analysis/mfcc.rs` |
| Crest factor, attack time | `crates/audiofiles-core/src/analysis/basic.rs` |
| Training pipeline | `crates/audiofiles-train/src/main.rs` |
| Embedded model | `crates/audiofiles-core/models/layer2_drum.json` |
| Analysis orchestrator | `crates/audiofiles-core/src/analysis/mod.rs` |
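The Layer 2 voting rule (majority vote over the trees, confidence equal to the winning fraction, rule-based fallback when the tree list is empty) can be sketched as follows. This is an illustrative reimplementation, not the code in `classify.rs`. `Node`, `Forest`, and `demo_forest` are hypothetical, and so are three design details: the boxed-children layout, the convention that `<=` goes left, and the tie-breaking rule.

```rust
// Minimal sketch of majority-vote prediction, NOT the real predict_layer2().
// ASSUMPTIONS: children are boxed (the real JSON may store node indices),
// and samples with feature <= threshold go to the left child.

/// One decision-tree node, mirroring the Split / Leaf shape of the model file.
enum Node {
    Split { feature: usize, threshold: f32, left: Box<Node>, right: Box<Node> },
    Leaf { class: usize },
}

impl Node {
    /// Walks from this node down to a leaf and returns its class index.
    fn predict(&self, x: &[f32]) -> usize {
        let mut node = self;
        loop {
            match node {
                Node::Leaf { class } => return *class,
                Node::Split { feature, threshold, left, right } => {
                    node = if x[*feature] <= *threshold { &**left } else { &**right };
                }
            }
        }
    }
}

struct Forest {
    trees: Vec<Node>,
    n_classes: usize, // 5: Kick, Snare, HiHat, Cymbal, Percussion
}

impl Forest {
    /// Returns (winning class, fraction of trees that voted for it),
    /// or None for an empty forest so the caller can fall back to rules.
    fn predict(&self, x: &[f32]) -> Option<(usize, f32)> {
        if self.trees.is_empty() {
            return None;
        }
        let mut votes = vec![0usize; self.n_classes];
        for tree in &self.trees {
            votes[tree.predict(x)] += 1;
        }
        // Ties go to the lower class index (an assumption; the README does not say).
        let mut best = 0;
        for c in 1..votes.len() {
            if votes[c] > votes[best] {
                best = c;
            }
        }
        Some((best, votes[best] as f32 / self.trees.len() as f32))
    }
}

/// Toy forest: 170 stumps splitting on centroid (index 1) plus 30 trees that always say Snare.
fn demo_forest() -> Forest {
    let trees = (0..200)
        .map(|i| {
            if i < 170 {
                Node::Split {
                    feature: 1,
                    threshold: 300.0,
                    left: Box::new(Node::Leaf { class: 0 }),  // Kick
                    right: Box::new(Node::Leaf { class: 1 }), // Snare
                }
            } else {
                Node::Leaf { class: 1 }
            }
        })
        .collect();
    Forest { trees, n_classes: 5 }
}

fn main() {
    let mut x = vec![0.0f32; 35]; // full 35-feature vector
    x[1] = 120.0; // low centroid sends every stump left (Kick)
    if let Some((class, confidence)) = demo_forest().predict(&x) {
        println!("class {class}, confidence {confidence:.2}"); // prints "class 0, confidence 0.85"
    }
}
```

The toy forest reproduces the 170/200 = 0.85 example from the Layer 2 description. Returning `None` for an empty forest corresponds to the Graceful Degradation path, where the caller would switch to `classify_full()`.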