# audiofiles -- ML Classification System

Two-layer system that classifies audio samples into 16 categories. Layer 1 uses rule-based heuristics for broad classification. Layer 2 uses a 200-tree Random Forest for fine-grained drum sub-classification.

## Architecture

```
Audio file
  ↓ decode (Symphonia → mono f32)
  ↓ feature extraction (9 scalar + 26 MFCC statistics = 35 features)
  ↓
Layer 1: classify_broad()          ← rule-based heuristics
  ├─ Drum → Layer 2: predict_layer2()  ← Random Forest (200 trees)
  │           └─ Kick / Snare / HiHat / Cymbal / Percussion
  └─ Non-drum → return directly
       └─ Bass / Vocal / Synth / Pad / Noise / Music / Ambience / Impact / Foley / Texture / Misc
```

### Layer 1: Rule-Based Broad Classifier

`classify_broad()` in `crates/audiofiles-core/src/analysis/classify.rs`

Routes samples into broad categories using spectral and waveform features:

| Rule | Condition | Category |
|------|-----------|----------|
| Noise | flatness > 0.7 | Noise |
| Drum | duration < 2.0 AND (attack < 0.05 OR crest > 2.5) | Drum → Layer 2 |
| Bass | centroid < 400 AND flatness < 0.15 | Bass |
| Ambience | duration > 5.0 AND low centroid_variance AND 0.15 < flatness < 0.5 | Ambience |
| Impact | crest > 10.0 AND attack < 0.005 | Impact |
| Texture | duration > 2.0 AND centroid_variance > 500,000 | Texture |

Rules are evaluated top to bottom in the priority order shown, so the first matching rule determines the category. Confidence values range 0.75--0.95 depending on how strongly the sample matches.
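A minimal Rust sketch of the rule table as a first-match cascade. `Broad`, `Scalars`, and `classify_broad_sketch` are hypothetical names, not the crate's API. The table gives no number for "low centroid_variance", so `LOW_CENTROID_VARIANCE` is a placeholder, and confidence scoring is left out.

```rust
/// Broad categories produced by the six listed rules.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Broad {
    Noise,
    Drum,
    Bass,
    Ambience,
    Impact,
    Texture,
}

/// Hypothetical bundle of the scalar features the rules read.
struct Scalars {
    duration: f32,          // seconds
    centroid: f32,          // Hz
    flatness: f32,          // 0.0..=1.0
    centroid_variance: f32, // Hz^2
    crest_factor: f32,      // peak / RMS
    attack_time: f32,       // seconds
}

/// PLACEHOLDER value: the rule table only says "low centroid_variance".
const LOW_CENTROID_VARIANCE: f32 = 50_000.0;

/// Evaluate the rules top to bottom; the first match wins.
/// Returns None when no listed rule matches (behaviour beyond the table is not documented).
fn classify_broad_sketch(s: &Scalars) -> Option<Broad> {
    if s.flatness > 0.7 {
        Some(Broad::Noise)
    } else if s.duration < 2.0 && (s.attack_time < 0.05 || s.crest_factor > 2.5) {
        Some(Broad::Drum)
    } else if s.centroid < 400.0 && s.flatness < 0.15 {
        Some(Broad::Bass)
    } else if s.duration > 5.0
        && s.centroid_variance < LOW_CENTROID_VARIANCE
        && s.flatness > 0.15
        && s.flatness < 0.5
    {
        Some(Broad::Ambience)
    } else if s.crest_factor > 10.0 && s.attack_time < 0.005 {
        Some(Broad::Impact)
    } else if s.duration > 2.0 && s.centroid_variance > 500_000.0 {
        Some(Broad::Texture)
    } else {
        None
    }
}
```

Read as a first-match cascade, the Drum rule shadows Impact for short samples: any sample under 2 s with an attack below 0.005 s already satisfies the Drum condition, so Impact can only fire for samples of 2 s or longer.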

### Layer 2: Random Forest Drum Classifier

`predict_layer2()` in `crates/audiofiles-core/src/analysis/classify.rs`

- **Model**: 200 decision trees, majority vote
- **Classes**: Kick (0), Snare (1), HiHat (2), Cymbal (3), Percussion (4)
- **Confidence**: fraction of trees voting for the majority class (e.g., 0.85 = 170/200 agreed)
- **Fallback**: if the model's `trees` vector is empty, classification reverts to `classify_full()` (the 16-class rule-based classifier)
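The vote and confidence computation can be sketched in Rust as follows. The `Node` enum mirrors the `Split`/`Leaf` node shape described under Output, but the boxed children, the `<=`-goes-left convention, and tie-breaking are assumptions; `predict_tree` and `predict_forest` are hypothetical names, not `predict_layer2()` itself.

```rust
/// One tree node: an internal split or a leaf holding a class index
/// (0 Kick, 1 Snare, 2 HiHat, 3 Cymbal, 4 Percussion).
enum Node {
    Split { feature: usize, threshold: f32, left: Box<Node>, right: Box<Node> },
    Leaf { class: usize },
}

/// Walk one tree from the root to a leaf.
/// ASSUMPTION: `features[feature] <= threshold` goes left.
fn predict_tree(root: &Node, features: &[f32]) -> usize {
    let mut node = root;
    loop {
        match node {
            Node::Leaf { class } => return *class,
            Node::Split { feature, threshold, left, right } => {
                node = if features[*feature] <= *threshold { &**left } else { &**right };
            }
        }
    }
}

/// Majority vote over all trees; confidence = winning votes / number of trees.
/// Returns None for an empty forest so the caller can fall back.
fn predict_forest(trees: &[Node], features: &[f32], n_classes: usize) -> Option<(usize, f32)> {
    if trees.is_empty() {
        return None;
    }
    let mut votes = vec![0usize; n_classes];
    for tree in trees {
        votes[predict_tree(tree, features)] += 1;
    }
    // max_by_key keeps the last maximum on ties; the real tie-break is not documented.
    let (class, &count) = votes.iter().enumerate().max_by_key(|&(_, c)| *c)?;
    Some((class, count as f32 / trees.len() as f32))
}
```

With 200 trees, 170 votes for Kick would yield class 0 with confidence 170 / 200 = 0.85.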

### Graceful Degradation

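If `layer2_drum.json` contains an empty `trees` vector, the system falls back to `classify_full()`, a 16-class rule-based classifier covering every category, so classification still produces a result without a trained model. This fallback covers only the empty-model case: the embedded JSON is parsed with `.expect()` (see Model Loading), so a malformed model file panics on first use.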
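The two-layer dispatch, including the empty-model fallback, fits in one function. This is a sketch: `Broad`, the trimmed-down `SampleClass`, and the closure parameters standing in for `classify_broad()`, `predict_layer2()`, and `classify_full()` are hypothetical, chosen so the control flow can be shown without the real feature code. Layer 2 returning `None` represents an empty `trees` vector.

```rust
/// Trimmed-down stand-in for the crate's 16-variant SampleClass.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SampleClass {
    Kick,
    Snare,
    HiHat,
    Cymbal,
    Percussion,
    Bass,
    Misc,
}

/// Layer 1 outcome: hand off to Layer 2, or a final non-drum label with its confidence.
enum Broad {
    Drum,
    Final(SampleClass, f32),
}

/// Layer 2 class indices in model order.
const DRUM_CLASSES: [SampleClass; 5] = [
    SampleClass::Kick,
    SampleClass::Snare,
    SampleClass::HiHat,
    SampleClass::Cymbal,
    SampleClass::Percussion,
];

/// Layer 1 first; drums go to the forest. A forest with no trees (`None`)
/// falls back to the full rule-based classifier instead of failing.
fn classify(
    features: &[f32],
    layer1: impl Fn(&[f32]) -> Broad,
    layer2: impl Fn(&[f32]) -> Option<(usize, f32)>,
    full: impl Fn(&[f32]) -> (SampleClass, f32),
) -> (SampleClass, f32) {
    match layer1(features) {
        Broad::Final(class, confidence) => (class, confidence),
        Broad::Drum => match layer2(features) {
            Some((index, vote_fraction)) => (DRUM_CLASSES[index], vote_fraction),
            None => full(features),
        },
    }
}
```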

---

## Feature Vector

35 features total: 9 scalar + 13 MFCC means + 13 MFCC variances.
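A sketch of how the 35-slot vector could be packed from the scalars and per-frame MFCCs. `assemble` and `N_MFCC` are hypothetical names; population variance (dividing by the frame count) is an assumption, since the source does not say which estimator is used.

```rust
/// Number of MFCC coefficients kept per frame.
const N_MFCC: usize = 13;

/// Pack 9 scalars and per-frame MFCCs into the 35-slot layout:
/// [0..9) scalars, [9..22) MFCC means, [22..35) MFCC variances.
fn assemble(scalars: [f32; 9], mfcc_frames: &[[f32; N_MFCC]]) -> [f32; 35] {
    let mut out = [0.0f32; 35];
    out[..9].copy_from_slice(&scalars);
    // max(1) avoids dividing by zero when there are no frames
    let n = mfcc_frames.len().max(1) as f32;
    for k in 0..N_MFCC {
        let mean = mfcc_frames.iter().map(|f| f[k]).sum::<f32>() / n;
        // ASSUMPTION: population variance
        let var = mfcc_frames.iter().map(|f| (f[k] - mean).powi(2)).sum::<f32>() / n;
        out[9 + k] = mean;
        out[22 + k] = var;
    }
    out
}
```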

### Scalar Features (indices 0--8)

| Index | Feature | Source | Description |
|-------|---------|--------|-------------|
| 0 | duration | basic.rs | Total length in seconds |
| 1 | centroid | spectral.rs | Spectral center of mass in Hz |
| 2 | flatness | spectral.rs | 0.0 (pure tone) to 1.0 (white noise); ratio of the geometric mean to the arithmetic mean of the magnitude spectrum |
| 3 | zcr | spectral.rs | Zero-crossing rate: fraction of adjacent sample pairs whose sign differs |
| 4 | onset_strength | spectral.rs | Sum of positive spectral flux across STFT frames |
| 5 | bandwidth | spectral.rs | Spectral standard deviation around centroid in Hz |
| 6 | centroid_variance | spectral.rs | Variance of per-frame centroids (high = evolving spectrum) |
| 7 | crest_factor | basic.rs | Peak / RMS as a linear ratio (not dB); values above 8 typically indicate impacts |
| 8 | attack_time | basic.rs | Time to reach 90% of peak amplitude in seconds |
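Several of these features can be sketched directly from their definitions. The functions below are illustrative, not the code in `basic.rs` or `spectral.rs`: attack time is measured from the first sample (an assumption), `spectral_frame` works on a single magnitude frame, and how per-frame values are aggregated across the STFT is not documented here. The `1e-10` floor in the flatness calculation is also an assumption that keeps `ln()` finite on silent bins.

```rust
/// Peak absolute amplitude divided by RMS, both on the linear waveform.
fn crest_factor(x: &[f32]) -> f32 {
    let peak = x.iter().fold(0.0f32, |m, &s| m.max(s.abs()));
    let rms = (x.iter().map(|&s| s * s).sum::<f32>() / x.len().max(1) as f32).sqrt();
    if rms > 0.0 { peak / rms } else { 0.0 }
}

/// Seconds from the first sample until |x| first reaches 90% of the peak.
fn attack_time(x: &[f32], sample_rate: f32) -> f32 {
    let peak = x.iter().fold(0.0f32, |m, &s| m.max(s.abs()));
    let idx = x.iter().position(|s| s.abs() >= 0.9 * peak).unwrap_or(0);
    idx as f32 / sample_rate
}

/// Fraction of adjacent sample pairs whose sign differs (zero counts as positive).
fn zero_crossing_rate(x: &[f32]) -> f32 {
    if x.len() < 2 {
        return 0.0;
    }
    let crossings = x.windows(2).filter(|w| (w[0] >= 0.0) != (w[1] >= 0.0)).count();
    crossings as f32 / (x.len() - 1) as f32
}

/// (centroid Hz, bandwidth Hz, flatness) for one magnitude frame with bins `bin_hz` apart.
fn spectral_frame(mags: &[f32], bin_hz: f32) -> (f32, f32, f32) {
    let total: f32 = mags.iter().sum();
    if total <= 0.0 {
        return (0.0, 0.0, 0.0);
    }
    let centroid = mags.iter().enumerate().map(|(i, &m)| i as f32 * bin_hz * m).sum::<f32>() / total;
    // Magnitude-weighted variance of bin frequency around the centroid
    let spread = mags
        .iter()
        .enumerate()
        .map(|(i, &m)| (i as f32 * bin_hz - centroid).powi(2) * m)
        .sum::<f32>()
        / total;
    // Flatness: geometric mean / arithmetic mean of the magnitudes
    let n = mags.len() as f32;
    let geometric = (mags.iter().map(|&m| (m + 1e-10).ln()).sum::<f32>() / n).exp();
    (centroid, spread.sqrt(), geometric / (total / n))
}
```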

### MFCC Features (indices 9--34)

| Indices | Feature | Description |
|---------|---------|-------------|
| 9--21 | MFCC means | Mean of first 13 MFCCs across all STFT frames |
| 22--34 | MFCC variances | Variance of first 13 MFCCs across all STFT frames |

MFCC computation: 26-bin mel filterbank applied to STFT magnitude frames, log energy transform, DCT-II, keep first 13 coefficients.
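These steps map onto a short Rust sketch. It assumes the HTK mel formula (`2595 * log10(1 + f / 700)`), filters spaced evenly in mel from 0 Hz to Nyquist, a natural log with a `1e-10` floor, and an unnormalized DCT-II; none of these details are stated in the source, and the sample rate is left as a parameter. `mel_filterbank` and `mfcc_frame` are hypothetical names.

```rust
use std::f32::consts::PI;

/// HTK-style mel scale (assumed; the source does not name a formula).
fn hz_to_mel(hz: f32) -> f32 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

fn mel_to_hz(mel: f32) -> f32 {
    700.0 * (10f32.powf(mel / 2595.0) - 1.0)
}

/// `n_filters` triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
/// Each filter is one weight per FFT bin (n_fft / 2 + 1 bins).
fn mel_filterbank(n_filters: usize, n_fft: usize, sample_rate: f32) -> Vec<Vec<f32>> {
    let n_bins = n_fft / 2 + 1;
    let bin_hz = sample_rate / n_fft as f32;
    let mel_max = hz_to_mel(sample_rate / 2.0);
    // n_filters + 2 edge points: filter m rises from edge m, peaks at m + 1, falls to m + 2.
    let edges: Vec<f32> = (0..n_filters + 2)
        .map(|i| mel_to_hz(mel_max * i as f32 / (n_filters + 1) as f32))
        .collect();
    (0..n_filters)
        .map(|m| {
            let (lo, mid, hi) = (edges[m], edges[m + 1], edges[m + 2]);
            (0..n_bins)
                .map(|b| {
                    let f = b as f32 * bin_hz;
                    if f <= lo || f >= hi {
                        0.0
                    } else if f <= mid {
                        (f - lo) / (mid - lo)
                    } else {
                        (hi - f) / (hi - mid)
                    }
                })
                .collect::<Vec<f32>>()
        })
        .collect()
}

/// MFCCs for one magnitude frame: filterbank energies, natural log, DCT-II, first `n_coeffs`.
fn mfcc_frame(mags: &[f32], bank: &[Vec<f32>], n_coeffs: usize) -> Vec<f32> {
    let log_energies: Vec<f32> = bank
        .iter()
        .map(|w| (w.iter().zip(mags).map(|(a, b)| a * b).sum::<f32>() + 1e-10).ln())
        .collect();
    let n = log_energies.len() as f32;
    // Unnormalized DCT-II: c_k = sum_i x_i * cos(pi / N * (i + 0.5) * k)
    (0..n_coeffs)
        .map(|k| {
            log_energies
                .iter()
                .enumerate()
                .map(|(i, x)| x * (PI / n * (i as f32 + 0.5) * k as f32).cos())
                .sum::<f32>()
        })
        .collect()
}
```

With the documented parameters, a call would look like `mfcc_frame(&mags, &mel_filterbank(26, 2048, sample_rate), 13)`, with the filterbank built once and reused for every frame.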

### STFT Parameters

- FFT size: 2048 points with Hann window
- Hop size: 512 samples
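Framing with these parameters can be sketched as follows. The symmetric Hann form and dropping a trailing partial frame (rather than padding) are assumptions. Without padding, a signal of `len` samples yields `(len - 2048) / 512 + 1` frames (integer division), e.g. 83 frames for one second at 44.1 kHz. `hann` and `stft_frames` are hypothetical names.

```rust
use std::f32::consts::PI;

const N_FFT: usize = 2048;
const HOP: usize = 512;

/// Symmetric Hann window (the periodic variant would divide by n instead of n - 1).
fn hann(n: usize) -> Vec<f32> {
    (0..n)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f32 / (n - 1) as f32).cos())
        .collect()
}

/// Split `signal` into Hann-windowed frames of N_FFT samples, advancing HOP samples each time.
/// A trailing partial frame is dropped (no padding).
fn stft_frames(signal: &[f32]) -> Vec<Vec<f32>> {
    if signal.len() < N_FFT {
        return Vec::new();
    }
    let window = hann(N_FFT);
    let n_frames = (signal.len() - N_FFT) / HOP + 1;
    (0..n_frames)
        .map(|f| {
            let start = f * HOP;
            signal[start..start + N_FFT]
                .iter()
                .zip(&window)
                .map(|(s, w)| s * w)
                .collect::<Vec<f32>>()
        })
        .collect()
}
```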

---

## Training Pipeline

Binary: `crates/audiofiles-train/src/main.rs` (not built by default).

### Data

- Source: `~/Git/Drums/test_data/` with subdirectories per class
- Classes: `kick/`, `snare/`, `hihat/`, `cymbal/`, `clap/`, `tom/`, `percussion/`
- Class mapping: kick→0, snare→1, hihat→2, cymbal→3, clap/tom/percussion→4
- Dataset: 4,343 labeled drum samples
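The directory-to-label mapping above is small enough to state as a `match`. `class_index` is a hypothetical helper; how the training binary actually treats unrecognized directories is not documented, so here they simply return `None`.

```rust
/// Map a training subdirectory name to its Layer 2 class index.
/// clap, tom, and percussion all collapse into class 4 (Percussion).
fn class_index(dir: &str) -> Option<u8> {
    match dir {
        "kick" => Some(0),
        "snare" => Some(1),
        "hihat" => Some(2),
        "cymbal" => Some(3),
        "clap" | "tom" | "percussion" => Some(4),
        _ => None,
    }
}
```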

### Algorithm

- **200 decision trees**, each trained on a bootstrap sample (random with replacement)
- **Max depth**: 25 levels per tree
- **Min leaf**: 3 samples minimum per leaf node
- **Features per split**: sqrt(35) ≈ 6 features sampled at random for each split decision
- **Split criterion**: Gini impurity
- **Parallelism**: Trees trained in parallel via rayon
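The core quantities in this configuration can be sketched as plain functions: Gini impurity, the size-weighted impurity a split minimizes, the per-split feature count, and a bootstrap draw. The xorshift generator is a stand-in so the sketch stays dependency-free; the trainer's actual RNG, and whether it rounds or takes the ceiling of `sqrt(35)`, are assumptions.

```rust
/// Gini impurity of a node from per-class sample counts: 1 - sum(p_k^2).
fn gini(counts: &[usize]) -> f64 {
    let n: usize = counts.iter().sum();
    if n == 0 {
        return 0.0;
    }
    1.0 - counts
        .iter()
        .map(|&c| {
            let p = c as f64 / n as f64;
            p * p
        })
        .sum::<f64>()
}

/// Size-weighted impurity of a candidate split; the tree keeps the split minimizing this.
fn split_impurity(left: &[usize], right: &[usize]) -> f64 {
    let nl: usize = left.iter().sum();
    let nr: usize = right.iter().sum();
    let n = (nl + nr) as f64;
    if n == 0.0 {
        return 0.0;
    }
    (nl as f64 * gini(left) + nr as f64 * gini(right)) / n
}

/// Features tried per split: round(sqrt(n_features)); sqrt(35) ≈ 5.92 rounds to 6.
fn features_per_split(n_features: usize) -> usize {
    ((n_features as f64).sqrt().round() as usize).max(1)
}

/// Bootstrap sample of n row indices drawn with replacement.
/// STAND-IN RNG: xorshift64, chosen only to avoid external crates.
fn bootstrap(n: usize, seed: u64) -> Vec<usize> {
    let mut state = seed.max(1); // xorshift state must be non-zero
    (0..n)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            (state % n as u64) as usize
        })
        .collect()
}
```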

### Evaluation

- **5-fold stratified cross-validation** (preserves class distribution)
- **94.4% strict accuracy** on 4,343 samples
- Per-class precision, recall, and F1 computed across all folds
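A sketch of stratified fold assignment and the per-class metrics. Dealing each class's samples round-robin across folds gives every fold a near-equal share of each class; a real pipeline would normally shuffle within each class first, which this sketch omits. `stratified_folds` and `prf` are hypothetical names.

```rust
/// Assign each sample a fold in 0..k (k must be > 0) so every fold gets a near-equal share of each class.
fn stratified_folds(labels: &[usize], k: usize) -> Vec<usize> {
    let n_classes = labels.iter().copied().max().map_or(0, |m| m + 1);
    let mut next = vec![0usize; n_classes]; // next fold to use for each class
    labels
        .iter()
        .map(|&c| {
            let fold = next[c];
            next[c] = (fold + 1) % k;
            fold
        })
        .collect()
}

/// Precision, recall, and F1 for one class from paired predictions and true labels.
fn prf(pred: &[usize], truth: &[usize], class: usize) -> (f64, f64, f64) {
    let pairs = || pred.iter().zip(truth);
    let tp = pairs().filter(|&(&p, &t)| p == class && t == class).count() as f64;
    let fp = pairs().filter(|&(&p, &t)| p == class && t != class).count() as f64;
    let fn_ = pairs().filter(|&(&p, &t)| p != class && t == class).count() as f64;
    let precision = if tp + fp > 0.0 { tp / (tp + fp) } else { 0.0 };
    let recall = if tp + fn_ > 0.0 { tp / (tp + fn_) } else { 0.0 };
    let f1 = if precision + recall > 0.0 {
        2.0 * precision * recall / (precision + recall)
    } else {
        0.0
    };
    (precision, recall, f1)
}
```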

### Output

- Model file: `crates/audiofiles-core/models/layer2_drum.json` (4.0 MB)
- Format: JSON object holding a `trees` array (200 trees) plus class metadata
- Each tree node is either a `Split { feature, threshold, left, right }` or `Leaf { class }`

---

## Model Loading

The model is embedded at compile time and deserialized lazily on first use:

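```rust
use std::sync::OnceLock;

// Model JSON baked into the binary at compile time. The path is relative to
// this source file (assumed: src/analysis/classify.rs -> ../../models/).
static LAYER2_MODEL_BYTES: &[u8] = include_bytes!("../../models/layer2_drum.json");

// Parsed on first use, then cached for the life of the process.
// `RandomForestModel` is the crate's serde-deserializable forest type.
static LAYER2_MODEL: OnceLock<RandomForestModel> = OnceLock::new();

fn layer2_model() -> &'static RandomForestModel {
    LAYER2_MODEL.get_or_init(|| {
        // A malformed embedded model panics here, on first use.
        serde_json::from_slice(LAYER2_MODEL_BYTES)
            .expect("embedded Layer 2 model is invalid JSON")
    })
}
```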

- `include_bytes!` embeds `layer2_drum.json` into the binary
- `OnceLock` ensures deserialization happens exactly once
- After initialization, later calls skip deserialization and return the same `&'static` reference; only a cheap already-initialized check remains
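The exactly-once behaviour can be checked with a self-contained stand-in that replaces the JSON parse with a counter. `MODEL`, `INIT_CALLS`, and `model()` are illustrative names, not part of the crate.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the initializer actually runs.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
static MODEL: OnceLock<Vec<u32>> = OnceLock::new();

/// Stand-in for layer2_model(): the closure plays the role of serde_json::from_slice.
fn model() -> &'static Vec<u32> {
    MODEL.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        vec![1, 2, 3] // pretend this is the deserialized forest
    })
}
```

Every call after the first returns a reference to the same cached value, and the initializer counter stays at 1.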

---

## Database Integration

Classification results are stored in the `audio_analysis` table:

| Column | Type | Description |
|--------|------|-------------|
| classification | TEXT | SampleClass as lowercase string (e.g., "kick") |
| classification_confidence | REAL | 0.0--1.0; RF vote fraction for drums, heuristic confidence for non-drums |
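A sketch of the enum-to-`TEXT` mapping for the `classification` column. The 16 variants mirror the architecture diagram, but this `SampleClass` is a hypothetical mirror rather than the crate's definition, and the stored spelling of `HiHat` (assumed here to be `"hihat"`) is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SampleClass {
    Kick, Snare, HiHat, Cymbal, Percussion,
    Bass, Vocal, Synth, Pad, Noise, Music, Ambience, Impact, Foley, Texture, Misc,
}

impl SampleClass {
    const ALL: [Self; 16] = [
        Self::Kick, Self::Snare, Self::HiHat, Self::Cymbal, Self::Percussion,
        Self::Bass, Self::Vocal, Self::Synth, Self::Pad, Self::Noise, Self::Music,
        Self::Ambience, Self::Impact, Self::Foley, Self::Texture, Self::Misc,
    ];

    /// Lowercase string written to audio_analysis.classification.
    fn as_db_str(self) -> &'static str {
        match self {
            Self::Kick => "kick",
            Self::Snare => "snare",
            Self::HiHat => "hihat",
            Self::Cymbal => "cymbal",
            Self::Percussion => "percussion",
            Self::Bass => "bass",
            Self::Vocal => "vocal",
            Self::Synth => "synth",
            Self::Pad => "pad",
            Self::Noise => "noise",
            Self::Music => "music",
            Self::Ambience => "ambience",
            Self::Impact => "impact",
            Self::Foley => "foley",
            Self::Texture => "texture",
            Self::Misc => "misc",
        }
    }

    /// Inverse of as_db_str, for reading rows back.
    fn from_db_str(s: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|c| c.as_db_str() == s)
    }
}
```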

---

## Retraining

To retrain the model with new or updated training data:

1. Organize labeled samples in `~/Git/Drums/test_data/{class}/`
2. Run `cargo run -p audiofiles-train`
3. The binary outputs cross-validation metrics and writes `layer2_drum.json`
4. Rebuild audiofiles to embed the updated model

---

## Key Files

| What | Where |
|------|-------|
| Two-layer classifier | `crates/audiofiles-core/src/analysis/classify.rs` |
| Spectral features | `crates/audiofiles-core/src/analysis/spectral.rs` |
| MFCC computation | `crates/audiofiles-core/src/analysis/mfcc.rs` |
| Crest factor, attack time | `crates/audiofiles-core/src/analysis/basic.rs` |
| Training pipeline | `crates/audiofiles-train/src/main.rs` |
| Embedded model | `crates/audiofiles-core/models/layer2_drum.json` |
| Analysis orchestrator | `crates/audiofiles-core/src/analysis/mod.rs` |