# Sample deletion: tombstone design (proposal)

**Status:** proposal, 2026-06-02. Not yet implemented.

**Author:** Max / Claude (audit session)

**Scope:** the multi-device sync semantics of deleting a sample. Single-device delete is already correct (placement-only via `vfs_nodes` delete; sample row untouched). This doc covers the open question: what happens when a `samples` row deletion needs to propagate.

---

## Problem

Today, `apply_remote_changes` in `crates/audiofiles-sync/src/service/resolve.rs` applies a remote `DELETE FROM samples WHERE hash = X` directly. SQLite enforces `ON DELETE CASCADE` on `vfs_nodes.sample_hash`, `tags.sample_hash`, and `collection_members.sample_hash` at the engine level (not via triggers), so `applying_remote='1'` does not suppress it. The receiving device loses every placement, every tag, and every collection membership of that sample, silently and without confirmation.

The user's mental model for an audiofiles library is "my library, accessible from any of my devices, hand-picked launch cohort" (launch plan, 2026-06-01). The current behavior instead amounts to "global library, but delete on one device wipes everything on every device with no record." Two specific surprises:

- **Device A deletes a sample they no longer want** → Device B (which had organized that sample into 3 collections and tagged it across 12 placements) loses all that work without warning.
- **Device A runs an automated cleanup** (today: VFS-delete sweep; tomorrow: the new "Cleanup orphans" menu without the `applying_remote` guard we just added) → all of A's placements gone → push → B's placements gone too.

The path forward is **soft-delete with a tombstone column**, replicated across devices, with a recovery window before hard delete. This document is the design contract before implementation.

---

## Non-goals

- **Cross-device "is this sample still in use anywhere?" awareness.** That requires a server-side reference count, which the current sync model (encrypted blob storage + opaque changelog) explicitly does not have. Each device sees only its own references plus the tombstone state.
- **Per-VFS or per-collection delete isolation.** Removing a sample from one collection is already handled by `collection_members` delete (no sample deletion involved). Same for VFS placements.
- **Reversing already-shipped purges.** Any `samples` row already deleted before this design lands is gone for good; the sync layer has no way to ask other devices "did you keep a copy?"
- **Server-side garbage collection of orphaned blobs.** Blob lifecycle on the cloud is a separate concern; this design only changes the DB row lifecycle.

---

## Design

### Schema (M019)

Add a column to `samples`:

```sql
ALTER TABLE samples ADD COLUMN deleted_at INTEGER;
```

`NULL` means live. A non-NULL value is the Unix timestamp (in seconds, as returned by `unixepoch()`) of the wall-clock instant the sample was tombstoned. The column is nullable and indexed (`CREATE INDEX idx_samples_deleted_at ON samples(deleted_at) WHERE deleted_at IS NOT NULL;`) for the eventual sweep query.

### CASCADE policy change

`vfs_nodes.sample_hash REFERENCES samples(hash) ON DELETE CASCADE` stays as-is. The CASCADE fires only when the row is **hard-deleted** (post-tombstone-window sweep). On the soft-delete path the `samples` row stays, so no CASCADE fires, placements remain, and the user can recover.

`tags.sample_hash` and `collection_members.sample_hash` are likewise unchanged, so a hard delete still removes all dependent rows.
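The difference between the two paths is easy to check against an in-memory SQLite database. The following is a minimal Python sketch, not project code: the two-table schema is a pared-down stand-in for the real one, and the hash and timestamp values are made up. It shows that the tombstone UPDATE leaves placements alone while a hard DELETE cascades.

```python
import sqlite3

def open_db():
    """In-memory database with a pared-down, hypothetical version of the schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys (and therefore CASCADE) only when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE samples (
            hash TEXT PRIMARY KEY,
            deleted_at INTEGER              -- NULL = live
        );
        CREATE TABLE vfs_nodes (
            id INTEGER PRIMARY KEY,
            sample_hash TEXT REFERENCES samples(hash) ON DELETE CASCADE
        );
    """)
    conn.execute("INSERT INTO samples (hash) VALUES ('abc')")
    conn.executemany("INSERT INTO vfs_nodes (sample_hash) VALUES (?)",
                     [("abc",), ("abc",)])
    return conn

def placement_count(conn):
    return conn.execute("SELECT COUNT(*) FROM vfs_nodes").fetchone()[0]

conn = open_db()

# Soft delete: the parent row is updated, not removed, so no CASCADE fires.
conn.execute("UPDATE samples SET deleted_at = 1780000000 WHERE hash = 'abc'")
after_soft = placement_count(conn)

# Hard delete: the parent row is removed and the engine cascades to vfs_nodes.
conn.execute("DELETE FROM samples WHERE hash = 'abc'")
after_hard = placement_count(conn)

print(after_soft, after_hard)  # 2 0
```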

### Read-path update surface

Every `SELECT ... FROM samples` (or join through `samples`) gains a `WHERE samples.deleted_at IS NULL` filter unless the caller is explicitly working with tombstoned rows (Trash view, sweep query, undelete). Estimated surface area (preliminary grep, needs verification during implementation):

- `crates/audiofiles-core/src/store.rs`: every `sample_path`, `sample_extension`, dedup check
- `crates/audiofiles-core/src/analysis/decode.rs`, `waveform.rs`: analysis pipeline
- `crates/audiofiles-core/src/search/*`: search queries
- `crates/audiofiles-core/src/vfs.rs`: enriched VFS queries (join to samples)
- `crates/audiofiles-core/src/collections.rs`: collection member queries
- `crates/audiofiles-core/src/tags.rs`: tag queries
- `crates/audiofiles-core/src/fingerprint.rs`, `similarity.rs`: near-duplicate detection
- `crates/audiofiles-browser/src/backend/direct.rs`: Backend trait surface

Total ballpark: 30-50 query sites. Most are mechanical (add a clause). A few need a paired explicit-tombstone variant (e.g., the Trash view).

An `is_tombstoned(hash) -> bool` helper in `core::store` is the natural shape for ad-hoc checks.
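As a rough illustration of the filter and the helper, here is a small Python sketch against an in-memory SQLite database. It is not the project's code: `list_samples`, the `conn` parameter, and the sample rows are hypothetical, and the `include_tombstoned` flag simply mirrors the opt-in described in the test plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (hash TEXT PRIMARY KEY, original_name TEXT, deleted_at INTEGER);
    INSERT INTO samples VALUES ('aaa', 'kick.wav', NULL);          -- live
    INSERT INTO samples VALUES ('bbb', 'snare.wav', 1780000000);   -- tombstoned
""")

def list_samples(conn, include_tombstoned=False):
    """Default read path hides tombstones; Trash-style callers opt in."""
    sql = "SELECT hash FROM samples"
    if not include_tombstoned:
        sql += " WHERE deleted_at IS NULL"
    sql += " ORDER BY hash"
    return [row[0] for row in conn.execute(sql)]

def is_tombstoned(conn, sample_hash):
    """True only for a row that exists and has deleted_at set."""
    row = conn.execute(
        "SELECT deleted_at IS NOT NULL FROM samples WHERE hash = ?",
        (sample_hash,),
    ).fetchone()
    return bool(row and row[0])

print(list_samples(conn))                           # ['aaa']
print(list_samples(conn, include_tombstoned=True))  # ['aaa', 'bbb']
print(is_tombstoned(conn, "bbb"))                   # True
```

In this sketch an unknown hash reports `False`, which a real helper might prefer to surface as an error instead.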

### Delete operation

"Delete sample" (the explicit user gesture or the cleanup-orphans menu, post-design) becomes:

```sql
UPDATE samples SET deleted_at = unixepoch() WHERE hash = ?1 AND deleted_at IS NULL;
```

That fires the `sync_samples_update` trigger (in place since M018), which pushes the row to `sync_changelog`. The receiving device pulls the UPDATE, sees that `deleted_at` is now set, and:

1. Locally marks the sample as tombstoned (sets its own `deleted_at` to the received value if currently NULL).
2. Surfaces a "Sample deleted on another device: NAME (12 placements affected)" notification.
3. Optionally provides an "Undelete on this device" button that resets `deleted_at` to NULL locally (creating a temporary divergence until reconciled).

For the local-only case (the "Cleanup orphans" menu we just shipped): same UPDATE, but with the sync trigger suppressed via `applying_remote='1'`. Other devices never learn about it.
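The sketch below walks through both variants in Python against an in-memory SQLite database. Several pieces are assumptions made for illustration only: the `sync_state` key/value table holding `applying_remote`, the simplified trigger body, the reduced `sync_changelog` shape, and the `tombstone` function name. The timestamp is passed in as a parameter rather than calling `unixepoch()`, which needs SQLite 3.38 or newer.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (hash TEXT PRIMARY KEY, deleted_at INTEGER);
    -- Assumption: the applying_remote flag lives in a key/value table.
    CREATE TABLE sync_state (key TEXT PRIMARY KEY, value TEXT);
    INSERT INTO sync_state VALUES ('applying_remote', '0');
    CREATE TABLE sync_changelog (id INTEGER PRIMARY KEY, data TEXT);
    -- Assumption: a simplified stand-in for the real sync_samples_update body.
    CREATE TRIGGER sync_samples_update AFTER UPDATE ON samples
    WHEN (SELECT value FROM sync_state WHERE key = 'applying_remote') <> '1'
    BEGIN
        INSERT INTO sync_changelog (data)
        VALUES (json_object('hash', NEW.hash, 'deleted_at', NEW.deleted_at));
    END;
    INSERT INTO samples (hash) VALUES ('aaa');
    INSERT INTO samples (hash) VALUES ('bbb');
""")

def tombstone(conn, sample_hash, now, local_only=False):
    """Soft-delete one sample. Returns True only if this call set the tombstone."""
    if local_only:
        conn.execute("UPDATE sync_state SET value = '1' WHERE key = 'applying_remote'")
    try:
        cur = conn.execute(
            "UPDATE samples SET deleted_at = ? WHERE hash = ? AND deleted_at IS NULL",
            (now, sample_hash),
        )
        return cur.rowcount == 1
    finally:
        if local_only:
            conn.execute("UPDATE sync_state SET value = '0' WHERE key = 'applying_remote'")

first = tombstone(conn, "aaa", now=1780000000)    # sets the tombstone, logs one change
second = tombstone(conn, "aaa", now=1780000999)   # already tombstoned: no-op, nothing logged
local = tombstone(conn, "bbb", now=1780000000, local_only=True)  # tombstoned, not logged

changelog = [json.loads(row[0])
             for row in conn.execute("SELECT data FROM sync_changelog ORDER BY id")]
print(first, second, local, changelog)
```

The `deleted_at IS NULL` guard is what makes the operation idempotent: a repeated delete matches no row, so it neither moves the timestamp nor produces a second changelog entry.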

### Sweep

A background task on app startup (or scheduled, e.g., once per day) hard-deletes rows where `deleted_at < unixepoch() - 30*86400`. That triggers the CASCADE, fires the sync DELETE trigger, and propagates the actual deletion. Hard-delete window: **30 days**, configurable via `user_config` (`sample_tombstone_retain_days`).

Tradeoff: a longer window means more recoverability but more disk consumed by tombstoned-but-still-stored blobs. The 30-day default matches the auto-empty interval of the macOS Trash.
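A possible shape for the sweep, sketched in Python against an in-memory SQLite database. The `sweep` function, the reduced schema, and the fixed `NOW` value are illustrative assumptions; only the `sample_tombstone_retain_days` key and the 30-day default come from this design.

```python
import sqlite3

DAY_SECONDS = 86400
NOW = 1780000000  # fixed "current time" so the example is deterministic

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for the CASCADE to fire
conn.executescript("""
    CREATE TABLE samples (hash TEXT PRIMARY KEY, deleted_at INTEGER);
    CREATE TABLE vfs_nodes (
        id INTEGER PRIMARY KEY,
        sample_hash TEXT REFERENCES samples(hash) ON DELETE CASCADE
    );
    CREATE TABLE user_config (key TEXT PRIMARY KEY, value TEXT);
    INSERT INTO user_config VALUES ('sample_tombstone_retain_days', '30');
""")
conn.executemany("INSERT INTO samples VALUES (?, ?)", [
    ("live", None),                      # never deleted
    ("fresh", NOW - 29 * DAY_SECONDS),   # tombstoned, still inside the window
    ("old", NOW - 31 * DAY_SECONDS),     # tombstoned, past the window
])
conn.executemany("INSERT INTO vfs_nodes (sample_hash) VALUES (?)",
                 [("live",), ("fresh",), ("old",)])

def sweep(conn, now):
    """Hard-delete tombstones older than the configured retention; returns rows removed."""
    row = conn.execute(
        "SELECT value FROM user_config WHERE key = 'sample_tombstone_retain_days'"
    ).fetchone()
    retain_days = int(row[0]) if row else 30  # fall back to the documented default
    cutoff = now - retain_days * DAY_SECONDS
    cur = conn.execute(
        "DELETE FROM samples WHERE deleted_at IS NOT NULL AND deleted_at < ?",
        (cutoff,),
    )
    return cur.rowcount

removed = sweep(conn, NOW)
remaining = [r[0] for r in conn.execute("SELECT hash FROM samples ORDER BY hash")]
placements = [r[0] for r in conn.execute(
    "SELECT sample_hash FROM vfs_nodes ORDER BY sample_hash")]
print(removed, remaining, placements)  # 1 ['fresh', 'live'] ['fresh', 'live']
```

Reading the retention from `user_config` at sweep time, rather than hard-coding `30*86400` in the query, keeps the window and the configuration knob from drifting apart.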

### UI states

- **Live sample** in any view: no visual change from today.
- **Tombstoned sample** in normal browse: hidden by the read-path filter.
- **Trash view**: new entry in the Settings panel showing all tombstoned samples grouped by deletion date. Each row offers "Undelete" (clear `deleted_at`) and "Delete permanently" (skip the retention window, hard-delete now).
- **Pull notification**: when a remote tombstone is applied, surface a status message: "12 samples tombstoned by another device. See Trash to recover."

### Sync semantics summary

| Action on Device A | Change that syncs | Effect on Device B |
|---|---|---|
| Local placement delete (`vfs_nodes` row) | `vfs_nodes` DELETE | That one placement removed locally |
| User-initiated sample delete | `samples` UPDATE (`deleted_at` set) | Sample tombstoned locally, placements preserved, notification |
| Local "Cleanup orphans" | nothing (sync triggers suppressed) | unchanged |
| Sweep after 30d | `samples` DELETE | Same hard delete; CASCADE removes B's placements too |
| Undelete on either device | `samples` UPDATE (`deleted_at` cleared) | Sample restored everywhere |

The 30-day window means a sweep on Device A propagates as a hard delete to Device B 30 days after A's tombstone, at which point B has also had 30 days to undelete if they cared. Both sides converge.

### Conflict resolution

- Both devices independently delete the same sample at different times: the **earlier** `deleted_at` wins (sync conflict resolution uses `min(deleted_at)` on UPDATE conflicts).
- Device A undeletes (sets NULL) while Device B's tombstone is in flight: NULL wins (an undelete always overrides a tombstone).
- Hard delete on A while B has undeleted: A's CASCADE wipes A's placements; B keeps theirs and pushes a re-INSERT of `samples`. A re-pulls and gets the sample back. Edge case: A's blob may be gone, in which case the sample falls back to `cloud_only` semantics.
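The first two rules can be read as a small merge function over the local and incoming `deleted_at` values. The Python sketch below is one possible reading, not a settled algorithm. The function name and the `local_undelete_pending` flag are hypothetical: the flag stands in for whatever mechanism tells a device that its local NULL comes from a recent undelete and not from a sample that was never deleted. The sketch also treats any incoming NULL as an explicit undelete, which the design does not yet distinguish from an ordinary update sent before the sender saw the tombstone.

```python
from typing import Optional

def resolve_deleted_at(
    local: Optional[int],
    remote: Optional[int],
    local_undelete_pending: bool = False,
) -> Optional[int]:
    """Merge a local and an incoming deleted_at value.

    None means live; an int is a tombstone timestamp in Unix seconds.
    """
    if remote is None:
        # Incoming undelete: NULL wins over any local tombstone.
        return None
    if local is None:
        # Local row is live. Keep it live only if that is the result of a
        # local undelete that has not been reconciled yet (assumed flag);
        # otherwise this is plain replication of a remote tombstone.
        return None if local_undelete_pending else remote
    # Both sides tombstoned independently: the earlier timestamp wins.
    return min(local, remote)

print(resolve_deleted_at(200, 100))    # 100
print(resolve_deleted_at(100, None))   # None
print(resolve_deleted_at(None, 100))   # 100
print(resolve_deleted_at(None, 100, local_undelete_pending=True))  # None
```

Because `min` is commutative, two devices that tombstone independently reach the same value regardless of which update arrives first.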

### `cloud_only` interaction

Today, `cloud_only` marks samples whose local blob has been evicted but which still exist in cloud storage. With tombstones:

- Tombstoned samples are NOT automatically marked `cloud_only`. The blob stays on disk during the retention window so undelete is instant.
- After the sweep hard-deletes the `samples` row, the blob is removed from disk too (existing `SampleStore::remove` path).
- If a remote re-INSERT arrives after a local hard delete (the "Device B undeleted after A swept" edge), the new `samples` row gets `cloud_only=1` set automatically by the pull-side logic (TBD: add to `apply_upsert` for samples table when the local blob is missing).

---

## Migration (M019)

```sql
ALTER TABLE samples ADD COLUMN deleted_at INTEGER;
CREATE INDEX IF NOT EXISTS idx_samples_deleted_at
    ON samples(deleted_at) WHERE deleted_at IS NOT NULL;

-- Sync triggers: the existing samples triggers emit hash + the full row
-- via json_object. Whether they must be recreated depends on how their
-- bodies list columns; verify before shipping M019.

INSERT OR IGNORE INTO user_config (key, value) VALUES ('sample_tombstone_retain_days', '30');
```

The design assumes the new `deleted_at` field flows through `sync_changelog` once the migration is applied. This needs verification during implementation: SQLite's `json_object` takes explicit key/value pairs and triggers have no `NEW.*` expansion, so if the trigger bodies name each `NEW` column individually, M019 must also drop and recreate them to include `deleted_at`.

---

## Test plan

1. **Tombstone basic flow:** mark a sample tombstoned; assert it is hidden from `list_vfs_contents`, can be undeleted, and can be swept.
2. **Sync replication:** mock a pull that applies a `samples` UPDATE with `deleted_at` set; assert the local row is marked tombstoned; assert placements remain.
3. **Sweep:** insert a tombstoned sample with `deleted_at` 31 days in the past; run sweep; assert hard delete + CASCADE happened.
4. **Undelete-after-pull-tombstone:** apply remote tombstone; clear `deleted_at` locally; assert the sample is live again locally and a sync push is queued with `deleted_at=NULL`.
5. **Conflict: both devices tombstone at different times:** apply two UPDATEs in succession; assert `min(deleted_at)` wins.
6. **Read-path coverage:** for every Backend method that returns sample data, assert tombstoned rows are excluded by default and included when an explicit `include_tombstoned` flag is set.

---

## Rollout phasing

**Phase 1: M019 + read-path filter** (1 session). Land the schema, add the `WHERE deleted_at IS NULL` filter across all read sites. Tombstones don't surface in UI yet; the behavior change is invisible to users (every `deleted_at` is still NULL).

**Phase 2: delete & undelete operations** (1 session). Wire the UPDATE path. Replace the current `SampleStore::remove` callers with `tombstone_sample` (a new method). The cleanup-orphans menu shipped 2026-06-02 keeps its local-only semantics but switches from hard delete to tombstone + immediate hard delete (since orphan = not referenced, no recovery value).

**Phase 3: Trash UI** (1 session). New Settings section listing tombstoned samples with undelete + delete-permanently affordances.

**Phase 4: sweep** (1 session). Background task on app startup hard-deletes past-retention tombstones. The resulting `samples` DELETE rows are pushed through sync; receiving devices CASCADE their own data (which, by symmetry, has also been tombstoned for >= 30 days).

**Phase 5: sync notification** (1 session). On pull, count incoming tombstones and surface a one-shot status: "X samples deleted on another device. See Trash to recover."

Phases 1-2 are required for correctness. Phases 3-5 are UX polish that can land in any order after Phase 2.

---

## Open questions

- **Should tags also be tombstoned?** A tag rename today goes through DELETE+INSERT (composite-PK constraint). If we tombstone samples but not tags, a sample's tag entries vanish at sweep time even if the sample is recovered before sweep. Likely acceptable: tags are derived metadata; the sample's `original_name`, analysis, and placements are the load-bearing recovery.
- **Window default: 30 days, or shorter?** 30 matches OS Trash conventions. Shorter (7) saves disk; longer (60+) maximizes recovery. Probably a `user_config` knob with default 30.
- **Hard-delete sync semantics post-sweep:** should the sweep push a single "tombstone-expired" event rather than a `samples` DELETE, so receiving devices that haven't swept yet know to skip their own sweep? Or just let each device sweep independently? Independent sweep is simpler and converges correctly.
- **What if a device is offline for longer than the retention window?** It receives a hard-delete for samples it still has live placements on. The "Sample deleted on another device" notification probably needs to gain "and the retention window expired, your placements are also being removed" semantics. Surface count + offer Trash recovery before applying. Open detail for Phase 5.
- **Server-side tombstone awareness:** the server doesn't decrypt `data`, so it can't see `deleted_at`. Tombstones are application-level; the server stores opaque encrypted blobs. The sync push of a tombstone is just another encrypted update, so the server needs no special handling. Good: this keeps the privacy model intact.

---

## Why not E (per-device orphan management)

The "each device manages its own purges; samples DELETE never propagates as global destroy" model is lighter (no schema change), but creates an awkward asymmetry: "delete on Device A doesn't propagate to Device B." This breaks the stated product model ("my library, accessible from any of my devices"): users expect "I deleted this" to mean it's gone from their library, not just from this device. E only works for a federation model (collaborators A and B each curate their own subset of a shared blob pool), which isn't what audiofiles is.

## Why not G (confirmation-only with current CASCADE)

G doesn't fix the sync-pull surprise. A confirm dialog on local delete is helpful but doesn't help Device B when Device A purges. The sync-pull cascade still wipes B's placements without B ever seeing a dialog. G is at best an additional safety net under either E or F, not a substitute.

---

## Acceptance criteria

This design is implemented when:

- A sample with `deleted_at` set is invisible to every Backend method that returns sample data, unless the caller explicitly opts in.
- Sync push of the tombstone replicates the `deleted_at` field; the receiving device marks the same sample as tombstoned without losing placements.
- A user can browse the Trash view, see their tombstoned samples, and either undelete or delete permanently.
- A tombstone older than the retention window (30 days by default) is automatically hard-deleted on next app startup.
- The "Cleanup orphans" menu (already shipped 2026-06-02) is local-only and continues to work; its eventual replacement uses the tombstone path with immediate hard-delete (no retention for things you never placed).
- No existing tests fail; the new behavior is covered by tests per the Test plan section.