| 1 |
# TagTree -- Architecture |
| 2 |
|
| 3 |
## Purpose |
| 4 |
|
| 5 |
TagTree is a shared Rust library crate that provides a hierarchical dot-notation tag standard. Tags are lowercase strings with dot-separated segments representing a hierarchy (e.g., `genre.electronic.house`, `work.meeting.standup`). The crate handles validation, parsing, tree operations, SQL helpers, prefix renaming, and keystroke-speed autocomplete -- with zero runtime dependencies. |
| 6 |
|
| 7 |
## Tag Format |
| 8 |
|
| 9 |
- Segments: lowercase alphanumeric and hyphens (`[a-z0-9-]`) |
| 10 |
- Separator: `.` (dot) |
| 11 |
- No empty segments, no leading/trailing dots, no consecutive dots |
| 12 |
|
| 13 |
## Per-App TagConfig |
| 14 |
|
| 15 |
Each consumer defines a `TagConfig` constant controlling depth, length, and semantic prefix rules. |
| 16 |
|
| 17 |
|
| 18 |
|
| 19 |
| AF | 5 | 100 | 1 | Deep hierarchy, namespace-driven (`genre.electronic.house`) | |
| 20 |
| GO | 3 | 60 | 0 | Shallow free-form tags | |
| 21 |
| BB | 3 | 80 | 0 | Shallow free-form tags | |
| 22 |
| MT | 3 | 64 | 0 | Community-scoped thread tags | |
| 23 |
| MNW | 5 | 100 | 0 | Content/project tags | |
| 24 |
|
| 25 |
- `max_depth` -- maximum number of segments allowed |
| 26 |
- `max_length` -- maximum character length of the entire tag string |
| 27 |
- `semantic_depth` -- number of leading segments that carry dispatch meaning (0 = all free-form; 1 = first segment is a namespace like `genre`, `mood`) |
| 28 |
|
| 29 |
## Core Types |
| 30 |
|
| 31 |
|
| 32 |
|
| 33 |
| `TagConfig` | Per-app validation rules (depth, length, semantic prefix) | |
| 34 |
| `TagError` | Error type returned on validation failure | |
| 35 |
| `TagIndex` | In-memory sorted index for autocomplete suggestions | |
| 36 |
| `SEPARATOR` | The `.` constant | |
| 37 |
|
| 38 |
## API Surface |
| 39 |
|
| 40 |
### Validation |
| 41 |
|
| 42 |
- `validate_with(tag, config)` -- validate against a `TagConfig` |
| 43 |
- `validate(tag)` -- validate with defaults (depth 5, length 100, no semantic prefix) |
| 44 |
|
| 45 |
### Hierarchy Parsing |
| 46 |
|
| 47 |
- `parent(tag)` -- parent path (`"a.b.c"` -> `Some("a.b")`) |
| 48 |
- `leaf(tag)` -- last segment (`"a.b.c"` -> `"c"`) |
| 49 |
- `depth(tag)` -- segment count (`"a.b.c"` -> `3`) |
| 50 |
- `segment(tag, i)` -- extract segment by 0-based index |
| 51 |
- `prefix_at_depth(tag, n)` -- first `n` segments as a path |
| 52 |
- `ancestors(tag)` -- all ancestor paths, root to parent |
| 53 |
|
| 54 |
### Tree Relationships |
| 55 |
|
| 56 |
- `is_ancestor_of(a, b)` -- true if `a` is an ancestor of `b` at a segment boundary |
| 57 |
- `common_ancestor(a, b)` -- longest shared prefix |
| 58 |
|
| 59 |
### Set Operations (in-memory) |
| 60 |
|
| 61 |
- `children_at_prefix(prefix, tags)` -- immediate children one level below prefix |
| 62 |
- `subtree(prefix, tags)` -- all descendants of prefix (including prefix if present) |
| 63 |
- `rename_prefix(old, new, tag)` -- swap a tag's prefix, returning new tag |
| 64 |
|
| 65 |
### Semantic Splitting |
| 66 |
|
| 67 |
- `semantic_prefix(tag, depth)` -- namespace portion |
| 68 |
- `free_suffix(tag, depth)` -- value portion after semantic prefix |
| 69 |
|
| 70 |
### SQL Helpers |
| 71 |
|
| 72 |
- `escape_like(s)` -- escape `%`, `_`, `\` for safe LIKE embedding |
| 73 |
- `like_descendant_pattern(prefix)` -- build `prefix.%` for hierarchy queries (works on both SQLite and PostgreSQL) |
| 74 |
|
| 75 |
### TagIndex (Autocomplete) |
| 76 |
|
| 77 |
Sorted `Vec<String>` with binary-search lookup. Two-tier suggestion strategy: |
| 78 |
|
| 79 |
1. **Path prefix** (tier 1) -- binary search for tags whose full path starts with input. O(log n + k). |
| 80 |
2. **Segment prefix** (tier 2) -- linear scan for tags where any non-first segment starts with input. Only runs if tier 1 didn't fill the limit and input has no dots. A segment index gates entry: if no known segment starts with input, scan is skipped in O(log m). |
| 81 |
|
| 82 |
Key methods: `new()`, `empty()`, `insert()`, `remove()`, `rebuild()`, `contains()`, `suggest(input, limit)`, `suggest_with_status(input, limit)`. |
| 83 |
|
| 84 |
## Integration Patterns |
| 85 |
|
| 86 |
Apps integrate TagTree as a path dependency and define a `TagConfig` constant: |
| 87 |
|
| 88 |
```rust |
| 89 |
use tagtree::{TagConfig, validate_with}; |
| 90 |
|
| 91 |
const TAGS: TagConfig = TagConfig { max_depth: 5, max_length: 100, semantic_depth: 1 }; |
| 92 |
|
| 93 |
// Validate on create/update |
| 94 |
validate_with(&user_input, &TAGS)?; |
| 95 |
|
| 96 |
// SQL hierarchy queries |
| 97 |
let pattern = tagtree::like_descendant_pattern("genre"); |
| 98 |
// WHERE tag LIKE ?1 ESCAPE '\' -> finds all genre.* tags |
| 99 |
``` |
| 100 |
|
| 101 |
- **AF, GO, BB** -- SQLite repositories use `like_descendant_pattern` for hierarchy queries; `TagIndex` powers autocomplete in the UI |
| 102 |
- **MT** -- community-scoped tags validated per-thread; migration 014 added `path` column using TagTree conventions |
| 103 |
- **MNW** -- server-side validation on content/project tags; migration 038 added `path` column |
| 104 |
|
| 105 |
## Performance Characteristics |
| 106 |
|
| 107 |
Criterion benchmarks cover six groups at multiple scales (1K, 10K, 50K tags): |
| 108 |
|
| 109 |
|
| 110 |
|
| 111 |
| `validate` | Tag validation (simple, deep, max-depth, invalid, long) | |
| 112 |
| `hierarchy` | parent, leaf, depth, ancestors, is_ancestor_of, common_ancestor | |
| 113 |
| `sql` | escape_like, like_descendant_pattern | |
| 114 |
| `tree_ops` | children_at_prefix, subtree, rename_prefix | |
| 115 |
| `tag_index` | Build time, suggest (prefix/path/exact/no-match) at 1K/10K/50K | |
| 116 |
| `deep_tree` | Validation and traversal at depth 5/10/20, deep-wide trees (~19K nodes) | |
| 117 |
|
| 118 |
At typical corpus sizes (hundreds to low thousands), `suggest` completes in single-digit microseconds. Path-prefix lookups are O(log n); segment-prefix scans are gated by a segment index to avoid unnecessary work. |
| 119 |
|
| 120 |
## Key Paths |
| 121 |
|
| 122 |
|
| 123 |
|
| 124 |
| Library source | `Shared/tagtree/src/lib.rs` | |
| 125 |
| Cargo manifest | `Shared/tagtree/Cargo.toml` | |
| 126 |
| Benchmarks | `Shared/tagtree/benches/tagtree_bench.rs` | |
| 127 |
| Audit review | `docs/shared/tagtree/audit_review.md` | |
| 128 |
|
| 129 |
## Metrics |
| 130 |
|
| 131 |
- 1,441 LOC (single file, ~500 lines tests + ~940 library) |
| 132 |
- 99 tests (68.7 tests/KLOC) |
| 133 |
- Zero runtime dependencies (std-only) |
| 134 |
- Zero unsafe, zero unwrap in production code |
| 135 |
- Rust 2024 edition |
| 136 |
|