|
1 |
+ |
# TagTree -- Architecture
|
|
2 |
+ |
|
|
3 |
+ |
## Purpose
|
|
4 |
+ |
|
|
5 |
+ |
TagTree is a shared Rust library crate that provides a hierarchical dot-notation tag standard. Tags are lowercase strings with dot-separated segments representing a hierarchy (e.g., `genre.electronic.house`, `work.meeting.standup`). The crate handles validation, parsing, tree operations, SQL helpers, prefix renaming, and keystroke-speed autocomplete -- with zero runtime dependencies.
|
|
6 |
+ |
|
|
7 |
+ |
## Tag Format
|
|
8 |
+ |
|
|
9 |
+ |
- Segments: lowercase alphanumeric and hyphens (`[a-z0-9-]`)
|
|
10 |
+ |
- Separator: `.` (dot)
|
|
11 |
+ |
- No empty segments, no leading/trailing dots, no consecutive dots
|
|
12 |
+ |
|
|
13 |
+ |
## Per-App TagConfig
|
|
14 |
+ |
|
|
15 |
+ |
Each consumer defines a `TagConfig` constant controlling depth, length, and semantic prefix rules.
|
|
16 |
+ |
|
|
17 |
+ |
| App | max_depth | max_length | semantic_depth | Notes |
|
|
18 |
+ |
|-----|-----------|------------|----------------|-------|
|
|
19 |
+ |
| AF | 5 | 100 | 1 | Deep hierarchy, namespace-driven (`genre.electronic.house`) |
|
|
20 |
+ |
| GO | 3 | 60 | 0 | Shallow free-form tags |
|
|
21 |
+ |
| BB | 3 | 80 | 0 | Shallow free-form tags |
|
|
22 |
+ |
| MT | 3 | 64 | 0 | Community-scoped thread tags |
|
|
23 |
+ |
| MNW | 5 | 100 | 0 | Content/project tags |
|
|
24 |
+ |
|
|
25 |
+ |
- `max_depth` -- maximum number of segments allowed
|
|
26 |
+ |
- `max_length` -- maximum character length of the entire tag string
|
|
27 |
+ |
- `semantic_depth` -- number of leading segments that carry dispatch meaning (0 = all free-form; 1 = first segment is a namespace like `genre`, `mood`)
|
|
28 |
+ |
|
|
29 |
+ |
## Core Types
|
|
30 |
+ |
|
|
31 |
+ |
| Type | Role |
|
|
32 |
+ |
|------|------|
|
|
33 |
+ |
| `TagConfig` | Per-app validation rules (depth, length, semantic prefix) |
|
|
34 |
+ |
| `TagError` | Error type returned on validation failure |
|
|
35 |
+ |
| `TagIndex` | In-memory sorted index for autocomplete suggestions |
|
|
36 |
+ |
| `SEPARATOR` | The `.` constant |
|
|
37 |
+ |
|
|
38 |
+ |
## API Surface
|
|
39 |
+ |
|
|
40 |
+ |
### Validation
|
|
41 |
+ |
|
|
42 |
+ |
- `validate_with(tag, config)` -- validate against a `TagConfig`
|
|
43 |
+ |
- `validate(tag)` -- validate with defaults (depth 5, length 100, no semantic prefix)
|
|
44 |
+ |
|
|
45 |
+ |
### Hierarchy Parsing
|
|
46 |
+ |
|
|
47 |
+ |
- `parent(tag)` -- parent path (`"a.b.c"` -> `Some("a.b")`)
|
|
48 |
+ |
- `leaf(tag)` -- last segment (`"a.b.c"` -> `"c"`)
|
|
49 |
+ |
- `depth(tag)` -- segment count (`"a.b.c"` -> `3`)
|
|
50 |
+ |
- `segment(tag, i)` -- extract segment by 0-based index
|
|
51 |
+ |
- `prefix_at_depth(tag, n)` -- first `n` segments as a path
|
|
52 |
+ |
- `ancestors(tag)` -- all ancestor paths, root to parent
|
|
53 |
+ |
|
|
54 |
+ |
### Tree Relationships
|
|
55 |
+ |
|
|
56 |
+ |
- `is_ancestor_of(a, b)` -- true if `a` is an ancestor of `b` at a segment boundary
|
|
57 |
+ |
- `common_ancestor(a, b)` -- longest shared prefix
|
|
58 |
+ |
|
|
59 |
+ |
### Set Operations (in-memory)
|
|
60 |
+ |
|
|
61 |
+ |
- `children_at_prefix(prefix, tags)` -- immediate children one level below prefix
|
|
62 |
+ |
- `subtree(prefix, tags)` -- all descendants of prefix (including prefix if present)
|
|
63 |
+ |
- `rename_prefix(old, new, tag)` -- swap a tag's prefix, returning new tag
|
|
64 |
+ |
|
|
65 |
+ |
### Semantic Splitting
|
|
66 |
+ |
|
|
67 |
+ |
- `semantic_prefix(tag, depth)` -- namespace portion
|
|
68 |
+ |
- `free_suffix(tag, depth)` -- value portion after semantic prefix
|
|
69 |
+ |
|
|
70 |
+ |
### SQL Helpers
|
|
71 |
+ |
|
|
72 |
+ |
- `escape_like(s)` -- escape `%`, `_`, `\` for safe LIKE embedding
|
|
73 |
+ |
- `like_descendant_pattern(prefix)` -- build `prefix.%` for hierarchy queries (works on both SQLite and PostgreSQL)
|
|
74 |
+ |
|
|
75 |
+ |
### TagIndex (Autocomplete)
|
|
76 |
+ |
|
|
77 |
+ |
Sorted `Vec<String>` with binary-search lookup. Two-tier suggestion strategy:
|
|
78 |
+ |
|
|
79 |
+ |
1. **Path prefix** (tier 1) -- binary search for tags whose full path starts with input. O(log n + k).
|
|
80 |
+ |
2. **Segment prefix** (tier 2) -- linear scan for tags where any non-first segment starts with input. Only runs if tier 1 didn't fill the limit and input has no dots. A segment index gates entry: if no known segment starts with input, scan is skipped in O(log m).
|
|
81 |
+ |
|
|
82 |
+ |
Key methods: `new()`, `empty()`, `insert()`, `remove()`, `rebuild()`, `contains()`, `suggest(input, limit)`, `suggest_with_status(input, limit)`.
|
|
83 |
+ |
|
|
84 |
+ |
## Integration Patterns
|
|
85 |
+ |
|
|
86 |
+ |
Apps integrate TagTree as a path dependency and define a `TagConfig` constant:
|
|
87 |
+ |
|
|
88 |
+ |
```rust
|
|
89 |
+ |
use tagtree::{TagConfig, validate_with};
|
|
90 |
+ |
|
|
91 |
+ |
const TAGS: TagConfig = TagConfig { max_depth: 5, max_length: 100, semantic_depth: 1 };
|
|
92 |
+ |
|
|
93 |
+ |
// Validate on create/update
|
|
94 |
+ |
validate_with(&user_input, &TAGS)?;
|
|
95 |
+ |
|
|
96 |
+ |
// SQL hierarchy queries
|
|
97 |
+ |
let pattern = tagtree::like_descendant_pattern("genre");
|
|
98 |
+ |
// WHERE tag LIKE ?1 ESCAPE '\' -> finds all genre.* tags
|
|
99 |
+ |
```
|
|
100 |
+ |
|
|
101 |
+ |
- **AF, GO, BB** -- SQLite repositories use `like_descendant_pattern` for hierarchy queries; `TagIndex` powers autocomplete in the UI
|
|
102 |
+ |
- **MT** -- community-scoped tags validated per-thread; migration 014 added `path` column using TagTree conventions
|
|
103 |
+ |
- **MNW** -- server-side validation on content/project tags; migration 038 added `path` column
|
|
104 |
+ |
|
|
105 |
+ |
## Performance Characteristics
|
|
106 |
+ |
|
|
107 |
+ |
Criterion benchmarks cover six groups at multiple scales (1K, 10K, 50K tags):
|
|
108 |
+ |
|
|
109 |
+ |
| Benchmark Group | What it measures |
|
|
110 |
+ |
|----------------|------------------|
|
|
111 |
+ |
| `validate` | Tag validation (simple, deep, max-depth, invalid, long) |
|
|
112 |
+ |
| `hierarchy` | parent, leaf, depth, ancestors, is_ancestor_of, common_ancestor |
|
|
113 |
+ |
| `sql` | escape_like, like_descendant_pattern |
|
|
114 |
+ |
| `tree_ops` | children_at_prefix, subtree, rename_prefix |
|
|
115 |
+ |
| `tag_index` | Build time, suggest (prefix/path/exact/no-match) at 1K/10K/50K |
|
|
116 |
+ |
| `deep_tree` | Validation and traversal at depth 5/10/20, deep-wide trees (~19K nodes) |
|
|
117 |
+ |
|
|
118 |
+ |
At typical corpus sizes (hundreds to low thousands), `suggest` completes in single-digit microseconds. Path-prefix lookups are O(log n); segment-prefix scans are gated by a segment index to avoid unnecessary work.
|
|
119 |
+ |
|
|
120 |
+ |
## Key Paths
|
|
121 |
+ |
|
|
122 |
+ |
| What | Path |
|
|
123 |
+ |
|------|------|
|
|
124 |
+ |
| Library source | `Shared/tagtree/src/lib.rs` |
|
|
125 |
+ |
| Cargo manifest | `Shared/tagtree/Cargo.toml` |
|
|
126 |
+ |
| Benchmarks | `Shared/tagtree/benches/tagtree_bench.rs` |
|
|
127 |
+ |
| Audit review | `docs/shared/tagtree/audit_review.md` |
|
|
128 |
+ |
|
|
129 |
+ |
## Metrics
|
|
130 |
+ |
|
|
131 |
+ |
- 1,441 LOC (single file, ~500 lines tests + ~940 library)
|
|
132 |
+ |
- 99 tests (68.7 tests/KLOC)
|
|
133 |
+ |
- Zero runtime dependencies (std-only)
|
|
134 |
+ |
- Zero unsafe, zero unwrap in production code
|
|
135 |
+ |
- Rust 2024 edition
|