Skip to main content

max / tagtree

5.7 KB · 136 lines History Blame Raw
1 # TagTree -- Architecture
2
3 ## Purpose
4
5 TagTree is a shared Rust library crate that provides a hierarchical dot-notation tag standard. Tags are lowercase strings with dot-separated segments representing a hierarchy (e.g., `genre.electronic.house`, `work.meeting.standup`). The crate handles validation, parsing, tree operations, SQL helpers, prefix renaming, and keystroke-speed autocomplete -- with zero runtime dependencies.
6
7 ## Tag Format
8
9 - Segments: lowercase alphanumeric and hyphens (`[a-z0-9-]`)
10 - Separator: `.` (dot)
11 - No empty segments, no leading/trailing dots, no consecutive dots
12
13 ## Per-App TagConfig
14
15 Each consumer defines a `TagConfig` constant controlling depth, length, and semantic prefix rules.
16
17 | App | max_depth | max_length | semantic_depth | Notes |
18 |-----|-----------|------------|----------------|-------|
19 | AF | 5 | 100 | 1 | Deep hierarchy, namespace-driven (`genre.electronic.house`) |
20 | GO | 3 | 60 | 0 | Shallow free-form tags |
21 | BB | 3 | 80 | 0 | Shallow free-form tags |
22 | MT | 3 | 64 | 0 | Community-scoped thread tags |
23 | MNW | 5 | 100 | 0 | Content/project tags |
24
25 - `max_depth` -- maximum number of segments allowed
26 - `max_length` -- maximum character length of the entire tag string
27 - `semantic_depth` -- number of leading segments that carry dispatch meaning (0 = all free-form; 1 = first segment is a namespace like `genre`, `mood`)
28
29 ## Core Types
30
31 | Type | Role |
32 |------|------|
33 | `TagConfig` | Per-app validation rules (depth, length, semantic prefix) |
34 | `TagError` | Error type returned on validation failure |
35 | `TagIndex` | In-memory sorted index for autocomplete suggestions |
36 | `SEPARATOR` | The `.` constant |
37
38 ## API Surface
39
40 ### Validation
41
42 - `validate_with(tag, config)` -- validate against a `TagConfig`
43 - `validate(tag)` -- validate with defaults (depth 5, length 100, no semantic prefix)
44
45 ### Hierarchy Parsing
46
47 - `parent(tag)` -- parent path (`"a.b.c"` -> `Some("a.b")`)
48 - `leaf(tag)` -- last segment (`"a.b.c"` -> `"c"`)
49 - `depth(tag)` -- segment count (`"a.b.c"` -> `3`)
50 - `segment(tag, i)` -- extract segment by 0-based index
51 - `prefix_at_depth(tag, n)` -- first `n` segments as a path
52 - `ancestors(tag)` -- all ancestor paths, root to parent
53
54 ### Tree Relationships
55
56 - `is_ancestor_of(a, b)` -- true if `a` is an ancestor of `b` at a segment boundary
57 - `common_ancestor(a, b)` -- longest shared prefix
58
59 ### Set Operations (in-memory)
60
61 - `children_at_prefix(prefix, tags)` -- immediate children one level below prefix
62 - `subtree(prefix, tags)` -- all descendants of prefix (including prefix if present)
63 - `rename_prefix(old, new, tag)` -- swap a tag's prefix, returning new tag
64
65 ### Semantic Splitting
66
67 - `semantic_prefix(tag, depth)` -- namespace portion
68 - `free_suffix(tag, depth)` -- value portion after semantic prefix
69
70 ### SQL Helpers
71
72 - `escape_like(s)` -- escape `%`, `_`, `\` for safe LIKE embedding
73 - `like_descendant_pattern(prefix)` -- build `prefix.%` for hierarchy queries (works on both SQLite and PostgreSQL)
74
75 ### TagIndex (Autocomplete)
76
77 Sorted `Vec<String>` with binary-search lookup. Two-tier suggestion strategy:
78
79 1. **Path prefix** (tier 1) -- binary search for tags whose full path starts with input. O(log n + k).
80 2. **Segment prefix** (tier 2) -- linear scan for tags where any non-first segment starts with input. Only runs if tier 1 didn't fill the limit and input has no dots. A segment index gates entry: if no known segment starts with input, scan is skipped in O(log m).
81
82 Key methods: `new()`, `empty()`, `insert()`, `remove()`, `rebuild()`, `contains()`, `suggest(input, limit)`, `suggest_with_status(input, limit)`.
83
84 ## Integration Patterns
85
86 Apps integrate TagTree as a path dependency and define a `TagConfig` constant:
87
88 ```rust
89 use tagtree::{TagConfig, validate_with};
90
91 const TAGS: TagConfig = TagConfig { max_depth: 5, max_length: 100, semantic_depth: 1 };
92
93 // Validate on create/update
94 validate_with(&user_input, &TAGS)?;
95
96 // SQL hierarchy queries
97 let pattern = tagtree::like_descendant_pattern("genre");
98 // WHERE tag LIKE ?1 ESCAPE '\' -> finds all genre.* tags
99 ```
100
101 - **AF, GO, BB** -- SQLite repositories use `like_descendant_pattern` for hierarchy queries; `TagIndex` powers autocomplete in the UI
102 - **MT** -- community-scoped tags validated per-thread; migration 014 added `path` column using TagTree conventions
103 - **MNW** -- server-side validation on content/project tags; migration 038 added `path` column
104
105 ## Performance Characteristics
106
107 Criterion benchmarks cover six groups at multiple scales (1K, 10K, 50K tags):
108
109 | Benchmark Group | What it measures |
110 |----------------|------------------|
111 | `validate` | Tag validation (simple, deep, max-depth, invalid, long) |
112 | `hierarchy` | parent, leaf, depth, ancestors, is_ancestor_of, common_ancestor |
113 | `sql` | escape_like, like_descendant_pattern |
114 | `tree_ops` | children_at_prefix, subtree, rename_prefix |
115 | `tag_index` | Build time, suggest (prefix/path/exact/no-match) at 1K/10K/50K |
116 | `deep_tree` | Validation and traversal at depth 5/10/20, deep-wide trees (~19K nodes) |
117
118 At typical corpus sizes (hundreds to low thousands), `suggest` completes in single-digit microseconds. Path-prefix lookups are O(log n); segment-prefix scans are gated by a segment index to avoid unnecessary work.
119
120 ## Key Paths
121
122 | What | Path |
123 |------|------|
124 | Library source | `Shared/tagtree/src/lib.rs` |
125 | Cargo manifest | `Shared/tagtree/Cargo.toml` |
126 | Benchmarks | `Shared/tagtree/benches/tagtree_bench.rs` |
127 | Audit review | `docs/shared/tagtree/audit_review.md` |
128
129 ## Metrics
130
131 - 1,441 LOC (single file, ~500 lines tests + ~940 library)
132 - 99 tests (68.7 tests/KLOC)
133 - Zero runtime dependencies (std-only)
134 - Zero unsafe, zero unwrap in production code
135 - Rust 2024 edition
136