Skip to main content

max / docengine

7.0 KB · 115 lines History Blame Raw
1 # DocEngine — Code Review
2
3 **Date:** 2026-04-12
4 **Version:** 0.3.0
5 **Reviewer:** Claude (Opus 4.6)
6 **Scope:** Full codebase review — all Rust source, Cargo.toml, README, docs
7
8 ## Summary
9
10 DocEngine is a markdown-to-HTML rendering library (~2,550 source LOC across 13 modules) built on pulldown-cmark and ammonia. Preset-based configuration system bundles markdown features with sanitization levels. 6 cargo feature gates keep the dependency tree minimal. Used by 5 consumers across the ecosystem (MNW, Multithreaded, GO, BB, AF). 141 tests, 0 clippy warnings, 0 unsafe code.
11
12 **Overall: A** — clean, well-tested, security-conscious. No bugs found. Findings are documentation gaps and minor observations.
13
14 ---
15
16 ## Findings
17
18 ### [MEDIUM] README and architecture.md missing `directives` and `media-urls` features
19
20 The README feature flag table (lines 42-51) lists `doc-loader`, `frontmatter`, `mentions`, and `quotes` but omits `directives` and `media-urls`. These features are defined in Cargo.toml, included in `full`, and used by MNW. The `full` description says "all of the above" but the unlisted features make this misleading.
21
22 Similarly, `architecture.md` module map (lines 9-21) does not include `directives.rs`, `media_urls.rs`, or `escape.rs`. The consumers table also doesn't mention `directives` or `media-urls` for MNW.
23
24 ### [MEDIUM] Missing `docs/todo.md` and `docs/audit_review.md`
25
26 Per cross-cutting conventions, each project should have `todo.md` and `audit_review.md` in `docs/`. Only `architecture.md` exists.
27
28 ### [LOW] `Permissive` and `Standard` sanitize presets are identical
29
30 In `sanitize.rs:17`, both `Permissive` and `Standard` map to `ammonia::clean(html)`. The doc comment for `Standard` says "Same as Permissive." The distinction is intentional — they differ in the `Renderer`'s markdown settings (Standard strips images, Permissive doesn't) — but the sanitize preset enum having two identical variants with a comment that says "Same" could confuse contributors. A comment clarifying "Same sanitization; markdown-level differences are configured in the Renderer" would help.
31
32 ### [LOW] `strip_html_tags` in doc_loader.rs does not decode HTML entities
33
34 The search index generator (`strip_html_tags`, lines 162-182) strips tags but leaves HTML entities (`&`, `<`, etc.) intact. Searching for "A & B" won't match content rendered as "A & B". Low impact since doc search is client-side and search terms are unlikely to contain entities, but worth noting.
35
36 ### [LOW] `render.rs` at 511 lines
37
38 Technically exceeds the 500-line guideline, but 268 lines are tests. The logic is ~235 lines. Within the spirit of the rule. If the test suite grows further, consider moving tests to a submodule.
39
40 ### [INFO] `rewrite_links` regex is naive about nested brackets
41
42 The regex `\[([^\]]+)\]\(([^)]+)\)` in doc_loader.rs cannot handle nested brackets in link text or parentheses in URLs. Low risk since doc files are authored by the project owner, but edge cases like `[text](url_(with_parens))` would be malformed.
43
44 ### [INFO] `extract_title` silently fails on frontmatter-prefixed documents
45
46 If called on raw markdown that starts with `+++` TOML frontmatter, `extract_title` returns `None` because the `+++` line is neither empty nor `---`. This is the correct behavior (frontmatter should be stripped first via `parse_frontmatter`), but the interaction is documented nowhere.
47
48 ### [INFO] `html_escape` uses sequential string replacements
49
50 Five sequential `.replace()` calls, each allocating a new String. A single-pass approach would be more efficient, but this function is only called in template contexts (TOC, quotes, video tags), not in the hot render path. Negligible impact.
51
52 ---
53
54 ## Strengths
55
56 - **Preset system is the right abstraction.** Bundles markdown features with matching sanitization levels, preventing dangerous misconfigurations (e.g., raw HTML without sanitization). Builder pattern still allows per-instance overrides.
57 - **Feature gate design.** Zero unnecessary dependencies for core-only consumers. Smart split of `regex` (doc-loader, complex patterns) vs `regex-lite` (directives/mentions/quotes/media-urls, simple patterns).
58 - **Security-conscious.** All paths go through ammonia sanitization. Dangerous URL schemes detected case-insensitively. Path traversal blocked in media URL rewriting. HTML escaping on all user-supplied strings interpolated into HTML. Zero unsafe code.
59 - **Test quality.** 141 tests at ~1.09:1 test-to-logic ratio. Tests cover happy paths, edge cases, security scenarios. All co-located with implementation.
60 - **Clean module boundaries.** Each module does one thing. No circular dependencies. Feature gates cleanly gate whole modules.
61 - **Directive system is extensible.** Any `[!UPPERCASE]` blockquote becomes an alert div. Code tabs auto-detect language labels. Both are implemented as HTML post-processing, keeping the core render pipeline simple.
62
63 ## Security Checklist
64
65 | Check | Status |
66 |-------|--------|
67 | XSS via raw HTML | Pass — ammonia sanitization on all presets |
68 | XSS via markdown | Pass — strict preset strips raw HTML at parser level + sanitizes |
69 | javascript:/data:/vbscript: URLs | Pass — detected case-insensitively, neutralized to `#` |
70 | Path traversal in media URLs | Pass — `..` paths rejected |
71 | User string injection in HTML | Pass — `html_escape()` applied in quotes, TOC, media tags |
72 | Unsafe code | Pass — zero `unsafe` blocks |
73
74 ## Metrics
75
76 | Metric | Value |
77 |--------|-------|
78 | Source LOC (logic) | ~1,310 |
79 | Source LOC (tests) | ~1,235 |
80 | Source LOC (total) | ~2,550 |
81 | Source files | 13 |
82 | Test count | 141 |
83 | Tests/KLOC (logic) | ~108 |
84 | Clippy warnings | 0 |
85 | Unsafe blocks | 0 |
86 | Cargo features | 6 (+full) |
87 | Direct dependencies | 7 (3 always, 4 optional) |
88 | Consumers | 5 (MNW, Multithreaded, GO, BB, AF) |
89 | Audit advisories | 0 (1 allowed warning) |
90
91 ## Module Heatmap
92
93 | Module | Code | Test | Security | Docs |
94 |--------|:----:|:----:|:--------:|:----:|
95 | render.rs | A | A | A | A |
96 | directives.rs | A | A | A- | B (not in README/arch) |
97 | doc_loader.rs | A | A- | A | A- |
98 | media_urls.rs | A | A | A | B (not in README/arch) |
99 | toc.rs | A | A- | A | A |
100 | mentions.rs | A | A | A | A |
101 | frontmatter.rs | A | A | A | A |
102 | code_spans.rs | A | A- | A | A- |
103 | sanitize.rs | A- | A- | A | B+ (confusing "Same" comment) |
104 | text.rs | A | A | A | A |
105 | escape.rs | A | A- | A | A |
106 | quotes.rs | A | A- | A | A |
107 | lib.rs | A || A | A |
108
109 ## Action Items
110
111 1. ~~**[MEDIUM]** Update README feature flag table to include `directives` and `media-urls`~~ — Done. Feature table, feature-gated API table, and consumers table all updated.
112 2. ~~**[MEDIUM]** Update `architecture.md` module map and consumers table~~ — Done. Added directives.rs, media_urls.rs, escape.rs to module map. Updated consumers, key paths, and added directive design decision.
113 3. ~~**[LOW]** Clarify `Standard` sanitize preset doc comment~~ — Done. Explains the difference is at the Renderer level.
114 4. ~~**[LOW]** Consider HTML entity decoding in `strip_html_tags` for search index accuracy~~ — Done. Decodes `&`, `<`, `>`, `"`, `'`, `'` after tag stripping.
115