max / docengine

7.0 KB · 115 lines History Blame Raw

1	# DocEngine — Code Review
2
3	Date: 2026-04-12
4	Version: 0.3.0
5	Reviewer: Claude (Opus 4.6)
6	Scope: Full codebase review — all Rust source, Cargo.toml, README, docs
7
8	## Summary
9
10	DocEngine is a markdown-to-HTML rendering library (~2,550 source LOC across 13 modules) built on pulldown-cmark and ammonia. Preset-based configuration system bundles markdown features with sanitization levels. 6 cargo feature gates keep the dependency tree minimal. Used by 5 consumers across the ecosystem (MNW, Multithreaded, GO, BB, AF). 141 tests, 0 clippy warnings, 0 unsafe code.
11
12	Overall: A — clean, well-tested, security-conscious. No bugs found. Findings are documentation gaps and minor observations.
13
14	---
15
16	## Findings
17
18	### [MEDIUM] README and architecture.md missing `directives` and `media-urls` features
19
20	The README feature flag table (lines 42-51) lists `doc-loader`, `frontmatter`, `mentions`, and `quotes` but omits `directives` and `media-urls`. These features are defined in Cargo.toml, included in `full`, and used by MNW. The `full` description says "all of the above" but the unlisted features make this misleading.
21
22	Similarly, `architecture.md` module map (lines 9-21) does not include `directives.rs`, `media_urls.rs`, or `escape.rs`. The consumers table also doesn't mention `directives` or `media-urls` for MNW.
23
24	### [MEDIUM] Missing `docs/todo.md` and `docs/audit_review.md`
25
26	Per cross-cutting conventions, each project should have `todo.md` and `audit_review.md` in `docs/`. Only `architecture.md` exists.
27
28	### [LOW] `Permissive` and `Standard` sanitize presets are identical
29
30	In `sanitize.rs:17`, both `Permissive` and `Standard` map to `ammonia::clean(html)`. The doc comment for `Standard` says "Same as Permissive." The distinction is intentional — they differ in the `Renderer`'s markdown settings (Standard strips images, Permissive doesn't) — but the sanitize preset enum having two identical variants with a comment that says "Same" could confuse contributors. A comment clarifying "Same sanitization; markdown-level differences are configured in the Renderer" would help.
31
32	### [LOW] `strip_html_tags` in doc_loader.rs does not decode HTML entities
33
34	The search index generator (`strip_html_tags`, lines 162-182) strips tags but leaves HTML entities (`&`, `<`, etc.) intact. Searching for "A & B" won't match content rendered as "A & B". Low impact since doc search is client-side and search terms are unlikely to contain entities, but worth noting.
35
36	### [LOW] `render.rs` at 511 lines
37
38	Technically exceeds the 500-line guideline, but 268 lines are tests. The logic is ~235 lines. Within the spirit of the rule. If the test suite grows further, consider moving tests to a submodule.
39
40	### [INFO] `rewrite_links` regex is naive about nested brackets
41
42	The regex `\[([^\]]+)\]\(([^)]+)\)` in doc_loader.rs cannot handle nested brackets in link text or parentheses in URLs. Low risk since doc files are authored by the project owner, but edge cases like `[text](url_(with_parens))` would be malformed.
43
44	### [INFO] `extract_title` silently fails on frontmatter-prefixed documents
45
46	If called on raw markdown that starts with `+++` TOML frontmatter, `extract_title` returns `None` because the `+++` line is neither empty nor `---`. This is the correct behavior (frontmatter should be stripped first via `parse_frontmatter`), but the interaction is documented nowhere.
47
48	### [INFO] `html_escape` uses sequential string replacements
49
50	Five sequential `.replace()` calls, each allocating a new String. A single-pass approach would be more efficient, but this function is only called in template contexts (TOC, quotes, video tags), not in the hot render path. Negligible impact.
51
52	---
53
54	## Strengths
55
56	- Preset system is the right abstraction. Bundles markdown features with matching sanitization levels, preventing dangerous misconfigurations (e.g., raw HTML without sanitization). Builder pattern still allows per-instance overrides.
57	- Feature gate design. Zero unnecessary dependencies for core-only consumers. Smart split of `regex` (doc-loader, complex patterns) vs `regex-lite` (directives/mentions/quotes/media-urls, simple patterns).
58	- Security-conscious. All paths go through ammonia sanitization. Dangerous URL schemes detected case-insensitively. Path traversal blocked in media URL rewriting. HTML escaping on all user-supplied strings interpolated into HTML. Zero unsafe code.
59	- Test quality. 141 tests at ~1.09:1 test-to-logic ratio. Tests cover happy paths, edge cases, security scenarios. All co-located with implementation.
60	- Clean module boundaries. Each module does one thing. No circular dependencies. Feature gates cleanly gate whole modules.
61	- Directive system is extensible. Any `[!UPPERCASE]` blockquote becomes an alert div. Code tabs auto-detect language labels. Both are implemented as HTML post-processing, keeping the core render pipeline simple.
62
63	## Security Checklist
64
65	\| Check \| Status \|
66	\|-------\|--------\|
67	\| XSS via raw HTML \| Pass — ammonia sanitization on all presets \|
68	\| XSS via markdown \| Pass — strict preset strips raw HTML at parser level + sanitizes \|
69	\| javascript:/data:/vbscript: URLs \| Pass — detected case-insensitively, neutralized to `#` \|
70	\| Path traversal in media URLs \| Pass — `..` paths rejected \|
71	\| User string injection in HTML \| Pass — `html_escape()` applied in quotes, TOC, media tags \|
72	\| Unsafe code \| Pass — zero `unsafe` blocks \|
73
74	## Metrics
75
76	\| Metric \| Value \|
77	\|--------\|-------\|
78	\| Source LOC (logic) \| ~1,310 \|
79	\| Source LOC (tests) \| ~1,235 \|
80	\| Source LOC (total) \| ~2,550 \|
81	\| Source files \| 13 \|
82	\| Test count \| 141 \|
83	\| Tests/KLOC (logic) \| ~108 \|
84	\| Clippy warnings \| 0 \|
85	\| Unsafe blocks \| 0 \|
86	\| Cargo features \| 6 (+full) \|
87	\| Direct dependencies \| 7 (3 always, 4 optional) \|
88	\| Consumers \| 5 (MNW, Multithreaded, GO, BB, AF) \|
89	\| Audit advisories \| 0 (1 allowed warning) \|
90
91	## Module Heatmap
92
93	\| Module \| Code \| Test \| Security \| Docs \|
94	\|--------\|:----:\|:----:\|:--------:\|:----:\|
95	\| render.rs \| A \| A \| A \| A \|
96	\| directives.rs \| A \| A \| A- \| B (not in README/arch) \|
97	\| doc_loader.rs \| A \| A- \| A \| A- \|
98	\| media_urls.rs \| A \| A \| A \| B (not in README/arch) \|
99	\| toc.rs \| A \| A- \| A \| A \|
100	\| mentions.rs \| A \| A \| A \| A \|
101	\| frontmatter.rs \| A \| A \| A \| A \|
102	\| code_spans.rs \| A \| A- \| A \| A- \|
103	\| sanitize.rs \| A- \| A- \| A \| B+ (confusing "Same" comment) \|
104	\| text.rs \| A \| A \| A \| A \|
105	\| escape.rs \| A \| A- \| A \| A \|
106	\| quotes.rs \| A \| A- \| A \| A \|
107	\| lib.rs \| A \| — \| A \| A \|
108
109	## Action Items
110
111	1. ~~[MEDIUM] Update README feature flag table to include `directives` and `media-urls`~~ — Done. Feature table, feature-gated API table, and consumers table all updated.
112	2. ~~[MEDIUM] Update `architecture.md` module map and consumers table~~ — Done. Added directives.rs, media_urls.rs, escape.rs to module map. Updated consumers, key paths, and added directive design decision.
113	3. ~~[LOW] Clarify `Standard` sanitize preset doc comment~~ — Done. Explains the difference is at the Renderer level.
114	4. ~~[LOW] Consider HTML entity decoding in `strip_html_tags` for search index accuracy~~ — Done. Decodes `&`, `<`, `>`, `"`, `'`, `'` after tag stripping.
115