| 1 |
# DocEngine — Code Review |
| 2 |
|
| 3 |
**Date:** 2026-04-12 |
| 4 |
**Version:** 0.3.0 |
| 5 |
**Reviewer:** Claude (Opus 4.6) |
| 6 |
**Scope:** Full codebase review — all Rust source, Cargo.toml, README, docs |
| 7 |
|
| 8 |
## Summary |
| 9 |
|
| 10 |
DocEngine is a markdown-to-HTML rendering library (~2,550 source LOC across 13 modules) built on pulldown-cmark and ammonia. Preset-based configuration system bundles markdown features with sanitization levels. 6 cargo feature gates keep the dependency tree minimal. Used by 5 consumers across the ecosystem (MNW, Multithreaded, GO, BB, AF). 141 tests, 0 clippy warnings, 0 unsafe code. |
| 11 |
|
| 12 |
**Overall: A** — clean, well-tested, security-conscious. No bugs found. Findings are documentation gaps and minor observations. |
| 13 |
|
| 14 |
--- |
| 15 |
|
| 16 |
## Findings |
| 17 |
|
| 18 |
### [MEDIUM] README and architecture.md missing `directives` and `media-urls` features |
| 19 |
|
| 20 |
The README feature flag table (lines 42-51) lists `doc-loader`, `frontmatter`, `mentions`, and `quotes` but omits `directives` and `media-urls`. These features are defined in Cargo.toml, included in `full`, and used by MNW. The `full` description says "all of the above" but the unlisted features make this misleading. |
| 21 |
|
| 22 |
Similarly, `architecture.md` module map (lines 9-21) does not include `directives.rs`, `media_urls.rs`, or `escape.rs`. The consumers table also doesn't mention `directives` or `media-urls` for MNW. |
| 23 |
|
| 24 |
### [MEDIUM] Missing `docs/todo.md` and `docs/audit_review.md` |
| 25 |
|
| 26 |
Per cross-cutting conventions, each project should have `todo.md` and `audit_review.md` in `docs/`. Only `architecture.md` exists. |
| 27 |
|
| 28 |
### [LOW] `Permissive` and `Standard` sanitize presets are identical |
| 29 |
|
| 30 |
In `sanitize.rs:17`, both `Permissive` and `Standard` map to `ammonia::clean(html)`. The doc comment for `Standard` says "Same as Permissive." The distinction is intentional — they differ in the `Renderer`'s markdown settings (Standard strips images, Permissive doesn't) — but the sanitize preset enum having two identical variants with a comment that says "Same" could confuse contributors. A comment clarifying "Same sanitization; markdown-level differences are configured in the Renderer" would help. |
| 31 |
|
| 32 |
### [LOW] `strip_html_tags` in doc_loader.rs does not decode HTML entities |
| 33 |
|
| 34 |
The search index generator (`strip_html_tags`, lines 162-182) strips tags but leaves HTML entities (`&`, `<`, etc.) intact. Searching for "A & B" won't match content rendered as "A & B". Low impact since doc search is client-side and search terms are unlikely to contain entities, but worth noting. |
| 35 |
|
| 36 |
### [LOW] `render.rs` at 511 lines |
| 37 |
|
| 38 |
Technically exceeds the 500-line guideline, but 268 lines are tests. The logic is ~235 lines. Within the spirit of the rule. If the test suite grows further, consider moving tests to a submodule. |
| 39 |
|
| 40 |
### [INFO] `rewrite_links` regex is naive about nested brackets |
| 41 |
|
| 42 |
The regex `\[([^\]]+)\]\(([^)]+)\)` in doc_loader.rs cannot handle nested brackets in link text or parentheses in URLs. Low risk since doc files are authored by the project owner, but edge cases like `[text](url_(with_parens))` would be malformed. |
| 43 |
|
| 44 |
### [INFO] `extract_title` silently fails on frontmatter-prefixed documents |
| 45 |
|
| 46 |
If called on raw markdown that starts with `+++` TOML frontmatter, `extract_title` returns `None` because the `+++` line is neither empty nor `---`. This is the correct behavior (frontmatter should be stripped first via `parse_frontmatter`), but the interaction is documented nowhere. |
| 47 |
|
| 48 |
### [INFO] `html_escape` uses sequential string replacements |
| 49 |
|
| 50 |
Five sequential `.replace()` calls, each allocating a new String. A single-pass approach would be more efficient, but this function is only called in template contexts (TOC, quotes, video tags), not in the hot render path. Negligible impact. |
| 51 |
|
| 52 |
--- |
| 53 |
|
| 54 |
## Strengths |
| 55 |
|
| 56 |
- **Preset system is the right abstraction.** Bundles markdown features with matching sanitization levels, preventing dangerous misconfigurations (e.g., raw HTML without sanitization). Builder pattern still allows per-instance overrides. |
| 57 |
- **Feature gate design.** Zero unnecessary dependencies for core-only consumers. Smart split of `regex` (doc-loader, complex patterns) vs `regex-lite` (directives/mentions/quotes/media-urls, simple patterns). |
| 58 |
- **Security-conscious.** All paths go through ammonia sanitization. Dangerous URL schemes detected case-insensitively. Path traversal blocked in media URL rewriting. HTML escaping on all user-supplied strings interpolated into HTML. Zero unsafe code. |
| 59 |
- **Test quality.** 141 tests at ~1.09:1 test-to-logic ratio. Tests cover happy paths, edge cases, security scenarios. All co-located with implementation. |
| 60 |
- **Clean module boundaries.** Each module does one thing. No circular dependencies. Feature gates cleanly gate whole modules. |
| 61 |
- **Directive system is extensible.** Any `[!UPPERCASE]` blockquote becomes an alert div. Code tabs auto-detect language labels. Both are implemented as HTML post-processing, keeping the core render pipeline simple. |
| 62 |
|
| 63 |
## Security Checklist |
| 64 |
|
| 65 |
|
| 66 |
|
| 67 |
| XSS via raw HTML | Pass — ammonia sanitization on all presets | |
| 68 |
| XSS via markdown | Pass — strict preset strips raw HTML at parser level + sanitizes | |
| 69 |
| javascript:/data:/vbscript: URLs | Pass — detected case-insensitively, neutralized to `#` | |
| 70 |
| Path traversal in media URLs | Pass — `..` paths rejected | |
| 71 |
| User string injection in HTML | Pass — `html_escape()` applied in quotes, TOC, media tags | |
| 72 |
| Unsafe code | Pass — zero `unsafe` blocks | |
| 73 |
|
| 74 |
## Metrics |
| 75 |
|
| 76 |
|
| 77 |
|
| 78 |
| Source LOC (logic) | ~1,310 | |
| 79 |
| Source LOC (tests) | ~1,235 | |
| 80 |
| Source LOC (total) | ~2,550 | |
| 81 |
| Source files | 13 | |
| 82 |
| Test count | 141 | |
| 83 |
| Tests/KLOC (logic) | ~108 | |
| 84 |
| Clippy warnings | 0 | |
| 85 |
| Unsafe blocks | 0 | |
| 86 |
| Cargo features | 6 (+full) | |
| 87 |
| Direct dependencies | 7 (3 always, 4 optional) | |
| 88 |
| Consumers | 5 (MNW, Multithreaded, GO, BB, AF) | |
| 89 |
| Audit advisories | 0 (1 allowed warning) | |
| 90 |
|
| 91 |
## Module Heatmap |
| 92 |
|
| 93 |
|
| 94 |
|
| 95 |
| render.rs | A | A | A | A | |
| 96 |
| directives.rs | A | A | A- | B (not in README/arch) | |
| 97 |
| doc_loader.rs | A | A- | A | A- | |
| 98 |
| media_urls.rs | A | A | A | B (not in README/arch) | |
| 99 |
| toc.rs | A | A- | A | A | |
| 100 |
| mentions.rs | A | A | A | A | |
| 101 |
| frontmatter.rs | A | A | A | A | |
| 102 |
| code_spans.rs | A | A- | A | A- | |
| 103 |
| sanitize.rs | A- | A- | A | B+ (confusing "Same" comment) | |
| 104 |
| text.rs | A | A | A | A | |
| 105 |
| escape.rs | A | A- | A | A | |
| 106 |
| quotes.rs | A | A- | A | A | |
| 107 |
| lib.rs | A | — | A | A | |
| 108 |
|
| 109 |
## Action Items |
| 110 |
|
| 111 |
1. ~~**[MEDIUM]** Update README feature flag table to include `directives` and `media-urls`~~ — Done. Feature table, feature-gated API table, and consumers table all updated. |
| 112 |
2. ~~**[MEDIUM]** Update `architecture.md` module map and consumers table~~ — Done. Added directives.rs, media_urls.rs, escape.rs to module map. Updated consumers, key paths, and added directive design decision. |
| 113 |
3. ~~**[LOW]** Clarify `Standard` sanitize preset doc comment~~ — Done. Explains the difference is at the Renderer level. |
| 114 |
4. ~~**[LOW]** Consider HTML entity decoding in `strip_html_tags` for search index accuracy~~ — Done. Decodes `&`, `<`, `>`, `"`, `'`, `'` after tag stripping. |
| 115 |
|