# DocEngine — Code Review **Date:** 2026-04-12 **Version:** 0.3.0 **Reviewer:** Claude (Opus 4.6) **Scope:** Full codebase review — all Rust source, Cargo.toml, README, docs ## Summary DocEngine is a markdown-to-HTML rendering library (~2,550 source LOC across 13 modules) built on pulldown-cmark and ammonia. Preset-based configuration system bundles markdown features with sanitization levels. 6 cargo feature gates keep the dependency tree minimal. Used by 5 consumers across the ecosystem (MNW, Multithreaded, GO, BB, AF). 141 tests, 0 clippy warnings, 0 unsafe code. **Overall: A** — clean, well-tested, security-conscious. No bugs found. Findings are documentation gaps and minor observations. --- ## Findings ### [MEDIUM] README and architecture.md missing `directives` and `media-urls` features The README feature flag table (lines 42-51) lists `doc-loader`, `frontmatter`, `mentions`, and `quotes` but omits `directives` and `media-urls`. These features are defined in Cargo.toml, included in `full`, and used by MNW. The `full` description says "all of the above" but the unlisted features make this misleading. Similarly, `architecture.md` module map (lines 9-21) does not include `directives.rs`, `media_urls.rs`, or `escape.rs`. The consumers table also doesn't mention `directives` or `media-urls` for MNW. ### [MEDIUM] Missing `docs/todo.md` and `docs/audit_review.md` Per cross-cutting conventions, each project should have `todo.md` and `audit_review.md` in `docs/`. Only `architecture.md` exists. ### [LOW] `Permissive` and `Standard` sanitize presets are identical In `sanitize.rs:17`, both `Permissive` and `Standard` map to `ammonia::clean(html)`. The doc comment for `Standard` says "Same as Permissive." The distinction is intentional — they differ in the `Renderer`'s markdown settings (Standard strips images, Permissive doesn't) — but the sanitize preset enum having two identical variants with a comment that says "Same" could confuse contributors. A comment clarifying "Same sanitization; markdown-level differences are configured in the Renderer" would help. ### [LOW] `strip_html_tags` in doc_loader.rs does not decode HTML entities The search index generator (`strip_html_tags`, lines 162-182) strips tags but leaves HTML entities (`&`, `<`, etc.) intact. Searching for "A & B" won't match content rendered as "A & B". Low impact since doc search is client-side and search terms are unlikely to contain entities, but worth noting. ### [LOW] `render.rs` at 511 lines Technically exceeds the 500-line guideline, but 268 lines are tests. The logic is ~235 lines. Within the spirit of the rule. If the test suite grows further, consider moving tests to a submodule. ### [INFO] `rewrite_links` regex is naive about nested brackets The regex `\[([^\]]+)\]\(([^)]+)\)` in doc_loader.rs cannot handle nested brackets in link text or parentheses in URLs. Low risk since doc files are authored by the project owner, but edge cases like `[text](url_(with_parens))` would be malformed. ### [INFO] `extract_title` silently fails on frontmatter-prefixed documents If called on raw markdown that starts with `+++` TOML frontmatter, `extract_title` returns `None` because the `+++` line is neither empty nor `---`. This is the correct behavior (frontmatter should be stripped first via `parse_frontmatter`), but the interaction is documented nowhere. ### [INFO] `html_escape` uses sequential string replacements Five sequential `.replace()` calls, each allocating a new String. A single-pass approach would be more efficient, but this function is only called in template contexts (TOC, quotes, video tags), not in the hot render path. Negligible impact. --- ## Strengths - **Preset system is the right abstraction.** Bundles markdown features with matching sanitization levels, preventing dangerous misconfigurations (e.g., raw HTML without sanitization). Builder pattern still allows per-instance overrides. - **Feature gate design.** Zero unnecessary dependencies for core-only consumers. Smart split of `regex` (doc-loader, complex patterns) vs `regex-lite` (directives/mentions/quotes/media-urls, simple patterns). - **Security-conscious.** All paths go through ammonia sanitization. Dangerous URL schemes detected case-insensitively. Path traversal blocked in media URL rewriting. HTML escaping on all user-supplied strings interpolated into HTML. Zero unsafe code. - **Test quality.** 141 tests at ~1.09:1 test-to-logic ratio. Tests cover happy paths, edge cases, security scenarios. All co-located with implementation. - **Clean module boundaries.** Each module does one thing. No circular dependencies. Feature gates cleanly gate whole modules. - **Directive system is extensible.** Any `[!UPPERCASE]` blockquote becomes an alert div. Code tabs auto-detect language labels. Both are implemented as HTML post-processing, keeping the core render pipeline simple. ## Security Checklist | Check | Status | |-------|--------| | XSS via raw HTML | Pass — ammonia sanitization on all presets | | XSS via markdown | Pass — strict preset strips raw HTML at parser level + sanitizes | | javascript:/data:/vbscript: URLs | Pass — detected case-insensitively, neutralized to `#` | | Path traversal in media URLs | Pass — `..` paths rejected | | User string injection in HTML | Pass — `html_escape()` applied in quotes, TOC, media tags | | Unsafe code | Pass — zero `unsafe` blocks | ## Metrics | Metric | Value | |--------|-------| | Source LOC (logic) | ~1,310 | | Source LOC (tests) | ~1,235 | | Source LOC (total) | ~2,550 | | Source files | 13 | | Test count | 141 | | Tests/KLOC (logic) | ~108 | | Clippy warnings | 0 | | Unsafe blocks | 0 | | Cargo features | 6 (+full) | | Direct dependencies | 7 (3 always, 4 optional) | | Consumers | 5 (MNW, Multithreaded, GO, BB, AF) | | Audit advisories | 0 (1 allowed warning) | ## Module Heatmap | Module | Code | Test | Security | Docs | |--------|:----:|:----:|:--------:|:----:| | render.rs | A | A | A | A | | directives.rs | A | A | A- | B (not in README/arch) | | doc_loader.rs | A | A- | A | A- | | media_urls.rs | A | A | A | B (not in README/arch) | | toc.rs | A | A- | A | A | | mentions.rs | A | A | A | A | | frontmatter.rs | A | A | A | A | | code_spans.rs | A | A- | A | A- | | sanitize.rs | A- | A- | A | B+ (confusing "Same" comment) | | text.rs | A | A | A | A | | escape.rs | A | A- | A | A | | quotes.rs | A | A- | A | A | | lib.rs | A | — | A | A | ## Action Items 1. ~~**[MEDIUM]** Update README feature flag table to include `directives` and `media-urls`~~ — Done. Feature table, feature-gated API table, and consumers table all updated. 2. ~~**[MEDIUM]** Update `architecture.md` module map and consumers table~~ — Done. Added directives.rs, media_urls.rs, escape.rs to module map. Updated consumers, key paths, and added directive design decision. 3. ~~**[LOW]** Clarify `Standard` sanitize preset doc comment~~ — Done. Explains the difference is at the Renderer level. 4. ~~**[LOW]** Consider HTML entity decoding in `strip_html_tags` for search index accuracy~~ — Done. Decodes `&`, `<`, `>`, `"`, `'`, `'` after tag stripping.