# DocEngine — Code Review

**Date:** 2026-04-12
**Version:** 0.3.0
**Reviewer:** Claude (Opus 4.6)
**Scope:** Full codebase review — all Rust source, Cargo.toml, README, docs

## Summary

DocEngine is a markdown-to-HTML rendering library (~2,550 source LOC across 13 modules) built on pulldown-cmark and ammonia. Preset-based configuration system bundles markdown features with sanitization levels. 6 cargo feature gates keep the dependency tree minimal. Used by 5 consumers across the ecosystem (MNW, Multithreaded, GO, BB, AF). 141 tests, 0 clippy warnings, 0 unsafe code.

**Overall: A** — clean, well-tested, security-conscious. No bugs found. Findings are documentation gaps and minor observations.

---

## Findings

### [MEDIUM] README and architecture.md missing `directives` and `media-urls` features

The README feature flag table (lines 42-51) lists `doc-loader`, `frontmatter`, `mentions`, and `quotes` but omits `directives` and `media-urls`. These features are defined in Cargo.toml, included in `full`, and used by MNW. The `full` description says "all of the above" but the unlisted features make this misleading.

Similarly, `architecture.md` module map (lines 9-21) does not include `directives.rs`, `media_urls.rs`, or `escape.rs`. The consumers table also doesn't mention `directives` or `media-urls` for MNW.

### [MEDIUM] Missing `docs/todo.md` and `docs/audit_review.md`

Per cross-cutting conventions, each project should have `todo.md` and `audit_review.md` in `docs/`. Only `architecture.md` exists.

### [LOW] `Permissive` and `Standard` sanitize presets are identical

In `sanitize.rs:17`, both `Permissive` and `Standard` map to `ammonia::clean(html)`. The doc comment for `Standard` says "Same as Permissive." The distinction is intentional — they differ in the `Renderer`'s markdown settings (Standard strips images, Permissive doesn't) — but the sanitize preset enum having two identical variants with a comment that says "Same" could confuse contributors. A comment clarifying "Same sanitization; markdown-level differences are configured in the Renderer" would help.

### [LOW] `strip_html_tags` in doc_loader.rs does not decode HTML entities

The search index generator (`strip_html_tags`, lines 162-182) strips tags but leaves HTML entities (`&amp;`, `&lt;`, etc.) intact. Searching for "A & B" won't match content rendered as "A &amp; B". Low impact since doc search is client-side and search terms are unlikely to contain entities, but worth noting.

### [LOW] `render.rs` at 511 lines

Technically exceeds the 500-line guideline, but 268 lines are tests. The logic is ~235 lines. Within the spirit of the rule. If the test suite grows further, consider moving tests to a submodule.

### [INFO] `rewrite_links` regex is naive about nested brackets

The regex `\[([^\]]+)\]\(([^)]+)\)` in doc_loader.rs cannot handle nested brackets in link text or parentheses in URLs. Low risk since doc files are authored by the project owner, but edge cases like `[text](url_(with_parens))` would be malformed.

### [INFO] `extract_title` silently fails on frontmatter-prefixed documents

If called on raw markdown that starts with `+++` TOML frontmatter, `extract_title` returns `None` because the `+++` line is neither empty nor `---`. This is the correct behavior (frontmatter should be stripped first via `parse_frontmatter`), but the interaction is documented nowhere.

### [INFO] `html_escape` uses sequential string replacements

Five sequential `.replace()` calls, each allocating a new String. A single-pass approach would be more efficient, but this function is only called in template contexts (TOC, quotes, video tags), not in the hot render path. Negligible impact.

---

## Strengths

- **Preset system is the right abstraction.** Bundles markdown features with matching sanitization levels, preventing dangerous misconfigurations (e.g., raw HTML without sanitization). Builder pattern still allows per-instance overrides.
- **Feature gate design.** Zero unnecessary dependencies for core-only consumers. Smart split of `regex` (doc-loader, complex patterns) vs `regex-lite` (directives/mentions/quotes/media-urls, simple patterns).
- **Security-conscious.** All paths go through ammonia sanitization. Dangerous URL schemes detected case-insensitively. Path traversal blocked in media URL rewriting. HTML escaping on all user-supplied strings interpolated into HTML. Zero unsafe code.
- **Test quality.** 141 tests at ~1.09:1 test-to-logic ratio. Tests cover happy paths, edge cases, security scenarios. All co-located with implementation.
- **Clean module boundaries.** Each module does one thing. No circular dependencies. Feature gates cleanly gate whole modules.
- **Directive system is extensible.** Any `[!UPPERCASE]` blockquote becomes an alert div. Code tabs auto-detect language labels. Both are implemented as HTML post-processing, keeping the core render pipeline simple.

## Security Checklist

| Check | Status |
|-------|--------|
| XSS via raw HTML | Pass — ammonia sanitization on all presets |
| XSS via markdown | Pass — strict preset strips raw HTML at parser level + sanitizes |
| javascript:/data:/vbscript: URLs | Pass — detected case-insensitively, neutralized to `#` |
| Path traversal in media URLs | Pass — `..` paths rejected |
| User string injection in HTML | Pass — `html_escape()` applied in quotes, TOC, media tags |
| Unsafe code | Pass — zero `unsafe` blocks |

## Metrics

| Metric | Value |
|--------|-------|
| Source LOC (logic) | ~1,310 |
| Source LOC (tests) | ~1,235 |
| Source LOC (total) | ~2,550 |
| Source files | 13 |
| Test count | 141 |
| Tests/KLOC (logic) | ~108 |
| Clippy warnings | 0 |
| Unsafe blocks | 0 |
| Cargo features | 6 (+full) |
| Direct dependencies | 7 (3 always, 4 optional) |
| Consumers | 5 (MNW, Multithreaded, GO, BB, AF) |
| Audit advisories | 0 (1 allowed warning) |

## Module Heatmap

| Module | Code | Test | Security | Docs |
|--------|:----:|:----:|:--------:|:----:|
| render.rs | A | A | A | A |
| directives.rs | A | A | A- | B (not in README/arch) |
| doc_loader.rs | A | A- | A | A- |
| media_urls.rs | A | A | A | B (not in README/arch) |
| toc.rs | A | A- | A | A |
| mentions.rs | A | A | A | A |
| frontmatter.rs | A | A | A | A |
| code_spans.rs | A | A- | A | A- |
| sanitize.rs | A- | A- | A | B+ (confusing "Same" comment) |
| text.rs | A | A | A | A |
| escape.rs | A | A- | A | A |
| quotes.rs | A | A- | A | A |
| lib.rs | A | — | A | A |

## Action Items

1. ~~**[MEDIUM]** Update README feature flag table to include `directives` and `media-urls`~~ — Done. Feature table, feature-gated API table, and consumers table all updated.
2. ~~**[MEDIUM]** Update `architecture.md` module map and consumers table~~ — Done. Added directives.rs, media_urls.rs, escape.rs to module map. Updated consumers, key paths, and added directive design decision.
3. ~~**[LOW]** Clarify `Standard` sanitize preset doc comment~~ — Done. Explains the difference is at the Renderer level.
4. ~~**[LOW]** Consider HTML entity decoding in `strip_html_tags` for search index accuracy~~ — Done. Decodes `&amp;`, `&lt;`, `&gt;`, `&quot;`, `&#x27;`, `&#39;` after tag stripping.