# DocEngine Architecture

## Overview

DocEngine is a markdown rendering library that wraps pulldown-cmark (parsing) and ammonia (sanitization) behind a preset system. Each preset configures which markdown features are enabled and how aggressively the output is sanitized.

## Module Map

Names in brackets are the Cargo features that gate the optional modules.

```
src/
  lib.rs          Crate root, re-exports, convenience functions
  render.rs       Renderer struct (builder pattern, 4 presets, render/render_with_meta)
  sanitize.rs     SanitizePreset enum (Permissive, Standard, Strict, Minimal)
  text.rs         Text utilities (word_count, reading_time, extract_title, strip_first_heading)
  toc.rs          Table of contents extraction and HTML rendering
  escape.rs       HTML entity escaping for safe string interpolation
  code_spans.rs   Code span/block byte range detection (used by mentions to skip code)
  directives.rs   [directives] Alert/tabs blockquote post-processing
  doc_loader.rs   [doc-loader] Load .md files from disk into in-memory page store
  frontmatter.rs  [frontmatter] Parse +++-delimited TOML frontmatter
  media_urls.rs   [media-urls] CDN path rewriting for images, img-to-video conversion
  mentions.rs     [mentions] @username extraction and resolution
  quotes.rs       [quotes] [quote:UUID:HASH] post-processing for forum attribution
```

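`frontmatter.rs` parses `+++`-delimited TOML. Before any TOML can be parsed, the block has to be split from the markdown body. The std-only sketch below shows one way to do that, assuming the opening `+++` must be the very first line. `split_frontmatter` is a hypothetical name, and handing the extracted string to a TOML parser is left out.

```rust
/// Splits a document into `(frontmatter, body)`.
/// Hypothetical helper: returns `(None, doc)` when the document has no
/// `+++` block or the block is never closed.
pub fn split_frontmatter(doc: &str) -> (Option<&str>, &str) {
    const DELIM: &str = "+++";

    // The opening delimiter must be the whole first line.
    let rest = match doc.strip_prefix(DELIM) {
        Some(r) if r.starts_with('\n') => &r[1..],
        Some(r) if r.starts_with("\r\n") => &r[2..],
        _ => return (None, doc),
    };

    // Scan line by line for the closing delimiter.
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end_matches(&['\r', '\n'][..]) == DELIM {
            return (Some(&rest[..offset]), &rest[offset + line.len()..]);
        }
        offset += line.len();
    }

    // No closing delimiter: treat the whole input as plain markdown.
    (None, doc)
}
```

Returning borrowed slices keeps the split allocation-free; only the TOML parser needs to own anything.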
## Design Decisions

### Presets over configuration

Rather than exposing every pulldown-cmark option, DocEngine provides named presets that bundle markdown features with sanitization levels. This prevents misconfiguration: you cannot accidentally enable raw HTML without the sanitization level that goes with it.

Custom configurations are still possible via the builder pattern (e.g., `Renderer::permissive().with_strip_images(true)`).

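To make "presets over configuration" concrete, here is a hypothetical sketch of how a preset constructor can tie the raw-HTML switch to a sanitization level while the builder exposes only knobs that cannot weaken sanitization. Only `Renderer::permissive()`, `with_strip_images`, and the `SanitizePreset` variant names come from this document; the fields, `standard()`, and the accessor methods are invented for illustration.

```rust
/// Sanitization levels, mirroring the variants listed for sanitize.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SanitizePreset {
    Permissive,
    Standard,
    Strict,
    Minimal,
}

/// Hypothetical renderer configuration. The fields are private, so callers
/// can only start from a preset constructor.
#[derive(Debug, Clone)]
pub struct Renderer {
    allow_raw_html: bool,
    strip_images: bool,
    sanitize: SanitizePreset,
}

impl Renderer {
    /// Raw HTML is allowed, but only together with a sanitizer level.
    pub fn permissive() -> Self {
        Self { allow_raw_html: true, strip_images: false, sanitize: SanitizePreset::Permissive }
    }

    /// Hypothetical second preset: raw HTML off, standard sanitization.
    pub fn standard() -> Self {
        Self { allow_raw_html: false, strip_images: false, sanitize: SanitizePreset::Standard }
    }

    /// Builder method: adjusts a knob that cannot loosen sanitization.
    pub fn with_strip_images(mut self, strip: bool) -> Self {
        self.strip_images = strip;
        self
    }

    pub fn allows_raw_html(&self) -> bool {
        self.allow_raw_html
    }

    pub fn strips_images(&self) -> bool {
        self.strip_images
    }

    pub fn sanitize_preset(&self) -> SanitizePreset {
        self.sanitize
    }
}
```

Keeping the fields private is what enforces the guarantee: code outside the module cannot construct a `Renderer` whose raw-HTML setting disagrees with its sanitization level.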
### Two-phase rendering

Rendering happens in two phases:

1. **pulldown-cmark** parses markdown into events, optionally filters them (strip images, strip raw HTML, neutralize dangerous URL schemes), and renders them to an HTML string
2. **ammonia** sanitizes the resulting HTML string

Because ammonia always runs, even the permissive preset strips `<script>` tags.

Post-processing steps (directives, mentions, quotes, media URLs) are not part of the render pipeline; consumers apply them to the sanitized output.

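Phase 1's "neutralize dangerous URL schemes" step can be pictured as a check applied to every link and image destination before the HTML string is built. The sketch below is std-only and assumes a denylist of `javascript:`, `vbscript:`, and `data:` with `#` as the replacement; `neutralize_url` is a hypothetical name, and wiring it into pulldown-cmark's event stream is omitted.

```rust
/// Assumed denylist of schemes for link and image destinations.
const DANGEROUS_SCHEMES: [&str; 3] = ["javascript:", "vbscript:", "data:"];

/// Hypothetical helper: returns `dest` unchanged if it is safe, or `"#"`
/// if it starts with a dangerous scheme.
pub fn neutralize_url(dest: &str) -> &str {
    // Drop ASCII whitespace and control characters before matching so that
    // obfuscations such as " JavaScript:" or "java\tscript:" are caught.
    // This is stricter than browsers, which only ignore tabs and newlines.
    let normalized: String = dest
        .chars()
        .filter(|c| !c.is_ascii_whitespace() && !c.is_ascii_control())
        .take(16) // longer than any scheme in the denylist
        .collect::<String>()
        .to_ascii_lowercase();

    if DANGEROUS_SCHEMES.iter().any(|s| normalized.starts_with(s)) {
        "#"
    } else {
        dest
    }
}
```

Since ammonia still runs in phase 2, a check like this is a first layer of defense rather than the only one.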
### Feature-gated modules

DocEngine's only required dependencies are pulldown-cmark, ammonia, and serde. Consumers that only need rendering don't pull in regex, toml, or uuid. The `full` feature enables every optional module.

The `regex` vs `regex-lite` split is intentional: doc-loader's link rewriting needs the full regex engine, while the simpler patterns in directives, mentions, quotes, and media-urls use the lighter variant.

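A hypothetical sketch of how a crate root can gate optional modules. Module names follow the module map, the feature names are assumed to match its bracketed tags, and the module bodies are stubs; `enabled_features` is an invented helper. In Cargo, an umbrella feature such as `full` normally lists the individual features, so `cfg(feature = "directives")` is also true when a consumer enables only `full`.

```rust
// Hypothetical crate-root sketch; module bodies are stubs.

// Compiled only with the `directives` feature (regex-lite patterns would live here).
#[cfg(feature = "directives")]
pub mod directives {
    pub fn process(html: &str) -> String {
        html.to_string() // stub: alert/tab rewriting omitted
    }
}

// Compiled only with the `doc-loader` feature, the module that needs full `regex`.
#[cfg(feature = "doc-loader")]
pub mod doc_loader {
    pub fn load() {} // stub
}

/// Lists the optional features compiled into this build (hypothetical helper).
pub fn enabled_features() -> Vec<&'static str> {
    // `cfg!` expands to a compile-time `true` or `false`.
    let features = [
        ("directives", cfg!(feature = "directives")),
        ("doc-loader", cfg!(feature = "doc-loader")),
        ("frontmatter", cfg!(feature = "frontmatter")),
        ("media-urls", cfg!(feature = "media-urls")),
        ("mentions", cfg!(feature = "mentions")),
        ("quotes", cfg!(feature = "quotes")),
    ];
    features.iter().filter(|(_, on)| *on).map(|(name, _)| *name).collect()
}
```

A build with no features enabled compiles none of the gated modules, which is how rendering-only consumers avoid the extra dependencies.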
### DocLoader loads once at startup

`DocLoader::load()` reads all `.md` files from disk, renders them to HTML, and stores them in a `HashMap<String, DocPage>`. This happens once at application boot (MNW calls it during startup), so pages are served from memory with no disk I/O per request.

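A minimal std-only sketch of the load-once step. `DocPage`'s fields, `load_pages`, and keying pages by file stem are assumptions; the `render` closure stands in for DocEngine's renderer, and link rewriting is omitted here.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical page record; the real `DocPage` may have different fields.
#[derive(Debug, Clone)]
pub struct DocPage {
    pub slug: String,
    pub html: String,
}

/// Reads every `.md` file directly inside `dir`, renders each once, and keys
/// the result by file stem (`faq.md` -> `"faq"`). Subdirectories are ignored.
pub fn load_pages(
    dir: &Path,
    render: impl Fn(&str) -> String,
) -> io::Result<HashMap<String, DocPage>> {
    let mut pages = HashMap::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if !path.is_file() || path.extension().and_then(|e| e.to_str()) != Some("md") {
            continue;
        }
        let Some(slug) = path.file_stem().and_then(|s| s.to_str()) else {
            continue; // skip non-UTF-8 file names
        };
        let markdown = fs::read_to_string(&path)?;
        let page = DocPage { slug: slug.to_string(), html: render(&markdown) };
        pages.insert(page.slug.clone(), page);
    }
    Ok(pages)
}
```

Because the map is only read after startup, it can be shared between request handlers (for example behind an `Arc`) without locking.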

Link rewriting converts relative `.md` references to the configured URL prefix (e.g., `./faq.md` becomes `/docs/faq`). Links to unpublished docs are stripped to plain text.

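doc-loader performs this rewrite with the full `regex` engine over rendered HTML. The std-only sketch below isolates only the per-link decision for a single `href`. `rewrite_md_link`, `LinkRewrite`, and the `published` set are hypothetical; preserving a `#fragment` is an assumed behavior, and `../` segments are not resolved.

```rust
use std::collections::HashSet;

/// Outcome of rewriting one link destination (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum LinkRewrite {
    /// Relative `.md` link to a published page: use this URL instead.
    Rewritten(String),
    /// Relative `.md` link to an unpublished page: keep only the link text.
    StripToText,
    /// Anything else (external URLs, absolute paths, anchors, other files).
    Unchanged,
}

/// Maps `./faq.md` (or `faq.md#section`) to `{prefix}/faq`.
pub fn rewrite_md_link(href: &str, prefix: &str, published: &HashSet<&str>) -> LinkRewrite {
    // Leave scheme URLs, absolute paths, and in-page anchors alone.
    if href.contains(':') || href.starts_with('/') || href.starts_with('#') {
        return LinkRewrite::Unchanged;
    }
    // Separate the fragment so `faq.md#install` keeps its anchor.
    let (path, fragment) = match href.split_once('#') {
        Some((p, f)) => (p, Some(f)),
        None => (href, None),
    };
    let path = path.strip_prefix("./").unwrap_or(path);
    let Some(slug) = path.strip_suffix(".md") else {
        return LinkRewrite::Unchanged;
    };
    if !published.contains(slug) {
        return LinkRewrite::StripToText;
    }
    let mut url = format!("{}/{}", prefix.trim_end_matches('/'), slug);
    if let Some(f) = fragment {
        url.push('#');
        url.push_str(f);
    }
    LinkRewrite::Rewritten(url)
}
```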
### Mention resolution skips code

`extract_mentions` and `resolve_mentions` detect inline code (backtick spans) and fenced code blocks and skip any @mentions inside them. This prevents false positives from code examples.

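The sketch below combines the two halves, code-range detection (the role of `code_spans.rs`) and mention extraction, using only the standard library. It is simplified and hypothetical: `code_ranges` is an invented name, the `extract_mentions` signature is assumed, only backtick fences are recognized, inline spans must close on the same line, and usernames are assumed to be ASCII letters, digits, `_`, or `-`.

```rust
use std::ops::Range;

/// Byte ranges of fenced code blocks and inline code spans (simplified).
pub fn code_ranges(text: &str) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut fence_start: Option<usize> = None;
    let mut offset = 0;

    for line in text.split_inclusive('\n') {
        let is_fence = line.trim_start().starts_with("```");
        match (fence_start, is_fence) {
            // Opening fence: remember where the block starts.
            (None, true) => fence_start = Some(offset),
            // Closing fence: the range covers both fence lines.
            (Some(start), true) => {
                ranges.push(start..offset + line.len());
                fence_start = None;
            }
            // Inside a fence: nothing to scan.
            (Some(_), false) => {}
            // Ordinary line: look for inline code spans.
            (None, false) => inline_spans(line, offset, &mut ranges),
        }
        offset += line.len();
    }
    // An unclosed fence runs to the end of the input.
    if let Some(start) = fence_start {
        ranges.push(start..text.len());
    }
    ranges
}

/// A backtick run opens a span only if a run of the same length closes it.
fn inline_spans(line: &str, base: usize, ranges: &mut Vec<Range<usize>>) {
    let bytes = line.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'`' {
            i += 1;
            continue;
        }
        let open = i;
        while i < bytes.len() && bytes[i] == b'`' {
            i += 1;
        }
        let run = i - open;
        // Scan ahead for a matching run; an unmatched opener is literal text.
        let mut j = i;
        while j < bytes.len() {
            if bytes[j] != b'`' {
                j += 1;
                continue;
            }
            let close = j;
            while j < bytes.len() && bytes[j] == b'`' {
                j += 1;
            }
            if j - close == run {
                ranges.push(base + open..base + j);
                i = j;
                break;
            }
        }
    }
}

/// Extracts unique `@username` mentions (in first-seen order) outside code.
pub fn extract_mentions(text: &str) -> Vec<String> {
    let code = code_ranges(text);
    let bytes = text.as_bytes();
    let is_name_byte = |b: u8| b.is_ascii_alphanumeric() || b == b'_' || b == b'-';
    let mut mentions: Vec<String> = Vec::new();

    for (i, &b) in bytes.iter().enumerate() {
        if b != b'@' || code.iter().any(|r| r.contains(&i)) {
            continue;
        }
        // Skip e-mail addresses such as me@example.com.
        if i > 0 && (bytes[i - 1].is_ascii_alphanumeric() || bytes[i - 1] == b'_') {
            continue;
        }
        let len = bytes[i + 1..].iter().take_while(|&&c| is_name_byte(c)).count();
        let name = &text[i + 1..i + 1 + len];
        if len > 0 && !mentions.iter().any(|m| m == name) {
            mentions.push(name.to_string());
        }
    }
    mentions
}
```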
### Directive post-processing

Directives (`[!NOTE]`, `[!TIP]`, `[!TABS]`, etc.) are implemented as HTML post-processing rather than as markdown parser extensions. This keeps the core render pipeline simple and lets directives compose with any preset.

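A hypothetical sketch of alert post-processing over rendered HTML. It assumes a `> [!NOTE]` blockquote reaches the post-processor in the shape `<blockquote>\n<p>[!NOTE]\n...`, though the exact whitespace depends on the renderer and sanitizer. It also assumes blockquotes are not nested. `process_alerts`, the alert list, and the `alert alert-*` class names are invented for illustration, and `[!TABS]` is not handled.

```rust
/// Alert markers this sketch recognizes (assumed list).
const ALERT_KINDS: [&str; 5] = ["NOTE", "TIP", "IMPORTANT", "WARNING", "CAUTION"];

/// Rewrites blockquotes whose first paragraph starts with `[!KIND]` into
/// `<div class="alert alert-kind">` elements; other HTML passes through.
pub fn process_alerts(html: &str) -> String {
    const OPEN: &str = "<blockquote>\n<p>[!";
    const CLOSE: &str = "</blockquote>";
    let mut out = String::with_capacity(html.len());
    let mut rest = html;

    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        let kind = match after_open.find(']') {
            Some(end) if ALERT_KINDS.contains(&&after_open[..end]) => &after_open[..end],
            _ => {
                // Not a known alert: copy through the opening tags and keep scanning.
                out.push_str(&rest[..start + OPEN.len()]);
                rest = after_open;
                continue;
            }
        };
        let after_marker = &after_open[kind.len() + 1..]; // skip "KIND]"
        let Some(close) = after_marker.find(CLOSE) else {
            break; // unterminated blockquote: leave the remainder untouched
        };
        let body = &after_marker[..close];

        out.push_str(&rest[..start]);
        out.push_str("<div class=\"alert alert-");
        out.push_str(&kind.to_ascii_lowercase());
        out.push_str("\">\n");
        // The marker either shares a paragraph with the first line of text
        // ("[!NOTE]\ntext") or sits alone in its own paragraph ("[!NOTE]</p>").
        match body.strip_prefix("</p>\n") {
            Some(paragraphs) => out.push_str(paragraphs),
            None => {
                out.push_str("<p>");
                out.push_str(body.strip_prefix('\n').unwrap_or(body));
            }
        }
        out.push_str("</div>");
        rest = &after_marker[close + CLOSE.len()..];
    }
    out.push_str(rest);
    out
}
```

The injected markup is built only from fixed strings and the allowlisted kind, so running this after sanitization does not reintroduce user-controlled HTML.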
## Consumers

| Consumer | Features | Usage |
|----------|----------|-------|
| MNW | doc-loader, directives, frontmatter, media-urls | Site docs loaded at boot, blog posts with frontmatter, user descriptions (standard), item markdown (standard), CDN image rewriting |
| Multithreaded | mentions, quotes | Forum posts (strict), @username linking, quote attribution |
| GoingsOn | core | Task/event descriptions (standard) |
| Balanced Breakfast | core | RSS feed content (sanitize_only) |
| audiofiles | core | Sample descriptions (standard) |

## Key Paths

- `src/render.rs` -- the core rendering logic
- `src/sanitize.rs` -- ammonia preset configurations
- `src/directives.rs` -- alert and code tab processing
- `src/doc_loader.rs` -- document loading and link rewriting
- `src/media_urls.rs` -- CDN path rewriting