# DocEngine Architecture

## Overview

DocEngine is a markdown rendering library that wraps pulldown-cmark (parsing) and ammonia (sanitization) behind a preset system. Each preset configures which markdown features are enabled and how aggressively the output is sanitized.

## Module Map

Modules tagged with a bracketed name are compiled only when the Cargo feature of that name is enabled; the rest are always built.

```
src/
  lib.rs            Crate root, re-exports, convenience functions
  render.rs         Renderer struct (builder pattern, 4 presets, render/render_with_meta)
  sanitize.rs       SanitizePreset enum (Permissive, Standard, Strict, Minimal)
  text.rs           Text utilities (word_count, reading_time, extract_title, strip_first_heading)
  toc.rs            Table of contents extraction and HTML rendering
  escape.rs         HTML entity escaping for safe string interpolation
  code_spans.rs     Code span/block byte range detection (used by mentions to skip code)
  directives.rs     [directives] Alert/tabs blockquote post-processing
  doc_loader.rs     [doc-loader] Load .md files from disk into in-memory page store
  frontmatter.rs    [frontmatter] Parse +++-delimited TOML frontmatter
  media_urls.rs     [media-urls] CDN path rewriting for images, img-to-video conversion
  mentions.rs       [mentions] @username extraction and resolution
  quotes.rs         [quotes] [quote:UUID:HASH] post-processing for forum attribution
  assumptions.rs    [assumptions] TOML source-of-truth loader, derived registry, {{ … }} substitution
  filters.rs        [assumptions] Filter trait, built-in filters, and the mini-parser for {{ path | filter(args) }} expressions
```

## Design Decisions

### Presets over configuration

Rather than exposing every pulldown-cmark option, DocEngine provides named presets that bundle markdown features with sanitization levels. This prevents misconfiguration -- you can't accidentally enable raw HTML without appropriate sanitization.

Custom configurations are still possible via the builder pattern (`Renderer::permissive().with_strip_images(true)`).
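The following is a minimal sketch of how a preset might bundle settings while still allowing builder overrides. Only `Renderer::permissive()`, `with_strip_images`, and the `SanitizePreset` variant names come from this document. The field names, `strict()`, and the accessor methods are hypothetical. The point it illustrates is that raw HTML has no standalone setter, so it can only be switched on together with a preset's sanitization level.

```rust
/// Sanitization levels (variant names as listed in the module map).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SanitizePreset {
    Permissive,
    Standard,
    Strict,
    Minimal,
}

/// Hypothetical renderer settings; field names are illustrative only.
#[derive(Debug, Clone)]
pub struct Renderer {
    allow_raw_html: bool, // no public setter: only a preset can turn this on
    strip_images: bool,
    sanitize: SanitizePreset,
}

impl Renderer {
    /// Raw HTML allowed, but always paired with the Permissive sanitize preset.
    pub fn permissive() -> Self {
        Renderer { allow_raw_html: true, strip_images: false, sanitize: SanitizePreset::Permissive }
    }

    /// Hypothetical stricter preset for untrusted input.
    pub fn strict() -> Self {
        Renderer { allow_raw_html: false, strip_images: false, sanitize: SanitizePreset::Strict }
    }

    /// Builder-style override: takes `self` by value so calls can chain.
    pub fn with_strip_images(mut self, strip: bool) -> Self {
        self.strip_images = strip;
        self
    }

    pub fn allows_raw_html(&self) -> bool {
        self.allow_raw_html
    }

    pub fn strips_images(&self) -> bool {
        self.strip_images
    }

    pub fn sanitize_preset(&self) -> SanitizePreset {
        self.sanitize
    }
}
```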

### Two-phase rendering

Rendering happens in two phases:

1. **pulldown-cmark** parses the markdown into an event stream, which is optionally filtered (strip images, strip raw HTML, neutralize dangerous URL schemes) and then rendered to an HTML string.
2. **ammonia** sanitizes the resulting HTML string.

This means even the permissive preset strips `<script>` tags -- ammonia always runs.
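The phase-1 filtering step can be sketched with a simplified, hypothetical `Event` enum that stands in for pulldown-cmark's real event type. The list of dangerous schemes and the `#` replacement are assumptions made for illustration. Phase 2 (ammonia) is not reimplemented here and remains the actual safety net.

```rust
/// Hypothetical, simplified stand-in for pulldown-cmark's event stream.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Text(String),
    Html(String),
    Image { url: String, alt: String },
    Link { url: String, text: String },
}

/// Phase-1 switches a preset might set (illustrative field names).
pub struct FilterOptions {
    pub strip_images: bool,
    pub strip_raw_html: bool,
}

/// Schemes treated as dangerous in this sketch (an assumed list).
const DANGEROUS_SCHEMES: [&str; 3] = ["javascript:", "vbscript:", "data:"];

/// Replaces a dangerous URL with a harmless `#`. Case-insensitive and
/// tolerant of leading whitespace, but otherwise deliberately simple.
fn neutralize_url(url: &str) -> String {
    let lowered = url.trim_start().to_ascii_lowercase();
    if DANGEROUS_SCHEMES.iter().any(|s| lowered.starts_with(*s)) {
        "#".to_string()
    } else {
        url.to_string()
    }
}

/// Phase 1: filter events before they are rendered to HTML.
/// Phase 2 (not shown) passes the rendered HTML through ammonia.
pub fn filter_events(events: Vec<Event>, opts: &FilterOptions) -> Vec<Event> {
    events
        .into_iter()
        .filter_map(|ev| match ev {
            Event::Image { .. } if opts.strip_images => None,
            Event::Html(_) if opts.strip_raw_html => None,
            Event::Image { url, alt } => Some(Event::Image { url: neutralize_url(&url), alt }),
            Event::Link { url, text } => Some(Event::Link { url: neutralize_url(&url), text }),
            other => Some(other),
        })
        .collect()
}
```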

Post-processing steps (directives, mentions, quotes, media URLs) are applied after sanitization by consumers, not built into the render pipeline.

### Feature-gated modules

DocEngine's only required dependencies are pulldown-cmark, ammonia, and serde. Consumers that only need rendering don't pull in regex, toml, or uuid. The `full` feature enables everything.

The `regex` vs `regex-lite` split is intentional -- doc-loader's link rewriting needs the full regex engine while simpler patterns in directives, mentions, quotes, and media-urls use the lighter variant.
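The following sketches the gating pattern in the crate root, assuming each bracketed module in the module map corresponds to a Cargo feature of the same name. Module bodies are inlined so the example builds as one file. `escape_html` and `mentions_enabled` are hypothetical helpers, and the `mentions` body is a placeholder. In the crate itself, a `full` feature would typically just list the individual features in `Cargo.toml`.

```rust
// Hypothetical crate-root (lib.rs) sketch. Core modules are always compiled;
// optional ones exist only when their Cargo feature is on, so their extra
// dependencies are never built for render-only consumers.

/// Core module (always compiled): minimal HTML entity escaping.
pub mod escape {
    pub fn escape_html(s: &str) -> String {
        let mut out = String::with_capacity(s.len());
        for c in s.chars() {
            match c {
                '&' => out.push_str("&amp;"),
                '<' => out.push_str("&lt;"),
                '>' => out.push_str("&gt;"),
                '"' => out.push_str("&quot;"),
                '\'' => out.push_str("&#39;"),
                _ => out.push(c),
            }
        }
        out
    }
}

/// Optional module: compiled only with `--features mentions`.
#[cfg(feature = "mentions")]
pub mod mentions {
    // A real implementation would import regex-lite here; without the
    // feature, this whole module (and that dependency) is skipped.
    pub fn extract_mentions(_text: &str) -> Vec<String> {
        Vec::new() // placeholder body for the sketch
    }
}

/// Reports whether the optional module was compiled in.
pub fn mentions_enabled() -> bool {
    cfg!(feature = "mentions")
}
```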

### DocLoader loads once at startup

`DocLoader::load()` reads all `.md` files from disk, renders them to HTML, and stores them in a `HashMap<String, DocPage>`. This happens once at application boot (MNW calls it during startup). Pages are then served from memory with no disk I/O per request.
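Here is a minimal sketch of the load-once pattern using only the standard library. `PageStore`, the `slug`/`html` fields of `DocPage`, and the injected `render` closure are all hypothetical. The real loader also performs link rewriting and may key pages differently.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical page record; the real `DocPage` likely carries more fields.
#[derive(Debug, Clone)]
pub struct DocPage {
    pub slug: String,
    pub html: String,
}

/// Hypothetical in-memory store built once at startup.
pub struct PageStore {
    pages: HashMap<String, DocPage>,
}

impl PageStore {
    /// Reads every `.md` file directly inside `dir` (no recursion), renders
    /// it with the supplied `render` function, and keeps the HTML in memory.
    pub fn load<R: Fn(&str) -> String>(dir: &Path, render: R) -> io::Result<PageStore> {
        let mut pages = HashMap::new();
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if path.extension().and_then(|e| e.to_str()) != Some("md") {
                continue;
            }
            let slug = match path.file_stem().and_then(|s| s.to_str()) {
                Some(s) => s.to_string(),
                None => continue,
            };
            let source = fs::read_to_string(&path)?;
            let html = render(&source);
            pages.insert(slug.clone(), DocPage { slug, html });
        }
        Ok(PageStore { pages })
    }

    /// Request-time lookup: a hash-map read with no disk I/O.
    pub fn get(&self, slug: &str) -> Option<&DocPage> {
        self.pages.get(slug)
    }

    pub fn len(&self) -> usize {
        self.pages.len()
    }
}
```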

Link rewriting converts relative `.md` references to the configured URL prefix (e.g., `./faq.md` becomes `/docs/faq`). Links to unpublished docs are stripped to plain text.
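The following sketches link rewriting without the `regex` crate (the real module uses regex). The function name and signature are hypothetical. The scanner only recognizes simple inline links of the form `[text](./name.md)`; titles, `#anchors`, nested brackets, and subdirectories are all passed through untouched.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: `[text](./faq.md)` becomes `[text](/docs/faq)` when
/// `faq` is published, and plain `text` when it is not.
pub fn rewrite_links(md: &str, prefix: &str, published: &HashSet<&str>) -> String {
    let mut out = String::with_capacity(md.len());
    let mut rest = md;
    while let Some(open) = rest.find('[') {
        out.push_str(&rest[..open]);
        let after = &rest[open..];
        // Try to read `[text](target)`; `close` is the index of `)` in `after`.
        let parsed = after.find("](").and_then(|mid| {
            let close = after[mid + 2..].find(')')? + mid + 2;
            Some((&after[1..mid], &after[mid + 2..close], close))
        });
        match parsed {
            Some((text, target, close)) if !text.contains('[') => {
                let slug = target.strip_prefix("./").unwrap_or(target).strip_suffix(".md");
                match slug {
                    // Same-directory .md link: rewrite or strip to text.
                    Some(s) if !s.contains('/') => {
                        if published.contains(s) {
                            out.push_str(&format!("[{}]({}/{})", text, prefix, s));
                        } else {
                            out.push_str(text);
                        }
                    }
                    // Anything else (external URLs, anchors): copy verbatim.
                    _ => out.push_str(&after[..=close]),
                }
                rest = &after[close + 1..];
            }
            _ => {
                // Not a simple link: emit the `[` and keep scanning.
                out.push('[');
                rest = &after[1..];
            }
        }
    }
    out.push_str(rest);
    out
}
```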

### Assumption substitution runs before parsing

The `assumptions` feature substitutes `{{ key }}` markers with a regex pre-pass over the raw markdown *before* pulldown-cmark sees it, so markers may appear anywhere -- prose, code spans, table cells, link text. The alternative (a markdown-aware pass) would either miss code spans (often the right place to write a number) or require re-implementing parts of the parser.
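The following sketches the pre-pass, with a hand-written scan standing in for the regex. `substitute` is a hypothetical name, values are plain strings, and filter expressions (`| name`) are not handled. The test shows that a marker inside a code span is replaced just like one in prose, because the scan never looks at markdown structure.

```rust
use std::collections::HashMap;

/// Hypothetical pre-pass: replaces every `{{ key }}` in the raw markdown
/// before any markdown parsing. Unknown keys and unclosed markers are errors.
pub fn substitute(src: &str, lookup: &HashMap<String, String>) -> Result<String, String> {
    let mut out = String::with_capacity(src.len());
    let mut rest = src;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        let end = after_open
            .find("}}")
            .ok_or_else(|| format!("unclosed '{{{{' at byte {}", src.len() - rest.len() + start))?;
        let key = after_open[..end].trim();
        let value = lookup
            .get(key)
            .ok_or_else(|| format!("unknown assumption key: {}", key))?;
        out.push_str(value);
        rest = &after_open[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}
```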

The loader produces both a strongly-typed view (for validation and computing derived values) and a flat `HashMap<String, LookupValue>` keyed by dotted path. Unknown TOML sections are still walked into the lookup table so authors can add ad-hoc keys without changing Rust code; only the fields needed for validation or derived values have to be modeled.
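Flattening into dotted paths might look like the following. The `Value` enum is a hypothetical stand-in for a parsed TOML value, which keeps the sketch free of a `toml` dependency. Arrays are omitted, and the `LookupValue` variants shown are assumptions.

```rust
use std::collections::BTreeMap;
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed TOML value (arrays omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Float(f64),
    Str(String),
    Table(BTreeMap<String, Value>),
}

/// Hypothetical leaf type stored in the flat lookup table.
#[derive(Debug, Clone, PartialEq)]
pub enum LookupValue {
    Int(i64),
    Float(f64),
    Str(String),
}

/// Walks every table, modeled or not, so ad-hoc keys become addressable
/// as `section.sub.key` without any Rust changes.
pub fn flatten(prefix: &str, value: &Value, out: &mut HashMap<String, LookupValue>) {
    let leaf = match value {
        Value::Table(map) => {
            for (k, v) in map {
                let path = if prefix.is_empty() { k.clone() } else { format!("{}.{}", prefix, k) };
                flatten(&path, v, out);
            }
            return;
        }
        Value::Int(i) => LookupValue::Int(*i),
        Value::Float(f) => LookupValue::Float(*f),
        Value::Str(s) => LookupValue::Str(s.clone()),
    };
    out.insert(prefix.to_string(), leaf);
}
```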

### Filter pipeline is extensible

Built-in filters cover numeric formatting (`int`, `ceil`, `floor`, `round`, `money`, `percent`) and string ops (`upper`, `lower`). Anything beyond that (locale-specific currency, custom rounding rules, project-specific notations) is added by the consumer through `Assumptions::with_filter(name, impl Filter)`. The `Filter` trait has a single method (`apply(input, &args) -> Result<LookupValue, FilterError>`) and a blanket impl over `Fn(...)`, so a closure works as a filter without writing a struct.
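The following sketches a single-method trait with a blanket impl over closures. The argument slice type, the `LookupValue` and `FilterError` definitions, and the `Filters` registry (standing in for `Assumptions::with_filter`) are assumptions. Only the overall shape follows the description above.

```rust
use std::collections::HashMap;

/// Hypothetical value type flowing through the filter chain.
#[derive(Debug, Clone, PartialEq)]
pub enum LookupValue {
    Int(i64),
    Float(f64),
    Str(String),
}

#[derive(Debug, PartialEq)]
pub struct FilterError(pub String);

/// Single-method trait; the `&[LookupValue]` args type is an assumption.
pub trait Filter {
    fn apply(&self, input: LookupValue, args: &[LookupValue]) -> Result<LookupValue, FilterError>;
}

/// Blanket impl: any closure or fn with the matching signature is a Filter.
impl<F> Filter for F
where
    F: Fn(LookupValue, &[LookupValue]) -> Result<LookupValue, FilterError>,
{
    fn apply(&self, input: LookupValue, args: &[LookupValue]) -> Result<LookupValue, FilterError> {
        self(input, args)
    }
}

/// Hypothetical registry standing in for `Assumptions::with_filter`.
#[derive(Default)]
pub struct Filters {
    by_name: HashMap<String, Box<dyn Filter>>,
}

impl Filters {
    pub fn with_filter<F: Filter + 'static>(mut self, name: &str, filter: F) -> Self {
        self.by_name.insert(name.to_string(), Box::new(filter));
        self
    }

    pub fn apply(&self, name: &str, input: LookupValue, args: &[LookupValue]) -> Result<LookupValue, FilterError> {
        match self.by_name.get(name) {
            Some(filter) => filter.apply(input, args),
            None => Err(FilterError(format!("unknown filter: {}", name))),
        }
    }
}

/// Registering a plain closure: no struct per filter needed.
pub fn example_filters() -> Filters {
    Filters::default().with_filter(
        "upper",
        |v: LookupValue, _args: &[LookupValue]| -> Result<LookupValue, FilterError> {
            match v {
                LookupValue::Str(s) => Ok(LookupValue::Str(s.to_uppercase())),
                other => Err(FilterError(format!("upper expects a string, got {:?}", other))),
            }
        },
    )
}
```

Because of the blanket impl, a named `fn` with the same signature also works as a filter.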

The mini-parser in `filters.rs` accepts `path (| name (args)?)*`, where args are integer, float, or quoted string literals. Filters chain left to right, and the final `LookupValue` is `Display`-formatted into the output stream. This keeps the expression language small while leaving the type system open for future control flow (`{% for %}` / `{% if %}`).
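Here is a recursive-descent sketch of the `path (| name (args)?)*` grammar that produces a small AST. The type and function names are hypothetical. The real parser may differ in details such as escape sequences inside strings (not supported here) or the exact identifier character set.

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug, Clone, PartialEq)]
pub enum Arg { Int(i64), Float(f64), Str(String) }

#[derive(Debug, Clone, PartialEq)]
pub struct FilterCall { pub name: String, pub args: Vec<Arg> }

#[derive(Debug, Clone, PartialEq)]
pub struct Expr { pub path: String, pub filters: Vec<FilterCall> }

fn skip_ws(it: &mut Peekable<Chars>) {
    while it.peek().map_or(false, |c| c.is_whitespace()) {
        it.next();
    }
}

/// Identifier (or, for the leading path, a dotted identifier).
fn ident(it: &mut Peekable<Chars>, allow_dot: bool) -> Result<String, String> {
    skip_ws(it);
    let mut s = String::new();
    while let Some(&c) = it.peek() {
        if c.is_alphanumeric() || c == '_' || (allow_dot && c == '.') {
            s.push(c);
            it.next();
        } else {
            break;
        }
    }
    if s.is_empty() { Err("expected identifier".to_string()) } else { Ok(s) }
}

/// One literal argument: "quoted string", integer, or float (has a `.`).
fn parse_arg(it: &mut Peekable<Chars>) -> Result<Arg, String> {
    skip_ws(it);
    if it.peek() == Some(&'"') {
        it.next();
        let mut s = String::new();
        loop {
            match it.next() {
                Some('"') => return Ok(Arg::Str(s)),
                Some(c) => s.push(c),
                None => return Err("unterminated string".to_string()),
            }
        }
    }
    let mut num = String::new();
    while let Some(&c) = it.peek() {
        if c.is_ascii_digit() || c == '.' || c == '-' { num.push(c); it.next(); } else { break; }
    }
    if num.contains('.') {
        num.parse::<f64>().map(Arg::Float).map_err(|_| format!("bad float: {}", num))
    } else {
        num.parse::<i64>().map(Arg::Int).map_err(|_| format!("bad int: {}", num))
    }
}

/// Parses `path (| name (args)?)*`; filters are kept in left-to-right order.
pub fn parse(src: &str) -> Result<Expr, String> {
    let mut it = src.chars().peekable();
    let path = ident(&mut it, true)?;
    let mut filters = Vec::new();
    loop {
        skip_ws(&mut it);
        match it.next() {
            None => break,
            Some('|') => {}
            Some(c) => return Err(format!("unexpected '{}'", c)),
        }
        let name = ident(&mut it, false)?;
        let mut args = Vec::new();
        skip_ws(&mut it);
        if it.peek() == Some(&'(') {
            it.next();
            skip_ws(&mut it);
            if it.peek() == Some(&')') {
                it.next(); // empty argument list
            } else {
                loop {
                    args.push(parse_arg(&mut it)?);
                    skip_ws(&mut it);
                    match it.next() {
                        Some(',') => continue,
                        Some(')') => break,
                        _ => return Err("expected ',' or ')'".to_string()),
                    }
                }
            }
        }
        filters.push(FilterCall { name, args });
    }
    Ok(Expr { path, filters })
}
```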

### Mention resolution skips code

`extract_mentions` and `resolve_mentions` detect inline code (backticks) and fenced code blocks, skipping any @mentions inside them. This prevents false positives from code examples.
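The following is a simplified sketch of the idea: compute byte ranges for code, then ignore any `@` that falls inside them. Both functions are hypothetical stand-ins for `code_spans.rs` and `extract_mentions`. Fences are recognized only at the very start of a line, and the username character set (ASCII letters, digits, `_`, `-`) is an assumption.

```rust
/// Byte ranges covered by fenced blocks and inline code spans (simplified).
pub fn code_ranges(text: &str) -> Vec<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut ranges = Vec::new();
    let mut i = 0;
    let mut line_start = true;
    while i < bytes.len() {
        if line_start && text[i..].starts_with("```") {
            // Fenced block: runs to the end of the next line starting with ``` (or EOF).
            let mut pos = text[i..].find('\n').map_or(text.len(), |n| i + n + 1);
            let mut end = text.len();
            while pos < text.len() {
                let line_end = text[pos..].find('\n').map_or(text.len(), |n| pos + n + 1);
                if text[pos..].starts_with("```") {
                    end = line_end;
                    break;
                }
                pos = line_end;
            }
            ranges.push((i, end));
            i = end;
            line_start = true;
            continue;
        }
        if bytes[i] == b'`' {
            // Inline span: a run of N backticks closed by the next run of exactly N.
            let run = bytes[i..].iter().take_while(|&&b| b == b'`').count();
            let mut j = i + run;
            let mut closed = None;
            while j < bytes.len() {
                if bytes[j] == b'`' {
                    let r = bytes[j..].iter().take_while(|&&b| b == b'`').count();
                    if r == run {
                        closed = Some(j + r);
                        break;
                    }
                    j += r;
                } else {
                    j += 1;
                }
            }
            match closed {
                Some(end) => { ranges.push((i, end)); i = end; }
                None => i += run, // unmatched backticks are literal text
            }
            line_start = false;
            continue;
        }
        line_start = bytes[i] == b'\n';
        i += 1;
    }
    ranges
}

/// `@username` mentions outside code; requires a non-word byte before `@`
/// so that email addresses are not matched.
pub fn extract_mentions(text: &str) -> Vec<String> {
    let ranges = code_ranges(text);
    let in_code = |pos: usize| ranges.iter().any(|&(s, e)| pos >= s && pos < e);
    let bytes = text.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let boundary = i == 0 || !(bytes[i - 1].is_ascii_alphanumeric() || bytes[i - 1] == b'_');
        if bytes[i] == b'@' && boundary && !in_code(i) {
            let len = bytes[i + 1..]
                .iter()
                .take_while(|&&b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
                .count();
            if len > 0 {
                out.push(text[i + 1..i + 1 + len].to_string());
                i += 1 + len;
                continue;
            }
        }
        i += 1;
    }
    out
}
```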

### Directive post-processing

Directives (`[!NOTE]`, `[!TIP]`, `[!TABS]`, etc.) are implemented as HTML post-processing rather than markdown parsing extensions. This keeps the core render pipeline simple and makes directives composable with any preset.
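The following sketches alert post-processing on already-sanitized HTML. It assumes the renderer emits `<blockquote>` followed by a `<p>` that begins with the `[!KIND]` marker. The `alert`/`alert-<kind>` class names and the accepted kinds (borrowed from GitHub's alert set) are assumptions, and tabs are not covered. The sketch adds only allowlisted kinds and static markup around HTML that has already been sanitized.

```rust
/// Alert kinds accepted by this sketch (assumed; mirrors GitHub's alert set).
const KINDS: [&str; 5] = ["NOTE", "TIP", "IMPORTANT", "WARNING", "CAUTION"];

fn is_alert_kind(s: &str) -> bool {
    KINDS.iter().any(|&kind| kind == s)
}

/// "WARNING" -> "Warning"
fn title_case(kind: &str) -> String {
    let lower = kind.to_ascii_lowercase();
    let mut chars = lower.chars();
    match chars.next() {
        Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
        None => String::new(),
    }
}

/// Rewrites `<blockquote>` + `<p>[!KIND]...` into a styled `<div>`.
/// Nested blockquotes are not handled.
pub fn process_alerts(html: &str) -> String {
    const OPEN: &str = "<blockquote>\n<p>[!";
    const CLOSE: &str = "</blockquote>";
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        match (after_open.find(']'), after_open.find(CLOSE)) {
            (Some(k), Some(c)) if k < c && is_alert_kind(&after_open[..k]) => {
                let kind = &after_open[..k];
                // Body runs from after `]` (minus one newline) to `</blockquote>`;
                // it still ends with the first paragraph's `</p>`.
                let body = &after_open[k + 1..c];
                let body = body.strip_prefix('\n').unwrap_or(body);
                out.push_str(&rest[..start]);
                out.push_str(&format!(
                    "<div class=\"alert alert-{}\">\n<p class=\"alert-title\">{}</p>\n<p>{}</div>",
                    kind.to_ascii_lowercase(),
                    title_case(kind),
                    body
                ));
                rest = &after_open[c + CLOSE.len()..];
            }
            _ => {
                // Not a known alert: copy the opening markup through unchanged.
                out.push_str(&rest[..start + OPEN.len()]);
                rest = after_open;
            }
        }
    }
    out.push_str(rest);
    out
}
```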

## Consumers

| Consumer | Features | Uses |
|---|---|---|
| MNW | doc-loader, directives, frontmatter, media-urls, assumptions | Site docs loaded at boot, blog posts with frontmatter, user descriptions (standard), item markdown (standard), CDN image rewriting, build-time substitution of business values into the docs corpus |
| Multithreaded | mentions, quotes | Forum posts (strict), @username linking, quote attribution |
| GoingsOn | core | Task/event descriptions (standard) |
| Balanced Breakfast | core | RSS feed content (`sanitize_only`) |
| audiofiles | core | Sample descriptions (standard) |

## Key Paths

- `src/render.rs` -- the core rendering logic
- `src/sanitize.rs` -- ammonia preset configurations
- `src/directives.rs` -- alert and code tab processing
- `src/doc_loader.rs` -- document loading and link rewriting
- `src/media_urls.rs` -- CDN path rewriting