# DocEngine Architecture

## Overview

DocEngine is a markdown rendering library that wraps pulldown-cmark (parsing) and ammonia (sanitization) behind a preset system. Each preset configures which markdown features are enabled and how aggressively the output is sanitized.

## Module Map

```
src/
  lib.rs          Crate root, re-exports, convenience functions
  render.rs       Renderer struct (builder pattern, 4 presets, render/render_with_meta)
  sanitize.rs     SanitizePreset enum (Permissive, Standard, Strict, Minimal)
  text.rs         Text utilities (word_count, reading_time, extract_title, strip_first_heading)
  toc.rs          Table of contents extraction and HTML rendering
  escape.rs       HTML entity escaping for safe string interpolation
  code_spans.rs   Code span/block byte range detection (used by mentions to skip code)
  directives.rs   [directives]   Alert/tabs blockquote post-processing
  doc_loader.rs   [doc-loader]   Load .md files from disk into in-memory page store
  frontmatter.rs  [frontmatter]  Parse +++delimited TOML frontmatter
  media_urls.rs   [media-urls]   CDN path rewriting for images, img-to-video conversion
  mentions.rs     [mentions]     @username extraction and resolution
  quotes.rs       [quotes]       [quote:UUID:HASH] post-processing for forum attribution
```

## Design Decisions

### Presets over configuration

Rather than exposing every pulldown-cmark option, DocEngine provides named presets that bundle markdown features with sanitization levels. This prevents misconfiguration -- you can't accidentally enable raw HTML without appropriate sanitization.

Custom configurations are still possible via the builder pattern (`Renderer::permissive().with_strip_images(true)`).
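
As a minimal sketch of why presets guard against misconfiguration, the builder below keeps the raw-HTML flag private and sets it only inside preset constructors, so it can never be separated from a sanitization level. Only `Renderer::permissive()`, `with_strip_images`, and the `SanitizePreset` variant names come from this document; the struct fields, the `strict()` constructor, the accessor methods, and which presets allow raw HTML are assumptions made for illustration, not DocEngine's actual definitions.

```rust
/// Sanitization levels, named after the variants listed for `sanitize.rs`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SanitizePreset {
    Permissive,
    Standard,
    Strict,
    Minimal,
}

/// Hypothetical renderer configuration; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Renderer {
    sanitize: SanitizePreset,
    allow_raw_html: bool,
    strip_images: bool,
}

impl Renderer {
    /// Preset constructor: raw HTML and its sanitization level are chosen together.
    pub fn permissive() -> Self {
        Self { sanitize: SanitizePreset::Permissive, allow_raw_html: true, strip_images: false }
    }

    /// Assumed preset constructor for untrusted input such as forum posts.
    pub fn strict() -> Self {
        Self { sanitize: SanitizePreset::Strict, allow_raw_html: false, strip_images: false }
    }

    /// Builder-style tweak that cannot weaken sanitization.
    pub fn with_strip_images(mut self, strip: bool) -> Self {
        self.strip_images = strip;
        self
    }

    pub fn sanitize_preset(&self) -> SanitizePreset {
        self.sanitize
    }

    pub fn allows_raw_html(&self) -> bool {
        self.allow_raw_html
    }

    pub fn strips_images(&self) -> bool {
        self.strip_images
    }
}
```

Because there is no public setter for `allow_raw_html` in this sketch, a caller can adjust image handling but cannot pair raw HTML with a stricter or weaker sanitizer than the preset intended.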

### Two-phase rendering

Rendering happens in two phases:
1. **pulldown-cmark** parses markdown into an event stream, which is optionally filtered (strip images, strip raw HTML, neutralize dangerous URL schemes) and then rendered to an HTML string
2. **ammonia** sanitizes the resulting HTML string

This means even the permissive preset strips `<script>` tags -- ammonia always runs.

Post-processing steps (directives, mentions, quotes, media URLs) are applied after sanitization by consumers, not built into the render pipeline.
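
The phase-one URL filter can be pictured with a small standalone function. This is a sketch under stated assumptions: the function name `neutralize_url`, the list of schemes, and the `#` replacement are all hypothetical and are not taken from DocEngine's source. In the real pipeline this kind of check would run on link and image destinations in the pulldown-cmark event stream.

```rust
/// Schemes treated as dangerous in this sketch (an assumption, not DocEngine's list).
const DANGEROUS_SCHEMES: [&str; 3] = ["javascript", "vbscript", "data"];

/// Returns `"#"` when `url` starts with a dangerous scheme, otherwise the URL unchanged.
pub fn neutralize_url(url: &str) -> &str {
    // Ignore leading whitespace before the scheme.
    let trimmed = url.trim_start();
    // The candidate scheme is everything before the first ':'.
    match trimmed.split_once(':') {
        Some((scheme, _))
            if DANGEROUS_SCHEMES.iter().any(|d| scheme.eq_ignore_ascii_case(d)) =>
        {
            "#"
        }
        // No colon, or a scheme outside the deny list: pass the URL through.
        _ => url,
    }
}
```

A deny list like this is not exhaustive (it does not handle schemes obfuscated with embedded control characters, for example), which is one reason an unconditional sanitization phase after rendering is valuable.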

### Feature-gated modules

DocEngine's only required dependencies are pulldown-cmark, ammonia, and serde. Consumers that only need rendering don't pull in regex, toml, or uuid. The `full` feature enables everything.

The `regex` vs `regex-lite` split is intentional -- doc-loader's link rewriting needs the full regex engine while simpler patterns in directives, mentions, quotes, and media-urls use the lighter variant.

### DocLoader loads once at startup

`DocLoader::load()` reads all `.md` files from disk, renders them to HTML, and stores them in a `HashMap<String, DocPage>`. This happens once at application boot (MNW calls it during startup). Pages are served from memory with no disk I/O on request.

Link rewriting converts relative `.md` references to the configured URL prefix (e.g., `./faq.md` becomes `/docs/faq`). Links to unpublished docs are stripped to plain text.
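
The rewriting rule can be sketched as a pure function over a single link target. This is a simplified illustration, not the doc-loader implementation (which this document says is regex-based): the `LinkRewrite` enum, the `rewrite_link` name, and passing published slugs as a slice are all assumptions, and the sketch ignores `#fragment` suffixes and nested directories.

```rust
/// Outcome of rewriting one link target (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum LinkRewrite {
    /// Relative `.md` link to a published page, rewritten under the URL prefix.
    Rewritten(String),
    /// Relative `.md` link to an unpublished page: keep only the link text.
    PlainText,
    /// Not a relative `.md` link (external URL, anchor, asset): leave as-is.
    Unchanged,
}

pub fn rewrite_link(href: &str, url_prefix: &str, published: &[&str]) -> LinkRewrite {
    // Absolute URLs are never rewritten, even if they end in `.md`.
    if href.contains("://") {
        return LinkRewrite::Unchanged;
    }
    // Only targets ending in `.md` are candidates.
    let Some(stem) = href.strip_suffix(".md") else {
        return LinkRewrite::Unchanged;
    };
    // `./faq` and `faq` refer to the same page slug.
    let slug = stem.trim_start_matches("./");
    if published.contains(&slug) {
        let prefix = url_prefix.trim_end_matches('/');
        LinkRewrite::Rewritten(format!("{prefix}/{slug}"))
    } else {
        LinkRewrite::PlainText
    }
}
```

With a prefix of `/docs` and `faq` in the published set, `rewrite_link("./faq.md", "/docs", &["faq"])` yields `Rewritten("/docs/faq")`, while a link to an unpublished `./draft.md` yields `PlainText`.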

### Mention resolution skips code

`extract_mentions` and `resolve_mentions` detect inline code (backticks) and fenced code blocks, skipping any @mentions inside them. This prevents false positives from code examples.
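
The skip logic can be sketched in two steps: collect the byte ranges covered by code, then ignore any `@` that falls inside one. The version below is an approximation written for this document, not the contents of `code_spans.rs` or `mentions.rs`. It treats a run of N backticks as closed by the next run of exactly N backticks, which covers both inline spans and backtick fences in simple cases, and it assumes usernames are ASCII letters, digits, and underscores. The `code_ranges` and `run_len` helpers and the return type are hypothetical.

```rust
use std::ops::Range;

/// Length of the backtick run starting at `start`.
fn run_len(bytes: &[u8], start: usize) -> usize {
    bytes[start..].iter().take_while(|&&b| b == b'`').count()
}

/// Byte ranges covered by backtick-delimited code (inline spans and fences).
pub fn code_ranges(text: &str) -> Vec<Range<usize>> {
    let bytes = text.as_bytes();
    let mut ranges = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'`' {
            i += 1;
            continue;
        }
        let open_start = i;
        let open_len = run_len(bytes, i);
        i += open_len;
        // Look for a closing run with the same number of backticks.
        let mut j = i;
        let mut close_end = None;
        while j < bytes.len() {
            if bytes[j] == b'`' {
                let len = run_len(bytes, j);
                if len == open_len {
                    close_end = Some(j + len);
                    break;
                }
                j += len;
            } else {
                j += 1;
            }
        }
        match close_end {
            Some(end) => {
                ranges.push(open_start..end);
                i = end;
            }
            // An unclosed fence (3+ backticks) runs to the end of the input.
            None if open_len >= 3 => {
                ranges.push(open_start..bytes.len());
                break;
            }
            // An unclosed inline run is ordinary text.
            None => {}
        }
    }
    ranges
}

/// Usernames mentioned outside of code, in order of appearance.
pub fn extract_mentions(text: &str) -> Vec<String> {
    let skip = code_ranges(text);
    let bytes = text.as_bytes();
    let mut found = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'@' && !skip.iter().any(|r| r.contains(&i)) {
            // Require a non-alphanumeric byte before '@' so emails are not matched.
            let preceded_ok = i == 0 || !bytes[i - 1].is_ascii_alphanumeric();
            let name_len = bytes[i + 1..]
                .iter()
                .take_while(|b| b.is_ascii_alphanumeric() || **b == b'_')
                .count();
            if preceded_ok && name_len > 0 {
                found.push(text[i + 1..i + 1 + name_len].to_string());
                i += 1 + name_len;
                continue;
            }
        }
        i += 1;
    }
    found
}
```

Working with byte ranges rather than re-parsing markdown keeps the check cheap and lets the same range list be reused by both extraction and resolution.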

### Directive post-processing

Directives (`[!NOTE]`, `[!TIP]`, `[!TABS]`, etc.) are implemented as HTML post-processing rather than markdown parsing extensions. This keeps the core render pipeline simple and makes directives composable with any preset.
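
To illustrate the post-processing approach, the sketch below rewrites an alert blockquote in already-rendered HTML into a styled container. It assumes the renderer emits a blockquote as `<blockquote>`, a newline, then `<p>[!KIND]`; the `process_alerts` name, the `alert alert-note` class names, and the set of handled kinds are hypothetical, and nested blockquotes and `[!TABS]` are out of scope. The directives module itself uses regex-lite patterns rather than this manual scan.

```rust
/// Alert kinds handled by this sketch (an assumed subset).
const ALERT_KINDS: [&str; 2] = ["NOTE", "TIP"];

/// Rewrites `<blockquote><p>[!KIND] ...` blocks into `<div class="alert alert-kind">`.
pub fn process_alerts(html: &str) -> String {
    const OPEN: &str = "<blockquote>\n<p>[!";
    const CLOSE: &str = "</blockquote>";
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        // Split into the directive kind, the block body, and everything after the block.
        let parsed = after_open.find(']').and_then(|kind_end| {
            let kind = &after_open[..kind_end];
            let body = &after_open[kind_end + 1..];
            let close = body.find(CLOSE)?;
            ALERT_KINDS
                .contains(&kind)
                .then(|| (kind, &body[..close], &body[close + CLOSE.len()..]))
        });
        match parsed {
            Some((kind, body, tail)) => {
                out.push_str(&rest[..start]);
                out.push_str(&format!(
                    "<div class=\"alert alert-{}\">\n<p>{}</div>",
                    kind.to_ascii_lowercase(),
                    body.trim_start()
                ));
                rest = tail;
            }
            None => {
                // Unknown or malformed directive: copy it through and keep scanning.
                out.push_str(&rest[..start + OPEN.len()]);
                rest = after_open;
            }
        }
    }
    out.push_str(rest);
    out
}
```

Because the function takes and returns an HTML string, it can run on the output of any preset, and ordinary blockquotes pass through untouched.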

## Consumers

| Consumer | Features | How it's used |
|----------|----------|---------------|
| MNW | doc-loader, directives, frontmatter, media-urls | Site docs loaded at boot, blog posts with frontmatter, user descriptions (standard), item markdown (standard), CDN image rewriting |
| Multithreaded | mentions, quotes | Forum posts (strict), @username linking, quote attribution |
| GoingsOn | core | Task/event descriptions (standard) |
| Balanced Breakfast | core | RSS feed content (sanitize_only) |
| audiofiles | core | Sample descriptions (standard) |

## Key Paths

- `src/render.rs` -- the core rendering logic
- `src/sanitize.rs` -- ammonia preset configurations
- `src/directives.rs` -- alert and code tab processing
- `src/doc_loader.rs` -- document loading and link rewriting
- `src/media_urls.rs` -- CDN path rewriting