# DocEngine Architecture

## Overview

DocEngine is a markdown rendering library that wraps pulldown-cmark (parsing) and ammonia (sanitization) behind a preset system. Each preset configures which markdown features are enabled and how aggressively the output is sanitized.

## Module Map

```
src/
  lib.rs          Crate root, re-exports, convenience functions
  render.rs       Renderer struct (builder pattern, 4 presets, render/render_with_meta)
  sanitize.rs     SanitizePreset enum (Permissive, Standard, Strict, Minimal)
  text.rs         Text utilities (word_count, reading_time, extract_title, strip_first_heading)
  toc.rs          Table of contents extraction and HTML rendering
  escape.rs       HTML entity escaping for safe string interpolation
  code_spans.rs   Code span/block byte range detection (used by mentions to skip code)
  directives.rs   [directives] Alert/tabs blockquote post-processing
  doc_loader.rs   [doc-loader] Load .md files from disk into in-memory page store
  frontmatter.rs  [frontmatter] Parse +++ delimited TOML frontmatter
  media_urls.rs   [media-urls] CDN path rewriting for images, img-to-video conversion
  mentions.rs     [mentions] @username extraction and resolution
  quotes.rs       [quotes] [quote:UUID:HASH] post-processing for forum attribution
  assumptions.rs  [assumptions] TOML source-of-truth loader, derived registry, {{ … }} substitution
  filters.rs      [assumptions] Filter trait, built-in filters, and the mini-parser for {{ path | filter(args) }} expressions
```

## Design Decisions

### Presets over configuration

Rather than exposing every pulldown-cmark option, DocEngine provides named presets that bundle markdown features with sanitization levels. This prevents misconfiguration -- you can't accidentally enable raw HTML without appropriate sanitization.

Custom configurations are still possible via the builder pattern (`Renderer::permissive().with_strip_images(true)`).
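
A minimal sketch of how a preset can bundle parser features with a sanitization level so the two cannot drift apart. Only the `SanitizePreset` variant names, `Renderer::permissive()`, and `with_strip_images` come from this document; the field names, the accessors, and the exact settings chosen for each preset are assumptions made for illustration.

```rust
/// Sanitization levels, named after the `SanitizePreset` variants in the module map.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SanitizePreset {
    Permissive,
    Standard,
    Strict,
    Minimal,
}

/// Hypothetical stand-in for the real `Renderer`; field names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Renderer {
    allow_raw_html: bool,
    strip_images: bool,
    sanitize: SanitizePreset,
}

impl Renderer {
    /// Fields are private, so raw HTML can only be switched on through a
    /// named constructor that also picks a sanitizer level.
    pub fn permissive() -> Self {
        Self { allow_raw_html: true, strip_images: false, sanitize: SanitizePreset::Permissive }
    }

    /// Assumed settings for a strict preset (not taken from the real crate).
    pub fn strict() -> Self {
        Self { allow_raw_html: false, strip_images: true, sanitize: SanitizePreset::Strict }
    }

    /// Builder-style override, mirroring `with_strip_images` from the text.
    pub fn with_strip_images(mut self, strip: bool) -> Self {
        self.strip_images = strip;
        self
    }

    pub fn allows_raw_html(&self) -> bool {
        self.allow_raw_html
    }

    pub fn strips_images(&self) -> bool {
        self.strip_images
    }

    pub fn sanitize_preset(&self) -> SanitizePreset {
        self.sanitize
    }
}
```

With this shape, `Renderer::permissive().with_strip_images(true)` changes one knob while the sanitizer level chosen by the preset stays attached to the renderer.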

### Two-phase rendering

Rendering happens in two phases:
1. **pulldown-cmark** parses the markdown into an event stream; DocEngine optionally filters the events (strip images, strip raw HTML, neutralize dangerous URL schemes) before they are rendered to an HTML string
2. **ammonia** sanitizes the resulting HTML string

This means even the permissive preset strips `<script>` tags -- ammonia always runs.

Post-processing steps (directives, mentions, quotes, media URLs) are applied after sanitization by consumers, not built into the render pipeline.
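
The "neutralize dangerous URL schemes" filter from phase 1 could be approached with an allowlist, roughly as below. This is a std-only sketch: `neutralize_url` and `SAFE_SCHEMES` are hypothetical names, the allowlist contents are an assumption, and in a real pipeline the function would be applied to link and image destinations while mapping over the parser's events.

```rust
/// Schemes treated as safe in this sketch; the real allowlist is not documented here.
const SAFE_SCHEMES: [&str; 3] = ["http", "https", "mailto"];

/// Returns the URL unchanged if it is relative or uses an allowed scheme,
/// otherwise returns "#" so the link becomes inert.
pub fn neutralize_url(url: &str) -> &str {
    let trimmed = url.trim();
    // A scheme is the text before the first ':' as long as no '/', '?' or '#'
    // appears earlier (those would make the ':' part of a path or query).
    let first_delim = trimmed.find(|c: char| matches!(c, ':' | '/' | '?' | '#'));
    match first_delim {
        Some(i) if trimmed[i..].starts_with(':') => {
            let scheme = trimmed[..i].to_ascii_lowercase();
            if SAFE_SCHEMES.contains(&scheme.as_str()) {
                url
            } else {
                "#"
            }
        }
        // No scheme at all: relative path, fragment, or query string.
        _ => url,
    }
}
```

An allowlist fails closed: a scheme written as `JavaScript:` or padded with whitespace is rejected because it is simply not in the list. Phase 2 would still run afterwards, so this filter is a first line of defense rather than the only one.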

### Feature-gated modules

DocEngine's only required dependencies are pulldown-cmark, ammonia, and serde. Optional modules sit behind Cargo features, so consumers that only need rendering don't pull in regex, toml, or uuid. The `full` feature enables everything.

The `regex` vs `regex-lite` split is intentional -- doc-loader's link rewriting needs the full regex engine, while the simpler patterns in directives, mentions, quotes, and media-urls use the lighter variant.
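
As a rough illustration of the gating, a crate root might declare optional modules like this. The feature names follow the bracketed tags in the module map; the module bodies are placeholders and `enabled_features` is a hypothetical helper, not part of the documented API.

```rust
// Sketch of a crate root with feature-gated modules.
// Each module is compiled only when its Cargo feature is enabled,
// so its optional dependencies are never pulled in otherwise.

#[cfg(feature = "mentions")]
pub mod mentions {
    // Placeholder: the real module would use the lighter regex variant.
}

#[cfg(feature = "doc-loader")]
pub mod doc_loader {
    // Placeholder: the real module would use the full regex engine.
}

/// Hypothetical helper that reports which optional features were compiled in.
pub fn enabled_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(feature = "mentions") {
        features.push("mentions");
    }
    if cfg!(feature = "doc-loader") {
        features.push("doc-loader");
    }
    features
}
```

A consumer that depends on the crate with default features disabled would compile neither module, which is how a core-only consumer avoids the extra dependencies.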

### DocLoader loads once at startup

`DocLoader::load()` reads all `.md` files from disk, renders them to HTML, and stores them in a `HashMap<String, DocPage>`. This happens once at application boot (MNW calls it during startup). Pages are served from memory with no disk I/O on request.

Link rewriting converts relative `.md` references to the configured URL prefix (e.g., `./faq.md` becomes `/docs/faq`). Links to unpublished docs are stripped to plain text.
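
The rewriting rule for a single link target can be sketched as follows. The real module does this with a regex over the rendered page; this std-only version handles one `href` at a time, and `rewrite_link`, `Rewritten`, and the slug convention (file name without `./` and `.md`) are assumptions.

```rust
use std::collections::HashSet;

/// Outcome of rewriting one link target.
#[derive(Debug, PartialEq, Eq)]
pub enum Rewritten {
    /// Leave the link as it is (external URL, anchor, non-markdown file).
    Unchanged,
    /// Replace the href with this URL.
    Href(String),
    /// Target is not published: drop the link and keep only its text.
    PlainText,
}

/// `prefix` is the configured URL prefix, e.g. "/docs".
/// `published` holds the slugs of the pages that were loaded.
pub fn rewrite_link(href: &str, prefix: &str, published: &HashSet<String>) -> Rewritten {
    // Absolute URLs are never rewritten, even if they end in .md.
    if href.contains("://") {
        return Rewritten::Unchanged;
    }
    // Only references to .md files are candidates.
    let stem = match href.strip_suffix(".md") {
        Some(stem) => stem,
        None => return Rewritten::Unchanged,
    };
    let slug = stem.strip_prefix("./").unwrap_or(stem);
    if published.contains(slug) {
        Rewritten::Href(format!("{}/{}", prefix.trim_end_matches('/'), slug))
    } else {
        Rewritten::PlainText
    }
}
```

Returning an enum rather than a string keeps the "strip to plain text" case explicit, so the caller decides how to re-emit the anchor's inner text.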

### Assumption substitution runs before parsing

The `assumptions` feature rewrites the markdown source in a regex pre-pass *before* pulldown-cmark sees it, so `{{ key }}` markers may appear anywhere -- prose, code spans, table cells, link text. The alternative (a markdown-aware pass) would either miss code spans (often the right place to write a number) or require re-implementing parts of the parser.

The loader produces both a strongly-typed view (for validation and computing derived values) and a flat `HashMap<String, LookupValue>` keyed by dotted path. Unknown TOML sections are still walked into the lookup table so authors can add ad-hoc keys without changing Rust code; only the fields needed for validation or derived values have to be modeled.
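
A sketch of the pre-pass over a flat, dotted-path lookup table. To stay dependency-free it scans for `{{` and `}}` by hand instead of using a regex, uses plain `String` values where the real table holds `LookupValue`, and ignores filters. Leaving unknown keys untouched is an assumption; the real loader may report them as errors instead.

```rust
use std::collections::HashMap;

/// Replaces every `{{ key }}` marker with its value from `lookup`.
/// Unknown keys and unterminated markers are left in place.
pub fn substitute(source: &str, lookup: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(source.len());
    let mut rest = source;
    while let Some(start) = rest.find("{{") {
        // Copy everything before the marker verbatim.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        match after_open.find("}}") {
            Some(end) => {
                let key = after_open[..end].trim();
                match lookup.get(key) {
                    Some(value) => out.push_str(value),
                    // Unknown key: keep the original marker text.
                    None => out.push_str(&rest[start..start + 2 + end + 2]),
                }
                rest = &after_open[end + 2..];
            }
            None => {
                // No closing braces: emit the remainder untouched.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Because the pass is purely textual, a marker inside backticks is replaced just like one in prose, which is the behavior the section argues for.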

### Filter pipeline is extensible

Built-in filters cover numeric formatting (`int`, `ceil`, `floor`, `round`, `money`, `percent`) and string ops (`upper`, `lower`). Anything beyond that -- locale-specific currency, custom rounding rules, project-specific notations -- is added by the consumer through `Assumptions::with_filter(name, impl Filter)`. The `Filter` trait is single-method (`apply(input, &args) -> Result<LookupValue, FilterError>`) and has a blanket impl over `Fn(...)`, so a closure works as a filter without writing a struct.

The mini-parser in `filters.rs` accepts `path (| name (args)?)*` where args are integer / float / quoted string literals. Filters chain left-to-right; the final `LookupValue` is `Display`-formatted into the output stream. This keeps the language tight while leaving the type system open for future control flow (`{% for %}` / `{% if %}`).
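
The trait-plus-blanket-impl arrangement can be sketched like this. The `apply` signature follows the description above, but the `LookupValue` variants, the shape of `FilterError`, the two sample filters, and `run_chain` are assumptions made for illustration; the expression parser itself is left out.

```rust
use std::fmt;

/// Assumed value type; the real `LookupValue` may have different variants.
#[derive(Debug, Clone, PartialEq)]
pub enum LookupValue {
    Int(i64),
    Float(f64),
    Text(String),
}

impl fmt::Display for LookupValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LookupValue::Int(n) => write!(f, "{}", n),
            LookupValue::Float(x) => write!(f, "{}", x),
            LookupValue::Text(s) => write!(f, "{}", s),
        }
    }
}

#[derive(Debug, PartialEq)]
pub struct FilterError(pub String);

/// Single-method trait, as described in the text.
pub trait Filter {
    fn apply(&self, input: LookupValue, args: &[LookupValue]) -> Result<LookupValue, FilterError>;
}

/// Blanket impl: any function or closure with a matching signature is a `Filter`.
impl<F> Filter for F
where
    F: Fn(LookupValue, &[LookupValue]) -> Result<LookupValue, FilterError>,
{
    fn apply(&self, input: LookupValue, args: &[LookupValue]) -> Result<LookupValue, FilterError> {
        self(input, args)
    }
}

/// Sample string filter (sketch of `upper`).
pub fn upper(input: LookupValue, _args: &[LookupValue]) -> Result<LookupValue, FilterError> {
    match input {
        LookupValue::Text(s) => Ok(LookupValue::Text(s.to_uppercase())),
        other => Err(FilterError(format!("upper expects text, got {:?}", other))),
    }
}

/// Sample numeric filter (sketch of `round`): rounds floats to the nearest integer.
pub fn round(input: LookupValue, _args: &[LookupValue]) -> Result<LookupValue, FilterError> {
    match input {
        LookupValue::Float(x) => Ok(LookupValue::Int(x.round() as i64)),
        LookupValue::Int(n) => Ok(LookupValue::Int(n)),
        LookupValue::Text(_) => Err(FilterError("round expects a number".to_string())),
    }
}

/// Applies filters left to right, then `Display`-formats the final value.
pub fn run_chain(
    input: LookupValue,
    chain: &[(&dyn Filter, Vec<LookupValue>)],
) -> Result<String, FilterError> {
    let mut value = input;
    for (filter, args) in chain {
        value = filter.apply(value, args)?;
    }
    Ok(value.to_string())
}
```

Because of the blanket impl, `upper` and `round` are usable as `&dyn Filter` without any wrapper type. A type error partway through the chain (for example `round` followed by `upper`) surfaces as a `FilterError` instead of producing output.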

### Mention resolution skips code

`extract_mentions` and `resolve_mentions` detect inline code (backticks) and fenced code blocks, skipping any @mentions inside them. This prevents false positives from code examples.
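
The skipping behavior can be approximated with a single scan that tracks whether the cursor is inside backtick-delimited code. This is a simplified sketch, not the real module: it tracks an open/closed state rather than precomputing byte ranges as `code_spans.rs` does, it treats an unmatched backtick run as opening code until the end of input, and the username character set (ASCII letters, digits, underscore) is an assumption.

```rust
/// Extracts `@username` mentions, ignoring anything inside backtick-delimited code.
pub fn extract_mentions(text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut mentions = Vec::new();
    // Length of the backtick run that opened the current code region, if any.
    let mut open_run: Option<usize> = None;
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '`' {
            let start = i;
            while i < chars.len() && chars[i] == '`' {
                i += 1;
            }
            let run = i - start;
            open_run = match open_run {
                None => Some(run),             // code starts
                Some(n) if n == run => None,   // matching run closes it
                still_open => still_open,      // different length: still inside code
            };
            continue;
        }
        // Require a non-alphanumeric character before '@' so emails are not matched.
        let at_word_start = i == 0 || !chars[i - 1].is_alphanumeric();
        if open_run.is_none() && chars[i] == '@' && at_word_start {
            let name: String = chars[i + 1..]
                .iter()
                .take_while(|c| c.is_ascii_alphanumeric() || **c == '_')
                .collect();
            if !name.is_empty() {
                i += name.len(); // name is ASCII, so len() equals its char count
                mentions.push(name);
            }
        }
        i += 1;
    }
    mentions
}
```

Matching on the length of the backtick run lets one mechanism cover both inline spans and triple-backtick fences, which is enough to show why `@bob` inside a code sample is not reported.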

### Directive post-processing

Directives (`[!NOTE]`, `[!TIP]`, `[!TABS]`, etc.) are implemented as HTML post-processing rather than markdown parsing extensions. This keeps the core render pipeline simple and makes directives composable with any preset.
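
Post-processing an alert might look roughly like the string rewrite below. It assumes the sanitized HTML contains the blockquote in the exact form `<blockquote>`, newline, `<p>[!KIND]`, which may not hold for every renderer or sanitizer configuration; `apply_alerts`, `ALERT_KINDS`, and the CSS class names are hypothetical, and tabs are not covered.

```rust
/// Alert kinds handled by this sketch; `[!NOTE]` and `[!TIP]` come from the text.
const ALERT_KINDS: [&str; 2] = ["NOTE", "TIP"];

/// Rewrites a blockquote that starts with `[!KIND]` into a blockquote carrying
/// a CSS class, and removes the marker text. Class names are hypothetical.
pub fn apply_alerts(html: &str) -> String {
    let mut out = html.to_string();
    for kind in ALERT_KINDS.iter() {
        // Assumed serialized shape of the rendered blockquote opening.
        let marker = format!("<blockquote>\n<p>[!{}]", kind);
        let replacement = format!(
            "<blockquote class=\"alert alert-{}\">\n<p>",
            kind.to_lowercase()
        );
        out = out.replace(&marker, &replacement);
    }
    out
}
```

Since the function only sees HTML, it works the same regardless of which preset produced that HTML, and ordinary blockquotes pass through unchanged.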

## Consumers

| Consumer | Features | How it's used |
|----------|----------|---------------|
| MNW | doc-loader, directives, frontmatter, media-urls, assumptions | Site docs loaded at boot, blog posts with frontmatter, user descriptions (standard), item markdown (standard), CDN image rewriting, build-time substitution of business values into the docs corpus |
| Multithreaded | mentions, quotes | Forum posts (strict), @username linking, quote attribution |
| GoingsOn | core | Task/event descriptions (standard) |
| Balanced Breakfast | core | RSS feed content (sanitize_only) |
| audiofiles | core | Sample descriptions (standard) |

## Key Paths

- `src/render.rs` -- the core rendering logic
- `src/sanitize.rs` -- ammonia preset configurations
- `src/directives.rs` -- alert and code tab processing
- `src/doc_loader.rs` -- document loading and link rewriting
- `src/media_urls.rs` -- CDN path rewriting