# DocEngine

Configurable markdown-to-HTML rendering library with sanitization presets. Built on pulldown-cmark (GFM) and ammonia.

Used by MNW (site docs, blog posts, user-generated content), Multithreaded (forum posts), and the desktop apps (descriptions, notes).

## Presets

Four rendering presets, each with different security/feature tradeoffs:

| Preset | Use case | Tables | Images | Raw HTML | Dangerous scheme filter | Sanitization |
|--------|----------|:------:|:------:|:--------:|:-----------------------:|--------------|
| **Permissive** | Docs, blog posts (trusted) | Y | Y | Y | N | Default ammonia |
| **Standard** | App text fields (descriptions) | Y | N | Y | N | Default ammonia |
| **Strict** | User-generated content (forums) | N | N | N | Y | nofollow on links |
| **Sanitize-only** | External HTML (RSS feeds) | -- | -- | -- | -- | Default ammonia, no markdown parsing |

```rust
use docengine::{render_permissive, render_standard, render_strict, sanitize_html};

// Convenience functions
let html = render_permissive("# Hello\n\n**Bold** text");
let html = render_standard("A description with [link](https://example.com)");
let html = render_strict("User post with @mentions and `code`");
let html = sanitize_html("<p>Pre-rendered</p><script>stripped</script>");

// Builder pattern for custom configurations
use docengine::{Renderer, SanitizePreset};

let html = Renderer::permissive()
    .with_strip_images(true) // override: strip images even in permissive
    .with_footnotes(false)
    .render("# Custom config");

// Render with metadata (word count, reading time)
let result = Renderer::standard().render_with_meta("Some article text...");
println!("{} words, ~{} min read", result.word_count, result.reading_time_minutes);
```

## Feature Flags

All optional features are off by default. Enable what you need:

| Flag | Dependencies | Provides |
|------|-------------|----------|
| `doc-loader` | regex | `DocLoader` -- load a directory of `.md` files into an in-memory page store |
| `directives` | regex-lite | `post_process_directives` -- `[!NOTE]`/`[!TIP]`/`[!TABS]` blockquote alerts and code tabs |
| `frontmatter` | toml | `parse_frontmatter` -- extract TOML frontmatter delimited by `+++` |
| `mentions` | regex-lite | `extract_mentions`, `resolve_mentions` -- `@username` parsing and linking |
| `quotes` | regex-lite, uuid | `post_process_quotes` -- replace `[quote:POST_ID:HASH]` markers with author attribution |
| `media-urls` | regex-lite | `rewrite_media_paths`, `img_to_video` -- CDN path rewriting and video tag conversion |
| `full` | all of the above | Enable everything |

```toml
# In Cargo.toml (use one of the two lines)
docengine = { path = "../Shared/docengine" } # Core only
docengine = { path = "../Shared/docengine", features = ["full"] } # Everything
```

## Core API

### Types

- **`Renderer`** -- configurable markdown renderer with builder pattern
- **`RenderResult`** -- rendered HTML plus `word_count` and `reading_time_minutes`
- **`SanitizePreset`** -- `Permissive`, `Standard`, `Strict`, `Minimal`
- **`TocEntry`** -- heading level, text, and anchor for table of contents
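To make the `TocEntry` description concrete, here is a minimal std-only sketch of collecting heading level, text, and anchor from markdown. It is not DocEngine's implementation: `TocEntrySketch`, its field names, `slugify`, `collect_headings`, and the slug rules are all assumptions for illustration, and it handles only ATX headings and backtick fences.

```rust
/// Hypothetical stand-in for a TOC entry; the field names are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TocEntrySketch {
    level: u8,
    text: String,
    anchor: String,
}

/// Turn heading text into a lowercase, hyphen-separated anchor (assumed rule).
fn slugify(text: &str) -> String {
    let mut slug = String::new();
    for ch in text.chars() {
        if ch.is_alphanumeric() {
            slug.extend(ch.to_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            // Collapse any run of non-alphanumeric characters into one hyphen.
            slug.push('-');
        }
    }
    slug.trim_end_matches('-').to_string()
}

/// Collect ATX headings (`#` through `######`), skipping fenced code blocks.
fn collect_headings(markdown: &str) -> Vec<TocEntrySketch> {
    let mut entries = Vec::new();
    let mut in_fence = false;
    for line in markdown.lines() {
        let trimmed = line.trim_start();
        if trimmed.starts_with("```") {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        let hashes = trimmed.chars().take_while(|&c| c == '#').count();
        if (1..=6).contains(&hashes) {
            // A heading needs a space between the hashes and the text.
            if let Some(rest) = trimmed[hashes..].strip_prefix(' ') {
                let text = rest.trim().to_string();
                if !text.is_empty() {
                    entries.push(TocEntrySketch {
                        level: hashes as u8,
                        anchor: slugify(&text),
                        text,
                    });
                }
            }
        }
    }
    entries
}
```

A real implementation would more likely walk the parser's event stream than scan lines, which also covers setext headings and inline markup inside heading text.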

### Functions

| Function | Description |
|----------|-------------|
| `render_permissive(md)` | Render with full GFM features |
| `render_standard(md)` | Render without images |
| `render_strict(md)` | Render with all restrictions (UGC-safe) |
| `sanitize_html(html)` | Clean pre-rendered HTML without markdown parsing |
| `word_count(text)` | Count words in raw text |
| `reading_time_minutes(wc)` | Estimate reading time (200 wpm) |
| `extract_title(md)` | Pull the first `# Heading` from markdown |
| `strip_first_heading(md)` | Remove the first `# Heading` (for template-rendered titles) |
| `extract_toc(md)` | Build a `Vec<TocEntry>` from all headings |
| `render_toc_html(entries)` | Render TOC entries as a `<nav class="toc">` HTML list |

### Feature-gated

| Function / Type | Feature | Description |
|-----------------|---------|-------------|
| `DocLoader::load(path, config)` | `doc-loader` | Load `.md` files from disk, render to HTML, build searchable index |
| `DocPage`, `DocIndexEntry` | `doc-loader` | Page and index entry types |
| `post_process_directives(html)` | `directives` | Convert `[!NOTE]`/`[!TIP]`/etc. blockquotes to alert divs, `[!TABS]` to tabbed code blocks |
| `parse_frontmatter(input)` | `frontmatter` | Parse `+++`-delimited TOML frontmatter |
| `Frontmatter` | `frontmatter` | Struct with `title`, `date`, `tags`, `section`, `draft`, `extra` |
| `extract_mentions(md)` | `mentions` | Find unique `@username` mentions (skips code blocks) |
| `resolve_mentions(md, valid, template)` | `mentions` | Replace `@user` with `[@user](/path/to/user)` for known usernames |
| `post_process_quotes(html, authors)` | `quotes` | Replace `[quote:UUID:HASH]` with clickable attribution |
| `rewrite_media_paths(md, base, user)` | `media-urls` | Rewrite relative image paths to absolute CDN URLs |
| `img_to_video(html)` | `media-urls` | Convert `<img>` tags pointing to video files into `<video>` elements |

## Consumers

| Project | Features used | Preset |
|---------|--------------|--------|
| MNW | `doc-loader`, `directives`, `frontmatter`, `media-urls` | Permissive (docs/blog), Standard (descriptions) |
| Multithreaded | `mentions`, `quotes` | Strict (forum posts) |
| GoingsOn | core only | Standard (notes, descriptions) |
| Balanced Breakfast | core only | Sanitize-only (RSS feed content) |

## Security

All presets sanitize output through ammonia. The strict preset additionally:
- Strips all raw HTML and images at the parser level (before ammonia)
- Replaces `javascript:`, `data:`, `vbscript:` URLs with `#`
- Adds `rel="noopener noreferrer nofollow"` to all links
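The scheme replacement described above can be sketched as follows. This is an illustrative std-only version, not the crate's code: `neutralize_url` and `DANGEROUS_SCHEMES` are hypothetical names, and the normalization step (dropping whitespace and control characters, lowercasing) is an assumption about how obfuscated schemes might be caught.

```rust
/// Schemes the strict preset is described as filtering.
const DANGEROUS_SCHEMES: [&str; 3] = ["javascript:", "data:", "vbscript:"];

/// Return `#` for URLs with a dangerous scheme, otherwise the URL unchanged.
fn neutralize_url(url: &str) -> &str {
    // Normalize only for the check: drop ASCII whitespace and control
    // characters and lowercase the rest, so inputs such as
    // " JaVaScRiPt:" or "java\tscript:" are still recognized.
    let normalized: String = url
        .chars()
        .filter(|c| !c.is_ascii_whitespace() && !c.is_ascii_control())
        .map(|c| c.to_ascii_lowercase())
        .collect();
    if DANGEROUS_SCHEMES
        .iter()
        .any(|scheme| normalized.starts_with(*scheme))
    {
        "#"
    } else {
        url
    }
}
```

Removing every whitespace character before the check errs on the side of blocking; a URL that merely looks like a dangerous scheme once spaces are removed is also replaced.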

Zero unsafe code.

## License

PolyForm Noncommercial 1.0.0