# DocEngine

Configurable markdown-to-HTML rendering library with sanitization presets. Built on pulldown-cmark (GFM) and ammonia.

Used by MNW (site docs, blog posts, user-generated content), Multithreaded (forum posts), and the desktop apps (descriptions, notes).

## Presets

Four rendering presets, each with different security/feature tradeoffs:

| Preset | Use case | Tables | Images | Raw HTML | Dangerous scheme filter | Sanitization |
|--------|----------|:------:|:------:|:--------:|:-----------------------:|--------------|
| **Permissive** | Docs, blog posts (trusted) | Y | Y | Y | N | Default ammonia |
| **Standard** | App text fields (descriptions) | Y | N | Y | N | Default ammonia |
| **Strict** | User-generated content (forums) | N | N | N | Y | nofollow on links |
| **Sanitize-only** | External HTML (RSS feeds) | -- | -- | -- | -- | Default ammonia, no markdown parsing |

```rust
use docengine::{render_permissive, render_standard, render_strict, sanitize_html};

// Convenience functions
let html = render_permissive("# Hello\n\n**Bold** text");
let html = render_standard("A description with [link](https://example.com)");
let html = render_strict("User post with @mentions and `code`");
let html = sanitize_html("<p>Pre-rendered</p><script>stripped</script>");

// Builder pattern for custom configurations
use docengine::{Renderer, SanitizePreset};

let html = Renderer::permissive()
    .with_strip_images(true)  // override: strip images even in permissive
    .with_footnotes(false)
    .render("# Custom config");

// Render with metadata (word count, reading time)
let result = Renderer::standard().render_with_meta("Some article text...");
println!("{} words, ~{} min read", result.word_count, result.reading_time_minutes);
```

## Feature Flags

All optional features are off by default. Enable what you need:

| Flag | Dependencies | Provides |
|------|-------------|----------|
| `doc-loader` | regex | `DocLoader` -- load a directory of `.md` files into an in-memory page store |
| `directives` | regex-lite | `post_process_directives` -- `[!NOTE]`/`[!TIP]`/`[!TABS]` blockquote alerts and code tabs |
| `frontmatter` | toml | `parse_frontmatter` -- extract TOML frontmatter delimited by `+++` |
| `mentions` | regex-lite | `extract_mentions`, `resolve_mentions` -- `@username` parsing and linking |
| `quotes` | regex-lite, uuid | `post_process_quotes` -- replace `[quote:POST_ID:HASH]` markers with author attribution |
| `media-urls` | regex-lite | `rewrite_media_paths`, `img_to_video` -- CDN path rewriting and video tag conversion |
| `assumptions` | toml, regex-lite | `Assumptions` -- load a TOML source-of-truth file, compute derived values, validate, substitute `{{ dotted.path \| filter(args) }}` markers in markdown with an extensible filter pipeline (built-ins: `int`, `ceil`, `floor`, `round`, `money`, `percent`, `upper`, `lower`) |
| `full` | all of the above | Enable everything |

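```toml
# In Cargo.toml (use one docengine line per manifest)
docengine = { path = "../shared/docengine" }  # Core only (from MNW/server/)
docengine = { path = "../../MNW/shared/docengine" }  # From Apps/
docengine = { path = "../shared/docengine", features = ["mentions", "quotes"] }  # Core plus selected features
```
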
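The `assumptions` row above describes `{{ dotted.path | filter(args) }}` markers. The following is a rough, std-only sketch of how such a marker could be expanded, not DocEngine's implementation. `substitute_markers` and the flat `HashMap` are hypothetical stand-ins for `Assumptions::substitute` and the TOML file; only the `upper` and `lower` built-ins are modelled, filter arguments are not parsed, and leaving unknown paths untouched is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical sketch (not DocEngine's API): expand `{{ path | filter }}`
/// markers using a flat map of dotted paths to string values.
fn substitute_markers(md: &str, values: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = md;
    while let Some(start) = rest.find("{{") {
        // Copy everything before the marker unchanged.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        let Some(end) = after_open.find("}}") else {
            // Unclosed marker: emit the remainder verbatim and stop.
            out.push_str(&rest[start..]);
            return out;
        };
        let marker = &rest[start..start + 2 + end + 2];
        // First segment is the dotted path, later segments are filters.
        let mut parts = after_open[..end].split('|').map(str::trim);
        let path = parts.next().unwrap_or("");
        match values.get(path) {
            Some(value) => {
                let mut text = value.to_string();
                for filter in parts {
                    text = match filter {
                        "upper" => text.to_uppercase(),
                        "lower" => text.to_lowercase(),
                        _ => text, // assumption: unknown filters are ignored
                    };
                }
                out.push_str(&text);
            }
            // Assumption: markers with unknown paths are left as written.
            None => out.push_str(marker),
        }
        rest = &after_open[end + 2..];
    }
    out.push_str(rest);
    out
}
```

With `plan.name` mapped to `Founding`, the input `Plan: {{ plan.name | upper }}` expands to `Plan: FOUNDING` in this sketch.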
## Core API

### Types

- **`Renderer`** -- configurable markdown renderer with builder pattern
- **`RenderResult`** -- rendered HTML plus `word_count` and `reading_time_minutes`
- **`SanitizePreset`** -- `Permissive`, `Standard`, `Strict`, `Minimal`
- **`TocEntry`** -- heading level, text, and anchor for table of contents

### Functions

| Function | Description |
|----------|-------------|
| `render_permissive(md)` | Render with full GFM features |
| `render_standard(md)` | Render without images |
| `render_strict(md)` | Render with all restrictions (UGC-safe) |
| `sanitize_html(html)` | Clean pre-rendered HTML without markdown parsing |
| `word_count(text)` | Count words in raw text |
| `reading_time_minutes(wc)` | Estimate reading time (200 wpm) |
| `extract_title(md)` | Pull the first `# Heading` from markdown |
| `strip_first_heading(md)` | Remove the first `# Heading` (for template-rendered titles) |
| `extract_toc(md)` | Build a `Vec<TocEntry>` from all headings |
| `render_toc_html(entries)` | Render TOC entries as a `<nav class="toc">` HTML list |

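To make the 200 wpm estimate concrete, here is a minimal sketch of the arithmetic behind `word_count` and `reading_time_minutes`. The names `count_words` and `estimate_reading_minutes` are hypothetical, and two details are assumptions rather than documented behaviour: words are split on whitespace only, and the estimate rounds up.

```rust
/// Hypothetical sketch: count whitespace-separated words in raw text.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Hypothetical sketch: estimate reading time at 200 words per minute.
/// Assumption: the result rounds up, so any non-empty text takes at
/// least one minute and empty text takes zero.
fn estimate_reading_minutes(word_count: usize) -> usize {
    const WORDS_PER_MINUTE: usize = 200;
    (word_count + WORDS_PER_MINUTE - 1) / WORDS_PER_MINUTE
}
```

Under these assumptions a 201-word text is estimated at 2 minutes and an empty text at 0.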
### Feature-gated

| Function / Type | Feature | Description |
|-----------------|---------|-------------|
| `DocLoader::load(path, config)` | `doc-loader` | Load `.md` files from disk, render to HTML, build searchable index |
| `DocPage`, `DocIndexEntry` | `doc-loader` | Page and index entry types |
| `post_process_directives(html)` | `directives` | Convert `[!NOTE]`/`[!TIP]`/etc. blockquotes to alert divs, `[!TABS]` to tabbed code blocks |
| `parse_frontmatter(input)` | `frontmatter` | Parse `+++`-delimited TOML frontmatter |
| `Frontmatter` | `frontmatter` | Struct with `title`, `date`, `tags`, `section`, `draft`, `extra` |
| `extract_mentions(md)` | `mentions` | Find unique `@username` mentions (skips code blocks) |
| `resolve_mentions(md, valid, template)` | `mentions` | Replace `@user` with `[@user](/path/to/user)` for known usernames |
| `post_process_quotes(html, authors)` | `quotes` | Replace `[quote:UUID:HASH]` with clickable attribution |
| `rewrite_media_paths(md, base, user)` | `media-urls` | Rewrite relative image paths to absolute CDN URLs |
| `img_to_video(html)` | `media-urls` | Convert `<img>` tags pointing to video files into `<video>` elements |
| `Assumptions::load(path)` / `::parse(text)` | `assumptions` | Load and parse a TOML assumptions file |
| `Assumptions::validate()` | `assumptions` | Check internal consistency (sums, bounds, founding ≤ standard) |
| `Assumptions::substitute(md)` | `assumptions` | Replace `{{ dotted.path \| filter(args) }}` placeholders with raw or derived values, optionally piped through filters |
| `Assumptions::with_filter(name, impl Filter)` | `assumptions` | Register a custom filter for use in the substitution pipeline |

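As an illustration of what "find unique `@username` mentions (skips code blocks)" can involve, the sketch below scans markdown by hand using only the standard library. It is not how `extract_mentions` is implemented (the `mentions` feature depends on regex-lite). `find_mentions` is a hypothetical name, and the username alphabet (ASCII letters, digits, underscore) and the rule that `@` must not follow a word character are assumptions.

```rust
use std::collections::HashSet;

/// Three backticks, written with escapes so this example stays fence-safe.
const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Hypothetical sketch: collect unique `@username` mentions in first-seen
/// order, skipping fenced code blocks and inline code spans.
fn find_mentions(md: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut mentions = Vec::new();
    let mut in_fence = false;

    for line in md.lines() {
        // Toggle fenced-code state on fence lines and skip the fence itself.
        if line.trim_start().starts_with(FENCE) {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }

        let chars: Vec<char> = line.chars().collect();
        let mut in_code_span = false;
        let mut i = 0;
        while i < chars.len() {
            let c = chars[i];
            if c == '`' {
                in_code_span = !in_code_span;
            } else if c == '@' && !in_code_span {
                // Assumption: '@' after a word character (e.g. an email
                // address) is not a mention.
                let after_word =
                    i > 0 && (chars[i - 1].is_alphanumeric() || chars[i - 1] == '_');
                // Consume the username characters that follow the '@'.
                let start = i + 1;
                let mut end = start;
                while end < chars.len()
                    && (chars[end].is_ascii_alphanumeric() || chars[end] == '_')
                {
                    end += 1;
                }
                if !after_word && end > start {
                    let name: String = chars[start..end].iter().collect();
                    // `insert` returns false for names already recorded.
                    if seen.insert(name.clone()) {
                        mentions.push(name);
                    }
                }
                i = end;
                continue;
            }
            i += 1;
        }
    }
    mentions
}
```

A second pass in the spirit of `resolve_mentions` could then link only the names that appear in a set of known usernames.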
## Consumers

| Project | Features used | Preset |
|---------|--------------|--------|
| MNW | `doc-loader`, `directives`, `frontmatter`, `media-urls`, `assumptions` | Permissive (docs/blog), Standard (descriptions) |
| Multithreaded | `mentions`, `quotes` | Strict (forum posts) |
| GoingsOn | core only | Standard (notes, descriptions) |
| Balanced Breakfast | core only | Sanitize-only (RSS feed content) |

## Security

All presets sanitize output through ammonia. The strict preset additionally:
- Strips all raw HTML and images at the parser level (before ammonia)
- Replaces `javascript:`, `data:`, `vbscript:` URLs with `#`
- Adds `rel="noopener noreferrer nofollow"` to all links

Zero unsafe code.

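The scheme replacement listed for the strict preset can be pictured with the sketch below. It is an assumption-laden illustration, not DocEngine's code: `neutralize_url` is a hypothetical name, and normalizing the URL (dropping whitespace and control characters, lowercasing) before the prefix check is a defensive choice made for this example.

```rust
/// Hypothetical sketch of a dangerous-scheme filter: returns `#` for
/// `javascript:`, `data:`, and `vbscript:` URLs, and the input otherwise.
fn neutralize_url(url: &str) -> &str {
    const BLOCKED: [&str; 3] = ["javascript:", "data:", "vbscript:"];
    // Normalize a copy for comparison only: remove whitespace and control
    // characters, then lowercase, so " JaVa\tScript:" is still caught.
    let normalized: String = url
        .chars()
        .filter(|c| !c.is_whitespace() && !c.is_control())
        .collect::<String>()
        .to_ascii_lowercase();
    if BLOCKED.iter().any(|scheme| normalized.starts_with(scheme)) {
        "#"
    } else {
        url
    }
}
```

In this sketch `neutralize_url("JavaScript:alert(1)")` yields `#`, while `https://example.com` passes through unchanged.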
## License

PolyForm Noncommercial 1.0.0