# pter Architecture

## Overview

pter converts HTML email bodies into readable markdown. It takes an HTML string and returns a markdown string. It does not handle MIME parsing, content extraction, or markdown rendering.

## Pipeline

```
html: &str
  → scraper::Html::parse_document()     # html5ever DOM tree
  → walk_children(root)                 # depth-first traversal
    → handle_text()                     # whitespace collapsing, entity decoding
    → handle_element()                  # classify → skip / transparent / block / inline
      → handle_block()                  # paragraphs, headings, lists, blockquotes, pre, hr
      → handle_inline()                 # bold, italic, links, images, code, br
  → whitespace::normalize()             # collapse blank lines, trim
  → String
```

## Module Responsibilities

| Module | Responsibility |
|--------|---------------|
| `lib.rs` | Public API (`convert`), re-exports |
| `convert.rs` | DOM walker, `Context` state, element dispatch |
| `elements.rs` | Element classification, tracking pixel / hidden detection |
| `whitespace.rs` | Output normalization |
| `tables.rs` | Table layout detection and unwrapping (Phase 2) |
| `replies.rs` | Reply chain detection and quoting (Phase 3) |

## Design Decisions

**scraper over html5ever directly**: We need tree traversal (parent/child/sibling access) for layout table unwrapping and reply chain detection. scraper provides this via ego-tree on top of html5ever's spec-compliant parsing.

**Markdown output**: Markdown is readable as plain text and renderable by any toolchain. It preserves structural information (headings, links, lists) that plain text loses.

**Faithful conversion**: pter converts what's there. Content extraction (stripping marketing wrappers) and post-processing (trimming signatures) are separate concerns, composable before or after pter.

**Blockquote rendering**: Blockquotes render children into a temporary buffer, then prefix each line with `> `. This handles nested blockquotes naturally: inner quotes produce `> ` lines, and the outer quote prefixes them again to get `> > `.

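The buffer-then-prefix approach to blockquotes can be sketched in a few lines of Rust. This is a minimal illustration, not pter's actual code: `quote_lines` is a hypothetical helper name, and emitting a bare `>` for empty lines (to avoid trailing spaces) is an assumption the text above does not specify.

```rust
/// Prefix every line of already-rendered markdown with `> `.
///
/// Hypothetical helper for illustration; the name and signature are
/// assumptions, not part of pter's API.
fn quote_lines(rendered: &str) -> String {
    rendered
        .lines()
        .map(|line| {
            if line.is_empty() {
                // Assumption: blank lines become a bare `>` so the
                // output carries no trailing whitespace.
                ">".to_string()
            } else {
                format!("> {line}")
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Nesting needs no special case under this scheme. Quoting `inner reply` yields `> inner reply`; when that line sits inside an outer blockquote's buffer, the second pass yields `> > inner reply`.
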
## Dependencies

| Crate | Purpose |
|-------|---------|
| `scraper` | HTML parsing + DOM tree + CSS selectors |
| `proptest` (dev) | Property-based testing |