max / pter
1 file changed, 0 insertions(+), 95 deletions(-)
@@ -1,95 +0,0 @@
-# pter - Todo
-
-Done: Phases 1-5 (except publish). Active: None. Next: cargo publish when ready.
-
-v0.1.0. 116 tests.
-
----
-
-## Phase 1: Core Conversion
-
-### Done
-- [x] Crate scaffold (Cargo.toml, MIT license, README)
-- [x] HTML element to markdown conversion (p, h1-h6, strong, em, a, img, ul/ol/li, blockquote, pre/code, hr, br, del, sup, sub)
-- [x] Tracking pixel detection (1x1 img, empty src, data URI, inline style)
-- [x] Hidden element skipping (display:none, visibility:hidden)
-- [x] Whitespace normalization (collapse blank lines, trim)
-- [x] Script/style/head stripping
-- [x] Entity decoding (via html5ever)
-- [x] Link deduplication (text matches URL)
-- [x] Nested list indentation
-- [x] Nested blockquote rendering
-- [x] Pre/code block rendering (no double-wrap)
-
----
-
-## Phase 2: Email Layout Unwrapping
-
-### Done
-- [x] Layout table detection heuristic (layout vs data table)
-- [x] Single-cell table unwrapping
-- [x] Multi-column table linearization
-- [x] Data table rendering as markdown table
-- [x] Nested layout table recursion
-- [x] font-size:0 / line-height:0 / height:0+overflow:hidden spacer detection
-- [x] role="presentation" detection
-
-### Deferred
-- [ ] Outlook conditional comment stripping (client-specific, low cross-platform value)
-
----
-
-## Phase 3: Reply Chain Detection
-
-### Done
-- [x] Reply boundary abstraction (`is_reply_boundary` predicate)
-- [x] Structural markers (type=cite)
-- [x] CSS class markers (gmail_quote, divRplyFwdMsg, yahoo_quoted, protonmail_quote, tutanota_quote, moz-cite-prefix, zmail_extra)
-- [x] Attribution text detection (On ... wrote:, Forwarded message, Original Message, Begin forwarded message, French/German variants)
-- [x] Attribution line preservation above quote blocks
-- [x] Quote depth rendering via temporary buffer + `>` prefix
-- [x] Outlook separator detection (From/Sent/To/Subject blocks)
-- [x] Heuristic: div with attribution text followed by blockquote
-- [x] Previous sibling text scanning for attribution
-
----
-
-## Phase 4: Integration
-
-### Done
-- [x] GoingsOn: pter::convert() replaces strip_html in imap_client.rs extract_body_with_html()
-- [x] GoingsOn: removed ~230 lines of hand-rolled HTML stripping code + 30 tests (covered by pter)
-- [x] GoingsOn: path dep added to src-tauri/Cargo.toml
-- [x] Balanced Breakfast: pter::convert() replaces html2text in html_to_text + extract_article Rhai host functions
-- [x] Balanced Breakfast: html2text dependency removed from bb-core/Cargo.toml
-- [x] Both projects compile clean, BB tests pass (153 tests)
-
----
-
-## Phase 5: Polish + Publish
-
-### Done
-- [x] Property-based testing with proptest (7 fuzz strategies: never panics, no HTML leak, valid UTF-8, no triple newlines, no trailing whitespace, arbitrary bytes, whitespace-only)
-- [x] Edge case hardening (24 tests: empty, whitespace-only, deeply nested divs/blockquotes/lists, malformed HTML, unicode, large input, empty table cells, nested link formatting)
-- [x] Benchmarks with criterion (simple: 4µs, newsletter: 15µs, reply chain: 10µs, 100 sections: 101µs)
-
-### Remaining
-- [ ] cargo publish to crates.io
-- [ ] Update GO and BB to crates.io version
-
----
-
-## Key Paths
-
-| What | Where |
-|------|-------|
-| Public API | `src/lib.rs` |
-| Conversion pipeline | `src/convert.rs` |
-| Element classification | `src/elements.rs` |
-| Table handling | `src/tables.rs` |
-| Reply detection | `src/replies.rs` |
-| Whitespace normalization | `src/whitespace.rs` |
-| Integration tests | `tests/integration.rs` |
-| Edge case tests | `tests/edge_cases.rs` |
-| Property-based tests | `tests/proptest.rs` |
-| Benchmarks | `benches/convert_bench.rs` |
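Phase 1's whitespace normalization ("collapse blank lines, trim") and two of Phase 5's property checks (no triple newlines, no trailing whitespace) together describe an output contract that fits in a few lines. What follows is a minimal sketch of such a pass over an already-rendered markdown string. The name `normalize_whitespace` is hypothetical and is not taken from pter's `src/whitespace.rs`.

```rust
/// Hypothetical post-processing pass (not pter's actual code):
/// strips trailing whitespace from every line, collapses runs of blank
/// lines into a single blank line, and trims the whole result.
pub fn normalize_whitespace(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut blank_run = 0usize;

    // `lines()` also strips a trailing '\r', so CRLF input is handled.
    for line in input.lines() {
        let line = line.trim_end();
        if line.is_empty() {
            blank_run += 1;
            // Keep at most one blank line between paragraphs.
            if blank_run > 1 {
                continue;
            }
        } else {
            blank_run = 0;
        }
        out.push_str(line);
        out.push('\n');
    }

    // Drop leading/trailing blank lines and surrounding whitespace.
    out.trim().to_string()
}
```

Capping each blank run at one line means the output can never contain three consecutive newlines, which is the same invariant the proptest strategy asserts.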
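The tracking-pixel item lists four signals: 1x1 image, empty `src`, data URI, and inline style. The following sketch combines them into one predicate. `ImgAttrs` and `is_tracking_pixel` are hypothetical names, since pter works on html5ever nodes and its real types are not shown here. Reading "inline style" as `width`/`height` set to 0 or 1px in the `style` attribute is an assumption, and treating every `data:` URI as a pixel follows the TODO literally even though it would also drop legitimate inline images.

```rust
/// Hypothetical view of an `<img>` element's attributes (illustrative only).
pub struct ImgAttrs<'a> {
    pub src: Option<&'a str>,
    pub width: Option<&'a str>,
    pub height: Option<&'a str>,
    pub style: Option<&'a str>,
}

/// True for dimension values such as "0", "1", or "1px".
fn is_tiny(value: &str) -> bool {
    let v = value.trim().to_ascii_lowercase();
    let v = v.trim_end_matches("px").trim();
    matches!(v.parse::<u32>(), Ok(0) | Ok(1))
}

/// True when the inline style sets both width and height to 0 or 1px.
fn style_is_tiny(style: &str) -> bool {
    let (mut tiny_w, mut tiny_h) = (false, false);
    for decl in style.split(';') {
        if let Some((key, value)) = decl.split_once(':') {
            // Exact property names, so `border-width` does not count.
            match key.trim().to_ascii_lowercase().as_str() {
                "width" => tiny_w |= is_tiny(value),
                "height" => tiny_h |= is_tiny(value),
                _ => {}
            }
        }
    }
    tiny_w && tiny_h
}

/// Heuristic check: empty src, data URI, 1x1 attributes, or 1x1 inline style.
pub fn is_tracking_pixel(img: &ImgAttrs<'_>) -> bool {
    let src = img.src.map(str::trim).unwrap_or("");
    if src.is_empty() || src.starts_with("data:") {
        return true;
    }
    let attr_tiny = matches!(
        (img.width, img.height),
        (Some(w), Some(h)) if is_tiny(w) && is_tiny(h)
    );
    attr_tiny || img.style.map_or(false, style_is_tiny)
}
```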
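Link deduplication ("text matches URL") avoids output such as `[https://example.com](https://example.com)`. The following minimal sketch assumes absolute URLs and uses a CommonMark autolink as the collapsed form; the TODO does not say which form pter actually emits. `render_link` and `normalize_target` are hypothetical names.

```rust
/// Loosely normalizes a link target for comparison: drops a `mailto:` or
/// `http(s)://` prefix and any trailing slashes.
fn normalize_target(s: &str) -> &str {
    let s = s.trim();
    let s = s.strip_prefix("mailto:").unwrap_or(s);
    let s = s
        .strip_prefix("https://")
        .or_else(|| s.strip_prefix("http://"))
        .unwrap_or(s);
    s.trim_end_matches('/')
}

/// Renders an anchor as markdown, collapsing `[url](url)` into an
/// autolink `<url>` when the visible text already is the URL.
pub fn render_link(text: &str, href: &str) -> String {
    let (text, href) = (text.trim(), href.trim());
    if href.is_empty() {
        // No target: keep only the visible text.
        return text.to_string();
    }
    if normalize_target(text).eq_ignore_ascii_case(normalize_target(href)) {
        format!("<{href}>")
    } else {
        format!("[{text}]({href})")
    }
}
```

Comparing normalized forms also catches near-matches such as `example.com` linking to `https://example.com/`, which an exact string comparison would miss.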
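Phase 3's attribution-text detection can be approximated line by line. The following sketch covers the English phrases the TODO names, plus one common French form ("Le ... a écrit :") and one German form ("Am ... schrieb ...:"). Those two patterns are assumptions about what "French/German variants" covers, and `is_attribution_line` is a hypothetical helper, not pter's `is_reply_boundary`.

```rust
/// Hypothetical text-only attribution check for a single line.
pub fn is_attribution_line(line: &str) -> bool {
    // French mail clients often put a non-breaking space before ':'.
    let normalized = line.replace('\u{a0}', " ");
    let l = normalized.trim();

    // Fixed separator phrases named in the TODO.
    const MARKERS: [&str; 3] = [
        "Forwarded message",
        "Original Message",
        "Begin forwarded message",
    ];
    if MARKERS.iter().any(|m| l.contains(*m)) {
        return true;
    }

    // English: "On <date>, <name> wrote:"
    if l.starts_with("On ") && l.ends_with("wrote:") {
        return true;
    }
    // French (assumed form): "Le <date>, <name> a écrit :"
    if l.starts_with("Le ") && (l.ends_with("a écrit :") || l.ends_with("a écrit:")) {
        return true;
    }
    // German (assumed form): "Am <date> schrieb <name>:"
    l.starts_with("Am ") && l.contains("schrieb") && l.ends_with(':')
}
```

A full boundary check would combine this text signal with the structural (`type=cite`) and CSS-class markers listed alongside it, since any of them alone can misfire.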