1 # Balanced Breakfast -- Architecture
2
3 When it comes to media and news, it's good to be a picky eater.
4
5 ## Positioning
6
7 The only native desktop feed reader with a user-scriptable plugin system for arbitrary sources. Rhai plugins let users write custom source adapters (no other reader is extensible this way). First-class Hacker News and arXiv support. Free with no limits, no account required, local-first. Cross-platform native (macOS, Windows, Linux via Tauri 2). Target users: power users/developers consuming content from many sources, privacy-conscious local-first users, technical professionals who value keyboard shortcuts and hackability.
8
9 ## System Overview
10
11 Balanced Breakfast is a desktop feed aggregator built with Tauri 2. It unifies RSS, Atom, JSON Feed, Hacker News, arXiv, and other sources into a single timeline. The backend is a Rust workspace with four library crates and one application crate. The frontend is vanilla HTML/CSS/JS served by Tauri's webview. Feeds are fetched by Rhai script plugins ("bussers"), stored in SQLite, and presented through a three-panel layout (sources, items, detail).
12
13 ## Workspace Layout
14
15 | Crate | Path | Purpose |
16 |-------|------|---------|
17 | bb-interface | `crates/bb-interface/` | Leaf crate. Shared types for the plugin contract: `FeedItem`, `FetchResult`, `ConfigSchema`, `ConfigField`, `BusserCapabilities`, `BusserConfig`. No internal dependencies. |
18 | bb-core | `crates/bb-core/` | Orchestrator and plugin runtime. Coordinates plugins, database, and feed scheduling. Contains the Rhai engine setup, plugin manager, config encryption (`crypto`), and URL tracker stripping (`url_cleaner`). Depends on bb-interface. |
19 | bb-feed | `crates/bb-feed/` | Feed aggregation and ordering. `FeedGenerator` reads items from the DB and applies filters (source, unread, starred, search, tags, query feed conditions). `OrderBy` sorts results (chronological, score, unread-first, starred-first). Depends on bb-interface. |
20 | bb-db | `crates/bb-db/` | SQLite persistence via sqlx. Repository types for feeds, items, tags, busser state, user config, and query feeds. FTS5 full-text search. Depends on bb-interface. |
21 | src-tauri | `src-tauri/` | Tauri 2 desktop shell. Thin command wrappers over the library crates, app state management, background tasks (auto-fetch, stale cleanup), sync scheduler, and the vanilla JS frontend. Depends on all four library crates. |
22
23 Dependency flow: `bb-interface` (leaf) --> `bb-core`, `bb-feed`, `bb-db` --> `src-tauri` (root).
24
25 ## Orchestrator
26
27 The `Orchestrator` (`bb-core::orchestrator`) is the central coordination point. It owns the `Database` and holds the `PluginManager` behind an `Arc<RwLock<PluginManager>>`. Its responsibilities:
28
29 - **Plugin lifecycle** -- load `.rhai` scripts from the plugins directory, initialize them with config from the DB, and provide fetch/shutdown operations.
30 - **Fetch execution** -- call a plugin's `fetch()`, strip tracking parameters from item URLs and HTML bodies, upsert results into the DB via the items repository, and record success/failure on the feed.
31 - **Circuit breaker** -- after 10 consecutive fetch failures (`CIRCUIT_BREAKER_THRESHOLD`), the feed is marked `circuit_broken` and excluded from auto-fetch until manually reset.
32 - **Secret management** -- holds an optional AES-256-GCM key. On startup, encrypts any plaintext Secret fields in existing feed configs (migration from legacy plaintext).
33 - **Fetch-all** -- iterates all loaded plugins and fetches each, collecting total item counts.
34
35 The orchestrator does not own the fetch scheduler or background tasks. Those are managed by `AppState` in the Tauri layer.
36
37 ## Plugin System (Rhai)
38
39 Plugins are `.rhai` text files dropped into the plugins directory. The Rhai engine is configured with safety limits and host functions.
40
41 ### Plugin Contract
42
43 Every plugin must define four functions:
44
45 - `id()` -- returns a unique string identifier (e.g. `"rss"`, `"hackernews"`)
46 - `name()` -- returns a human-readable display name
47 - `config_schema()` -- returns a map describing configuration fields (key, label, field_type, required, default, options)
48 - `fetch(config, cursor)` -- returns `{ items: [...], has_more: bool, next_cursor: string? }`
49
50 An optional `capabilities()` function can declare pagination support, custom fetch intervals, auth requirements, etc.
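
To make the `fetch(config, cursor)` return shape concrete, here is a minimal Rust sketch of how a host could drive a paginating plugin by feeding each `next_cursor` back into the next call. The `Item` and `FetchPage` types, the `fetch_all_pages` helper, and the `max_pages` bound are hypothetical stand-ins for illustration; they are not the real `FeedItem`/`FetchResult` definitions, and the source does not say whether the orchestrator loops over pages this way.

```rust
/// Hypothetical stand-in for an item; the real `FeedItem` has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub external_id: String,
    pub title: String,
}

/// Mirrors the map a plugin's `fetch(config, cursor)` returns:
/// `{ items: [...], has_more: bool, next_cursor: string? }`.
#[derive(Debug, Clone)]
pub struct FetchPage {
    pub items: Vec<Item>,
    pub has_more: bool,
    pub next_cursor: Option<String>,
}

/// Calls `fetch` repeatedly, passing the previous `next_cursor` back in,
/// until the plugin reports no more pages or `max_pages` calls were made.
/// `max_pages` is an assumed safety bound, not a documented setting.
pub fn fetch_all_pages<F>(mut fetch: F, max_pages: usize) -> Vec<Item>
where
    F: FnMut(Option<&str>) -> FetchPage,
{
    let mut items = Vec::new();
    let mut cursor: Option<String> = None;
    for _ in 0..max_pages {
        let page = fetch(cursor.as_deref());
        items.extend(page.items);
        // Stop when the plugin says it is done or gives no cursor to continue.
        if !page.has_more || page.next_cursor.is_none() {
            break;
        }
        cursor = page.next_cursor;
    }
    items
}
```

The first call passes `None` as the cursor, which matches a plugin that treats a missing cursor as "start from the newest page".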
51
52 ### Sandboxing
53
54 - **Operations cap:** `max_operations(100_000)` -- a typical RSS fetch costs 1k-5k ops; this catches infinite loops.
55 - **Expression depth:** `max_expr_depths(128, 128)` -- prevents stack overflows from deeply nested or recursive scripts.
56 - **HTTP timeout:** 15 seconds per request.
57 - **Response size:** 2 MB cap per response body.
58 - **Request limit:** 100 HTTP requests per `fetch()` invocation. Counter resets at the start of each fetch.
59 - **URL validation:** only `http://` and `https://` schemes are allowed; localhost, `127.0.0.1`, `[::1]`, `0.0.0.0`, the private RFC 1918 ranges (`10.x`, `172.16-31.x`, `192.168.x`), and the link-local range `169.254.x` are blocked.
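
The URL validation rule above can be approximated with the standard library alone. The sketch below is an illustration under assumptions, not the code in `host_functions.rs`: the function name `is_url_allowed` is hypothetical, the parsing is deliberately simple, and hostnames that later resolve to internal addresses are not covered.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Returns true if `url` uses http/https and does not point at a blocked host.
/// Hypothetical helper; the real check may differ in detail.
pub fn is_url_allowed(url: &str) -> bool {
    let lower = url.to_ascii_lowercase();
    // Only http:// and https:// schemes are accepted.
    let rest = if let Some(r) = lower.strip_prefix("http://") {
        r
    } else if let Some(r) = lower.strip_prefix("https://") {
        r
    } else {
        return false;
    };

    // The authority is everything up to the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Drop any userinfo ("user:pass@").
    let host_port = authority.rsplit('@').next().unwrap_or("");
    // Separate host from port, handling bracketed IPv6 literals.
    let host = if let Some(stripped) = host_port.strip_prefix('[') {
        match stripped.split_once(']') {
            Some((h, _)) => h,
            None => return false,
        }
    } else {
        host_port.split(':').next().unwrap_or("")
    };
    if host.is_empty() || host == "localhost" {
        return false;
    }

    match host.parse::<IpAddr>() {
        Ok(IpAddr::V4(ip)) => !is_blocked_v4(ip),
        Ok(IpAddr::V6(ip)) => !ip.is_loopback() && !ip.is_unspecified(),
        // A hostname: DNS-level checks are out of scope for this sketch.
        Err(_) => true,
    }
}

/// Loopback (127.0.0.0/8, broader than just 127.0.0.1), 0.0.0.0,
/// RFC 1918 private ranges, and link-local 169.254.0.0/16.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_loopback() || ip.is_unspecified() || ip.is_private() || ip.is_link_local()
}
```

Relying on `Ipv4Addr::is_private` and `is_link_local` avoids hand-written range comparisons, which are easy to get wrong for `172.16.0.0/12`.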
60
61 ### Host Functions
62
63 Functions registered into the Rhai engine for scripts to call:
64
65 | Function | Description |
66 |----------|-------------|
67 | `http_get(url)` | Fetch URL, return response body as string |
68 | `http_get_json(url)` | Fetch URL, parse JSON, return as Dynamic |
69 | `parse_json(str)` | Parse a JSON string |
70 | `parse_xml(str)` | Parse XML into a simplified `{tag, text, attrs, children}` structure |
71 | `parse_feed(str)` | Auto-detect RSS/Atom/JSON Feed, return `{title, link, entries}` |
72 | `parse_datetime(str)` | Parse ISO 8601 or RFC 2822 date to Unix timestamp |
73 | `timestamp_now()` | Current UTC timestamp (seconds) |
74 | `html_to_text(html)` | Strip HTML, render as plain text (80-char width) |
75 | `extract_article(html)` | Readability extraction: returns `{title, content, text}` |
76 | `truncate(text, max)` | Truncate with ellipsis |
77 | `str_contains`, `str_split`, `str_replace`, `str_trim` | String utilities |
78 | `strip_tracking(url)` | Remove utm_*, fbclid, gclid, etc. from a URL |
79 | `parse_int(str)` | Parse string to integer (returns the Rhai unit value `()` on failure) |
80 | `debug_print(val)` | Log to tracing at debug level |
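
As an illustration of what a host function such as `strip_tracking(url)` does, the following std-only sketch removes tracking parameters from a query string while leaving other parameters and the fragment intact. It is a sketch under assumptions: the key list beyond `utm_*`, `fbclid`, `gclid`, and `msclkid` is not given in this document, and the real implementation in `url_cleaner.rs` may parse URLs differently.

```rust
/// Exact-match tracking keys named in this document; the real list is longer.
const TRACKING_KEYS: [&str; 3] = ["fbclid", "gclid", "msclkid"];

fn is_tracking_key(key: &str) -> bool {
    let key = key.to_ascii_lowercase();
    key.starts_with("utm_") || TRACKING_KEYS.iter().any(|k| *k == key.as_str())
}

/// Removes tracking parameters from `url`, keeping everything else in order.
pub fn strip_tracking(url: &str) -> String {
    // Split off the fragment first so it can be re-attached unchanged.
    let (without_fragment, fragment) = match url.split_once('#') {
        Some((head, frag)) => (head, Some(frag)),
        None => (url, None),
    };
    let (base, query) = match without_fragment.split_once('?') {
        Some((base, query)) => (base, query),
        None => return url.to_string(), // no query string, nothing to strip
    };

    // Keep only the key=value pairs whose key is not a tracking key.
    let kept: Vec<&str> = query
        .split('&')
        .filter(|pair| !pair.is_empty())
        .filter(|pair| !is_tracking_key(pair.split('=').next().unwrap_or("")))
        .collect();

    let mut out = base.to_string();
    if !kept.is_empty() {
        out.push('?');
        out.push_str(&kept.join("&"));
    }
    if let Some(frag) = fragment {
        out.push('#');
        out.push_str(frag);
    }
    out
}
```

If every parameter is a tracking parameter, the trailing `?` is dropped as well, so the cleaned URL compares equal to the canonical one.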
81
82 ### Config Field Types
83
84 Plugins declare their configuration schema with these field types: `Text`, `TextArea`, `Secret`, `Url`, `Number`, `Toggle`, `Select`. Fields marked `Url` become feed subscriptions; other fields become key-value options passed to `fetch()`.
85
86 ### Bundled Plugins
87
88 Three plugins ship with the app: `rss.rhai` (RSS/Atom/JSON Feed), `hackernews.rhai` (HN stories), `arxiv.rhai` (arXiv papers). A `reader.rhai` plugin extracts article content from URLs using the readability algorithm.
89
90 ## Feed Aggregation
91
92 The `FeedGenerator` (`bb-feed::generator`) reads items from the database, applies filters and ordering, and returns paginated results.
93
94 **Filtering** combines SQL-level and in-memory strategies:
95 - Source, unread, starred, and FTS5 search are pushed into SQL for accurate LIMIT/OFFSET pagination.
96 - Item-level tags, feed-level tags, and query feed conditions (title/author/body contains, equals, not_contains, matches_regex) run in-memory after the SQL query.
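
The in-memory condition pass can be pictured as a simple predicate over each item. The sketch below is hypothetical: the type names, case-insensitive matching, and combining conditions with AND are all assumptions, and `matches_regex` is left out because it needs a regex library outside the standard library.

```rust
#[derive(Debug, Clone, Copy)]
pub enum Field {
    Title,
    Author,
    Body,
}

#[derive(Debug, Clone, Copy)]
pub enum Op {
    Contains,
    Equals,
    NotContains,
}

/// One query feed condition (hypothetical shape).
pub struct Condition {
    pub field: Field,
    pub op: Op,
    pub value: String,
}

/// Only the fields the conditions can inspect.
pub struct Item {
    pub title: String,
    pub author: String,
    pub body: String,
}

/// Evaluates a single condition. Case-insensitive comparison is an assumption.
pub fn condition_holds(item: &Item, cond: &Condition) -> bool {
    let haystack = match cond.field {
        Field::Title => &item.title,
        Field::Author => &item.author,
        Field::Body => &item.body,
    }
    .to_lowercase();
    let needle = cond.value.to_lowercase();
    match cond.op {
        Op::Contains => haystack.contains(&needle),
        Op::Equals => haystack == needle,
        Op::NotContains => !haystack.contains(&needle),
    }
}

/// An item passes when every condition holds (AND semantics, assumed).
pub fn rule_matches(item: &Item, conds: &[Condition]) -> bool {
    conds.iter().all(|c| condition_holds(item, c))
}
```

Because this pass runs after the SQL query, a page can come back shorter than the requested size when many rows fail the conditions.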
97
98 **Ordering** is applied in-memory after filtering:
99 - `Chronological` -- newest first (default)
100 - `Score` -- highest score first, with chronological tiebreak
101 - `UnreadFirst` -- unread items before read, chronological within each group
102 - `StarredFirst` -- starred items before unstarred, chronological within each group
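
The four orderings reduce to one comparator with a shared chronological tiebreak. This is a minimal sketch, assuming a simplified `Item` with hypothetical field names (`published_at`, `score`); the real `OrderBy` in `ordering.rs` may be structured differently.

```rust
#[derive(Debug, Clone, Copy)]
pub enum OrderBy {
    Chronological,
    Score,
    UnreadFirst,
    StarredFirst,
}

/// Simplified item; field names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: u32,
    pub published_at: i64, // Unix timestamp, seconds
    pub score: i64,
    pub is_read: bool,
    pub is_starred: bool,
}

/// Sorts items in place according to `order`, newest first within ties.
pub fn sort_items(items: &mut [Item], order: OrderBy) {
    // Newest first: compare b to a so larger timestamps sort earlier.
    let newest_first = |a: &Item, b: &Item| b.published_at.cmp(&a.published_at);
    items.sort_by(|a, b| match order {
        OrderBy::Chronological => newest_first(a, b),
        // Highest score first, then newest first.
        OrderBy::Score => b.score.cmp(&a.score).then_with(|| newest_first(a, b)),
        // false < true, so unread (is_read == false) sorts before read.
        OrderBy::UnreadFirst => a.is_read.cmp(&b.is_read).then_with(|| newest_first(a, b)),
        // Reverse comparison so starred (true) sorts before unstarred.
        OrderBy::StarredFirst => b.is_starred.cmp(&a.is_starred).then_with(|| newest_first(a, b)),
    });
}
```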
103
104 **Pagination** fetches `page_size + 1` items to detect whether more pages exist, then truncates to the exact page size.
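
The "fetch one extra row" technique can be sketched in a few lines. Here a slice stands in for the database table, and `Page`, `paginate`, and `fetch_page` are hypothetical names; the real generator issues a SQL query with `LIMIT page_size + 1` instead.

```rust
/// One page of results plus a flag saying whether another page exists.
#[derive(Debug, PartialEq)]
pub struct Page<T> {
    pub items: Vec<T>,
    pub has_more: bool,
}

/// Takes up to `page_size + 1` rows and trims them to `page_size`.
/// The presence of the extra row is what signals that more pages exist.
pub fn paginate<T>(mut rows: Vec<T>, page_size: usize) -> Page<T> {
    let has_more = rows.len() > page_size;
    rows.truncate(page_size);
    Page { items: rows, has_more }
}

/// Simulates the query: skip `offset` rows, then read `page_size + 1`.
pub fn fetch_page(table: &[u32], offset: usize, page_size: usize) -> Page<u32> {
    let rows: Vec<u32> = table
        .iter()
        .copied()
        .skip(offset)
        .take(page_size + 1)
        .collect();
    paginate(rows, page_size)
}
```

This avoids a separate `COUNT(*)` query: a single over-fetched row is enough to decide whether to show a "next page" control.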
105
106 ## Database Layer
107
108 SQLite via sqlx with compile-time migrations (10 migrations). The `Database` struct holds a connection pool (`max_connections: 16`) and provides typed repository accessors.
109
110 ### Tables
111
112 | Table | Purpose |
113 |-------|---------|
114 | `feeds` | Registered feed subscriptions. Keyed by UUID, linked to a busser_id. Tracks config JSON, enabled state, last_fetch, health counters, and circuit breaker state. |
115 | `feed_items` | All fetched items. Deduplicated by `external_id` (UNIQUE). Stores bite display fields, full content, metadata (score, tags as JSON array), and user state (is_read, is_starred). |
116 | `feed_items_fts` | FTS5 virtual table in external-content mode. Indexes title, body, and bite_text. Kept in sync via INSERT/UPDATE/DELETE triggers. |
117 | `feed_tags` | User-assigned tags on feeds (many-to-many). |
118 | `busser_state` | Plugin key-value state (cursors, tokens, pagination markers). Keyed by `(busser_id, key)`. |
119 | `user_config` | Key-value preferences (theme, welcome flag). Synced via changelog triggers. |
120 | `query_feeds` | Saved filter rules that act as virtual sources. Rules stored as JSON array. Synced via changelog triggers. |
121 | `sync_state` | Single-row sync metadata (device_id, pull_cursor, auto_sync settings). |
122 | `sync_changelog` | Local changes pending push. Written by triggers on feeds, feed_tags, user_config, query_feeds, and feed_items (user state only). |
123
124 ### Repositories
125
126 - `FeedsRepository` -- CRUD, enable/disable, last_fetch updates, fetch failure recording, circuit breaker management
127 - `ItemsRepository` -- upsert (dedup by external_id), read/star toggling, paginated listing (by busser, by feed, unread, starred), FTS5 search, counts, stale item deletion
128 - `TagsRepository` -- per-feed tag assignment, distinct tag listing, bulk feed-tag pairs
129 - `StateRepository` -- busser key-value state (get/set/delete by busser_id + key)
130 - `ConfigRepository` -- user_config key-value pairs (get/set/delete)
131 - `QueryFeedsRepository` -- query feed CRUD (create/update/delete/list)
132
133 FTS5 queries are sanitized by wrapping each search term in double quotes to prevent syntax injection (`AND`, `OR`, `NOT`, `NEAR` operators). The `^` prefix and `*` suffix characters are stripped.
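
The sanitization described above can be sketched as a small string transformation. The function name `sanitize_fts_query` is hypothetical, and dropping embedded double quotes is an extra assumption made here so that the quoted terms stay well-formed; the document only states that `^` and `*` are stripped.

```rust
/// Turns free-form user input into a sequence of quoted FTS5 terms.
/// Quoting makes `AND`, `OR`, `NOT`, and `NEAR` plain words, not operators.
pub fn sanitize_fts_query(input: &str) -> String {
    input
        .split_whitespace()
        // Strip `^` and `*` (per the document) and `"` (assumption).
        .map(|term| {
            term.chars()
                .filter(|c| !matches!(*c, '^' | '*' | '"'))
                .collect::<String>()
        })
        // A term made only of stripped characters disappears entirely.
        .filter(|term| !term.is_empty())
        .map(|term| format!("\"{}\"", term))
        .collect::<Vec<_>>()
        .join(" ")
}
```

An empty result means there is nothing to search for, so the caller would need to skip the `MATCH` clause rather than pass an empty string to FTS5.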
134
135 ## Sync Integration
136
137 Balanced Breakfast integrates with the SyncKit client SDK for cross-device sync. The `sync_service` module handles push/pull of local changes.
138
139 **What gets synced:**
140 - Feed subscriptions (feeds table: config, enabled state, health counters)
141 - Feed tags
142 - User config (preferences)
143 - Query feeds (saved filter rules)
144 - Feed item user state (is_read, is_starred changes only -- not item content)
145
146 **How it works:** SQLite triggers on synced tables write changes to `sync_changelog`. The sync engine pushes unpushed entries in batches of 500, pulls remote changes using a cursor, and applies them in FK-safe order (parents before children for upserts, children before parents for deletes). An `applying_remote` flag in `sync_state` suppresses trigger firing during remote change application to prevent echo loops.
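
FK-safe ordering can be expressed as a stable sort over the pulled changes. The sketch below is illustrative only: the `Change` type, the parent/child depths assigned to each table, and the choice to apply all upserts before all deletes are assumptions, not details taken from `sync_service.rs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Upsert,
    Delete,
}

/// One remote change to apply locally (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Change {
    pub table: &'static str,
    pub op: Op,
    pub row_id: String,
}

/// Assumed foreign-key depth: parents are 0, tables referencing them are 1.
fn depth(table: &str) -> i64 {
    match table {
        "feed_tags" | "feed_items" => 1, // assumed to reference `feeds`
        _ => 0,
    }
}

/// Orders changes so that upserts go parents-first and deletes go
/// children-first. The sort is stable, so changes on the same table and
/// operation keep their original relative order.
pub fn fk_safe_order(mut changes: Vec<Change>) -> Vec<Change> {
    changes.sort_by_key(|c| match c.op {
        Op::Upsert => (0, depth(c.table)),
        Op::Delete => (1, -depth(c.table)),
    });
    changes
}
```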
147
148 The sync scheduler runs on a configurable interval (default 15 minutes). Encryption is E2E via the SyncKit client's ChaCha20-Poly1305 with keys stored in the OS keychain.
149
150 ## Security Model
151
152 - **Plugin secrets at rest:** AES-256-GCM encryption. Encrypted format: `bb_enc:v1:<base64(nonce[12] || ciphertext || tag[16])>`. Key stored in `encryption.key` with 0600 permissions (Unix). Backward-compatible: unencrypted values pass through on decrypt.
153 - **FTS5 query sanitization:** User search input is quoted per-word to prevent FTS5 operator injection. Special characters (`^`, `*`) are stripped.
154 - **URL validation:** Rhai HTTP host functions block non-HTTP schemes and requests to localhost/internal addresses.
155 - **Response size limits:** 2 MB cap on HTTP response bodies prevents memory exhaustion.
156 - **URL tracking removal:** utm_*, fbclid, gclid, msclkid, and other tracking parameters stripped from item URLs and body HTML on ingest.
157 - **Sync encryption:** E2E via SyncKit (ChaCha20-Poly1305 + Argon2 key derivation). Server never sees plaintext.
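
The `bb_enc:v1:` envelope for plugin secrets implies two small parsing steps: recognizing the prefix (so legacy plaintext can pass through) and splitting the decoded bytes into nonce, ciphertext, and tag. The sketch below covers only those steps with hypothetical helper names; base64 decoding and the AES-256-GCM call are omitted because they need crates outside the standard library.

```rust
const PREFIX: &str = "bb_enc:v1:";
const NONCE_LEN: usize = 12;
const TAG_LEN: usize = 16;

/// Returns the base64 payload if `value` carries the encryption prefix.
/// `None` means the value is legacy plaintext and should pass through as-is.
pub fn encoded_payload(value: &str) -> Option<&str> {
    value.strip_prefix(PREFIX)
}

/// The three regions of a decoded payload: nonce[12] || ciphertext || tag[16].
pub struct Parts<'a> {
    pub nonce: &'a [u8],
    pub ciphertext: &'a [u8],
    pub tag: &'a [u8],
}

/// Splits already-base64-decoded bytes; returns `None` if they are too short
/// to contain both a nonce and a tag.
pub fn split_payload(decoded: &[u8]) -> Option<Parts<'_>> {
    if decoded.len() < NONCE_LEN + TAG_LEN {
        return None;
    }
    let (nonce, rest) = decoded.split_at(NONCE_LEN);
    let (ciphertext, tag) = rest.split_at(rest.len() - TAG_LEN);
    Some(Parts { nonce, ciphertext, tag })
}
```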
158
159 ## Concurrency Model
160
161 - **Tokio async runtime** (multi-threaded) drives all I/O: database queries, HTTP fetches, sync operations.
162 - **`Arc<RwLock<PluginManager>>`** -- the orchestrator holds the plugin manager behind a Tokio RwLock. Read lock for fetches and schema queries; write lock only during plugin loading.
163 - **`Arc<AppState>`** -- shared across Tauri commands and background tasks. Managed by Tauri's state system.
164 - **AbortHandles** -- background tasks (auto-fetch loop, stale cleanup) store their `AbortHandle` in `AppState` behind `std::sync::Mutex`. On shutdown or task replacement, existing handles are aborted.
165 - **Auto-fetch loop** -- checks every 60 seconds which plugins are due for a fetch based on their last_fetch timestamp and configured interval.
166 - **Stale cleanup** -- runs every 6 hours, deleting read (non-starred) items older than 30 days.
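
The two background checks boil down to timestamp arithmetic. The following sketch uses hypothetical function names and Unix-second timestamps; treating a never-fetched feed as due, and measuring item age from a `published_at` value, are assumptions rather than documented behavior.

```rust
/// 30 days in seconds, matching the stale-cleanup window described above.
pub const STALE_AFTER_SECS: i64 = 30 * 24 * 60 * 60;

/// True if a fetch is due: never fetched (assumed due immediately), or at
/// least `interval_secs` have passed since `last_fetch`.
pub fn is_due(last_fetch: Option<i64>, interval_secs: i64, now: i64) -> bool {
    match last_fetch {
        None => true,
        Some(t) => now - t >= interval_secs,
    }
}

/// True if an item qualifies for stale cleanup: read, not starred, and
/// older than 30 days. Which timestamp defines "older" is an assumption.
pub fn is_stale(is_read: bool, is_starred: bool, published_at: i64, now: i64) -> bool {
    is_read && !is_starred && now - published_at > STALE_AFTER_SECS
}
```

Since the loop wakes every 60 seconds, a feed's effective interval is rounded up to the next tick after it becomes due.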
167
168 ## Frontend Architecture
169
170 The frontend is vanilla HTML/CSS/JS served by Tauri's webview. There is no build step or bundler.
171
172 - **Tauri commands** act as thin wrappers: each command extracts parameters, calls the orchestrator or feed generator, and returns a serialized response. All business logic lives in the library crates.
173 - **Tauri events** notify the frontend of background activity: `auto-fetch-complete` (new items available), `auto-fetch-error`, `feed-circuit-broken`.
174 - **JS files** live in `src-tauri/frontend/js/`. Communication with Rust is via `window.__TAURI__`.
175
176 ## Feed Health Tracking
177
178 Each feed tracks `consecutive_failures` and `last_error`. On fetch success, failures reset to 0. On failure, the counter increments. Health status:
179
180 - **Green (healthy):** 0 consecutive failures
181 - **Yellow (degraded):** 1-9 consecutive failures
182 - **Red (circuit broken):** 10 or more consecutive failures trip the circuit breaker; the feed is excluded from auto-fetch until manually reset via `reset_circuit_breaker`
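
The health states above form a small state machine. This sketch is a hypothetical in-memory model (the real state lives in the `feeds` table), and two behaviors are assumptions: a success clears `last_error`, and a success alone does not clear `circuit_broken`, since the document says only a manual reset re-enables the feed.

```rust
pub const CIRCUIT_BREAKER_THRESHOLD: u32 = 10;

#[derive(Debug, PartialEq, Eq)]
pub enum Health {
    Healthy,       // green
    Degraded,      // yellow
    CircuitBroken, // red
}

#[derive(Debug, Default)]
pub struct FeedHealth {
    pub consecutive_failures: u32,
    pub last_error: Option<String>,
    pub circuit_broken: bool,
}

impl FeedHealth {
    /// A successful fetch resets the failure counter.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.last_error = None; // assumption: success clears the last error
    }

    /// A failed fetch increments the counter and may trip the breaker.
    pub fn record_failure(&mut self, error: &str) {
        self.consecutive_failures += 1;
        self.last_error = Some(error.to_string());
        if self.consecutive_failures >= CIRCUIT_BREAKER_THRESHOLD {
            self.circuit_broken = true;
        }
    }

    /// Manual reset: the only way back into the auto-fetch rotation.
    pub fn reset_circuit_breaker(&mut self) {
        self.circuit_broken = false;
        self.consecutive_failures = 0;
    }

    pub fn status(&self) -> Health {
        if self.circuit_broken {
            Health::CircuitBroken
        } else if self.consecutive_failures == 0 {
            Health::Healthy
        } else {
            Health::Degraded
        }
    }
}
```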
183
184 ## Key Design Decisions
185
186 - **Rhai over WASM/Lua:** Rhai is a Rust-native scripting language with easy type bridging and built-in safety limits. No FFI boundary, no separate runtime. Plugins are plain text files, not compiled artifacts.
187 - **Single orchestrator:** All coordination flows through one struct. No message passing between crates; the orchestrator calls methods directly. Simpler than an actor model for this scale.
188 - **SQL-first filtering with in-memory fallback:** Simple filters (source, unread, starred, search) use SQL for correct pagination. Complex filters (regex, tag intersection, query feed conditions) run in-memory. This avoids dynamic SQL generation while keeping common paths fast.
189 - **External-content FTS5:** The FTS index references `feed_items` by rowid with no data duplication. Triggers keep it in sync. This saves disk space compared to a full-copy FTS table.
190 - **Dedup by external_id:** Items use `external_id` (UNIQUE) for deduplication on upsert. The busser provides the ID; the DB enforces uniqueness.
191 - **Changelog-based sync:** SQLite triggers write changes to `sync_changelog` rather than diffing snapshots. This captures intent (INSERT/UPDATE/DELETE) and works naturally with the SyncKit push/pull model.
192
193 ## Key Paths
194
195 | What | Where |
196 |------|-------|
197 | Workspace manifest | `Cargo.toml` |
198 | Plugin interface types | `crates/bb-interface/src/` |
199 | Orchestrator | `crates/bb-core/src/orchestrator.rs` |
200 | Plugin manager | `crates/bb-core/src/plugin_manager.rs` |
201 | Rhai runtime | `crates/bb-core/src/rhai_plugin/` |
202 | Host functions | `crates/bb-core/src/rhai_plugin/host_functions.rs` |
203 | Type conversions | `crates/bb-core/src/rhai_plugin/conversions.rs` |
204 | Secret encryption | `crates/bb-core/src/crypto.rs` |
205 | URL cleaner | `crates/bb-core/src/url_cleaner.rs` |
206 | Feed generator | `crates/bb-feed/src/generator.rs` |
207 | Ordering/filtering | `crates/bb-feed/src/ordering.rs` |
208 | Database layer | `crates/bb-db/src/` |
209 | Repositories | `crates/bb-db/src/repository.rs` |
210 | Migrations | `migrations/sqlite/` (001-010) |
211 | Tauri app state | `src-tauri/src/state.rs` |
212 | Tauri commands | `src-tauri/src/commands/` |
213 | Sync service | `src-tauri/src/sync_service.rs` |
214 | Bundled plugins | `plugins/` |
215 | Frontend JS | `src-tauri/frontend/js/` |
216 | Frontend CSS | `src-tauri/frontend/css/` |
217