# Multithreaded Architecture

## 1. System Overview

Multithreaded (MT) is a forum platform for MNW creators. Each MNW project gets a community forum where creators and their audiences can discuss items, post devlogs, and organize conversations by category. MT is a standalone web service that delegates authentication to MNW and receives commands from MNW via an internal API.

MT serves HTML pages directly (server-side rendered via Askama templates and enhanced with HTMX). There is no SPA, no JavaScript framework, and no client-side routing. Users interact with MT through their browser; the server owns all rendering and state.

### Position in the MNW ecosystem

```
MNW Server (makenot.work)
|
|-- OAuth provider (user accounts, tokens)
|-- Internal API caller (community creation, cross-posted threads)
|
v
Multithreaded (forums.makenot.work)
|
|-- PostgreSQL (forum data, sessions, search indexes)
|-- S3 (image uploads, optional)
```

MNW is the source of truth for user identity. MT mirrors user data locally via `ON CONFLICT` upserts on every login and internal API call. Communities in MT map 1:1 to projects in MNW, created either when a user first visits or when MNW calls the internal API to provision one.

## 2. Crate Structure

MT is a Cargo workspace with three crates. The boundary rule is strict: library crates contain no web framework types.

```
multithreaded/          (workspace root)
  Cargo.toml            # Workspace definition + root crate deps
  src/                  # Root crate (binary)
  crates/
    mt-core/            # Domain types, zero internal deps
    mt-db/              # Database queries/mutations, depends on mt-core
```

### Root crate (multithreaded)

The binary. Owns the Axum server, route handlers, templates, middleware (CSRF, sessions, rate limiting), OAuth client, S3 integration, link preview fetching, and the internal HMAC-authenticated API. Depends on both mt-core and mt-db, plus shared crates from `MNW/shared/` (docengine for Markdown rendering, tagtree for tag validation, s3-storage for object storage).

### mt-core

Leaf crate with no internal dependencies. Defines the domain enums and helpers used across the codebase:

- `CommunityRole` (Owner, Moderator, Member) with permission helpers
- `BanType` (Ban, Mute)
- `ModAction` (19 variants covering all auditable actions)
- `SortColumn` / `SortOrder` for thread listing queries
- `time_format` module for relative timestamps ("3 hours ago")

### mt-db

Database access layer. Depends only on mt-core, sqlx, chrono, and uuid. Split into two modules:

- `queries.rs` -- read-only functions returning `sqlx::FromRow` projection structs shaped for templates
- `mutations.rs` -- write functions (insert, update, upsert, soft delete)

All SQL uses positional parameters (`$1`, `$2`). No ORM, no query builder. Projection structs are purpose-built for each query, not generic domain models.

## 3. Data Flows

### Post creation

1. User submits a form (POST to `/p/{slug}/{category}/new` or `/{thread_id}/reply`).
2. Rate limiter checks the per-IP write budget (burst 10, then 2/sec).
3. CSRF middleware validates the synchronizer token.
4. Handler extracts `SessionUser` from the session, verifies community membership, and checks ban/mute status and thread lock state.
5. Markdown body is rendered to HTML via `docengine::render_strict()` with @mention resolution.
6. `mt_db::mutations::create_post()` inserts the post and updates the thread's `last_activity_at`.
7. Link preview extraction runs in the background: URLs are parsed from the Markdown via pulldown-cmark and fetched with an SSRF-safe HTTP client, and their Open Graph (OG) metadata is stored.
8. Handler redirects back to the thread with a toast message.

### Moderation flow

Content moderation operates at three levels (user flagging, auto-hide, and mod-remove), with bans and mutes as a separate user-level control:

User flagging. Any authenticated user can flag a post with a reason (spam, rule_breaking, off_topic) and optional detail. Flags are stored in `post_flags` and are visible on the moderation dashboard.

Auto-hide. Each community has a configurable `auto_hide_threshold` (nullable). When a flag is inserted and the threshold is set, `auto_hide_if_threshold_met()` atomically counts distinct flaggers on that post and sets `removed_at` + `removed_by` if the count meets or exceeds the threshold. This is logged as `AutoHidePost` in the mod log.
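
The auto-hide rule can be modelled in memory as follows. This is a minimal sketch, not the real `auto_hide_if_threshold_met()`, which per the description runs atomically in the database; the `Flag` struct, its integer IDs, and `should_auto_hide` are hypothetical names introduced for illustration.

```rust
use std::collections::HashSet;

/// One flag on a post (hypothetical shape; the real `post_flags` row has more columns).
pub struct Flag {
    pub post_id: u64,
    pub flagger_id: u64,
}

/// Returns true when the post should be auto-hidden.
/// `threshold` mirrors the nullable `auto_hide_threshold`: `None` disables auto-hide.
pub fn should_auto_hide(flags: &[Flag], post_id: u64, threshold: Option<usize>) -> bool {
    let threshold = match threshold {
        Some(t) => t,
        None => return false,
    };
    // Count distinct flaggers so repeat flags from one user do not stack.
    let distinct: HashSet<u64> = flags
        .iter()
        .filter(|f| f.post_id == post_id)
        .map(|f| f.flagger_id)
        .collect();
    distinct.len() >= threshold
}
```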

Mod-remove. Moderators and owners can remove posts directly via `/posts/{post_id}/remove` or through the flag queue. Removing a post via the flag queue also resolves all pending flags on that post. Both paths log the action to the mod log.

Bans and mutes. Moderators can ban (full access revocation) or mute (revocation of write access only) users within their community, with an optional duration and reason. Role hierarchy is enforced: moderators cannot ban other moderators; only owners can. Expired bans are cleaned up opportunistically when the moderation page loads.
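
One way to express the ban hierarchy is an ordered role enum. This is a sketch under assumptions: the variant order is chosen here so that the derived `Ord` ranks Member < Moderator < Owner, `can_ban` is a hypothetical helper rather than the actual mt-core API, and the rule that an owner cannot ban another owner is a guess the source does not confirm.

```rust
/// Community roles, declared lowest to highest so the derived `Ord` reflects rank.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum CommunityRole {
    Member,
    Moderator,
    Owner,
}

/// Whether `actor` may ban or mute `target` (hypothetical helper).
/// The actor must be at least a moderator and must strictly outrank the target,
/// so moderators cannot ban moderators but owners can.
pub fn can_ban(actor: CommunityRole, target: CommunityRole) -> bool {
    actor >= CommunityRole::Moderator && actor > target
}
```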

All moderation actions are recorded in the `mod_log` table with actor, action type, target user/post, and optional reason. The mod log is paginated and visible to moderators.

### Thread tracking

Users can track threads to monitor new activity. The tracked threads page shows unread counts (posts since last visit) and @mention indicators. Tracking is opt-in per thread and can be bulk-cleared.

## 4. Authentication

MT has no user database of its own in the traditional sense: it holds only a local mirror of MNW accounts. All authentication flows through MNW's OAuth 2.0 server with PKCE (Proof Key for Code Exchange).

### OAuth flow

1. `/auth/login` generates a PKCE verifier (32 random bytes, base64url-encoded) and challenge (base64url-encoded SHA-256 of the verifier), stores the verifier and state nonce in the session, and redirects to `MNW_BASE_URL/oauth/authorize`.
2. MNW authenticates the user and redirects back to `/auth/callback` with an authorization code and state nonce.
3. Callback validates the state nonce, exchanges the code for an access token (POST to `/oauth/token` with the PKCE verifier), and fetches `/oauth/userinfo` with the token.
4. The user is upserted locally (`ON CONFLICT (mnw_account_id) DO UPDATE`), suspension status is checked, and a session is created with `user_id`, `username`, and `display_name`.
5. Session ID is cycled after login to prevent session fixation.

The token exchange and userinfo fetch each retry up to 2 times on 5xx responses or network errors, with exponential backoff (500ms, then 1000ms).
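
The retry policy can be sketched with the standard library alone. The names `backoff_delay`, `Attempt`, and `with_retries` are hypothetical, and treating other failures as non-retryable is an assumption; the sleep function is injected so the policy can be exercised without waiting.

```rust
use std::time::Duration;

/// Delay before retry number `retry` (0-based): 500ms, then 1000ms.
pub fn backoff_delay(retry: u32) -> Duration {
    Duration::from_millis(500 * 2u64.pow(retry))
}

/// Outcome of one attempt, reduced to what the retry policy cares about.
pub enum Attempt<T> {
    Success(T),
    Retryable, // 5xx response or network error
    Fatal,     // anything else (assumed here not to be retried)
}

/// Runs `op` once, then retries up to `max_retries` times on retryable failures.
pub fn with_retries<T>(
    max_retries: u32,
    mut op: impl FnMut() -> Attempt<T>,
    mut sleep: impl FnMut(Duration),
) -> Option<T> {
    for attempt in 0..=max_retries {
        match op() {
            Attempt::Success(value) => return Some(value),
            Attempt::Fatal => return None,
            // Sleep only if another attempt is still allowed.
            Attempt::Retryable if attempt < max_retries => sleep(backoff_delay(attempt)),
            Attempt::Retryable => return None,
        }
    }
    None
}
```

With `max_retries = 2` this makes at most three attempts in total, sleeping 500ms and then 1000ms between them. A real caller would pass `std::thread::sleep` or, in async code, an equivalent timer.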

### Extractors

- `MaybeUser(Option<SessionUser>)` -- infallible, used on all routes. Returns `None` for anonymous users.
- `PlatformAdmin(SessionUser)` -- returns 404 (not 403) to non-admins, hiding admin routes entirely.

### Internal API authentication

MNW-to-MT requests (community creation, thread cross-posting) bypass OAuth and use HMAC-SHA256:

- `X-Internal-Timestamp` -- Unix timestamp, rejected if it differs from server time by more than 60 seconds
- `X-Internal-Signature` -- HMAC-SHA256 of `"timestamp\nbody"` using a shared secret

The `InternalAuth` extractor validates both headers before passing the request body to the handler. Constant-time comparison prevents timing attacks on the signature.
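
The checks around the signature can be sketched without external crates. The HMAC-SHA256 computation itself is omitted because the standard library has no HMAC; all function names here are hypothetical, and production code would more likely rely on a vetted constant-time comparison (for example the `subtle` crate) than a hand-written loop.

```rust
/// Maximum allowed difference between the request timestamp and server time, in seconds.
const MAX_SKEW_SECS: u64 = 60;

/// Builds the byte string that gets signed: "timestamp\nbody".
pub fn signing_payload(timestamp: u64, body: &[u8]) -> Vec<u8> {
    let mut payload = timestamp.to_string().into_bytes();
    payload.push(b'\n');
    payload.extend_from_slice(body);
    payload
}

/// Accepts timestamps within 60 seconds of server time, in either direction.
pub fn timestamp_is_fresh(timestamp: u64, server_now: u64) -> bool {
    timestamp.abs_diff(server_now) <= MAX_SKEW_SECS
}

/// Compares two byte strings without stopping at the first mismatch.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // the length of a fixed-size MAC is not secret
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y; // accumulate differences instead of branching
    }
    diff == 0
}
```

A verifier built on these pieces would reject stale timestamps first, compute the HMAC over `signing_payload(timestamp, body)`, and compare it to the received signature with `constant_time_eq`.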

## 5. Session Storage

Sessions are stored in PostgreSQL via `tower-sessions-sqlx-store`. There is no Redis. Key details:

- Cookie name: `mt_session`
- SameSite: Lax
- Expiry: 7 days of inactivity
- Expired sessions are cleaned up hourly by a background task (`continuously_delete_expired`)
- Session data stored: `user_id` (UUID), `username`, `display_name`, plus transient OAuth state (PKCE verifier, state nonce) during login

## 6. Rate Limiting

Write endpoints (all POST routes) are rate-limited per IP using `tower_governor`:

- Burst: 10 requests
- Sustained: 2 requests/second (one token per 500ms)
- Key extractor: `SmartIpKeyExtractor` (handles X-Forwarded-For behind reverse proxy)

Rate limiting is applied as a route layer on the write routes group only. Read routes have no rate limit. The internal API is also exempt (it uses HMAC auth, not sessions).
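
The burst and refill numbers above can be illustrated with a simplified token bucket. This is a teaching model only: it is not how `tower_governor` is implemented internally, and `TokenBucket` and its methods are hypothetical names. Time is passed in explicitly so the behaviour is deterministic.

```rust
use std::time::Duration;

/// Simplified per-key token bucket: `burst` tokens, one refilled per `period`.
pub struct TokenBucket {
    burst: u32,
    period: Duration,
    tokens: u32,
    last_refill: Duration, // time of the last refill, measured from an arbitrary start
}

impl TokenBucket {
    /// `period` must be non-zero.
    pub fn new(burst: u32, period: Duration) -> Self {
        Self { burst, period, tokens: burst, last_refill: Duration::ZERO }
    }

    /// Returns true if a request arriving at `now` is allowed.
    pub fn try_acquire(&mut self, now: Duration) -> bool {
        // Add one token for every whole period elapsed since the last refill.
        let elapsed = now.saturating_sub(self.last_refill);
        let refills = (elapsed.as_millis() / self.period.as_millis()) as u32;
        if refills > 0 {
            self.tokens = self.burst.min(self.tokens.saturating_add(refills));
            self.last_refill += self.period * refills;
        }
        if self.tokens > 0 {
            self.tokens -= 1;
            true
        } else {
            false
        }
    }
}
```

With `TokenBucket::new(10, Duration::from_millis(500))`, ten requests at the same instant succeed, the eleventh is rejected, and one more is admitted after each further 500ms.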

## 7. Key Design Decisions

### HTMX-based SSR, no SPA

MT serves complete HTML pages with HTMX for progressive enhancement (for example, search results arrive as swapped fragments). This eliminates client-side state management, reduces JavaScript to near zero, and keeps the forum functional without JS. The search endpoint (`/search`) returns HTML fragments for HTMX swap, not JSON.

### Community-scoped permissions

All permissions (roles, bans, mutes) are scoped to a single community. A user can be an owner in one community, a banned user in another, and a regular member in a third. There is no global moderator role -- only the platform admin (a single user ID set via an environment variable) has cross-community authority.

### Immutable post bodies with footnotes and endorsements

Post bodies cannot be edited after creation (enforced since migration 011). Authors can append footnotes (corrections, clarifications) and other users can endorse posts. This preserves conversation integrity while allowing authors to add context.

### Soft delete everywhere

Threads and posts use `deleted_at` timestamps rather than hard deletes. Moderator removals use a separate `removed_at` / `removed_by` pair to distinguish author deletions from mod actions. This supports audit trails and potential appeals.
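
A small sketch of how the two soft-delete markers might be interpreted when rendering a post. `PostRow`, `PostState`, and `post_state` are hypothetical names, the integer timestamp and ID types are stand-ins, and giving a moderator removal precedence when both markers are set is an assumption.

```rust
/// Seconds since the Unix epoch; stands in for a real timestamp type.
type Timestamp = i64;

/// Only the columns relevant to soft deletion (hypothetical projection).
pub struct PostRow {
    pub deleted_at: Option<Timestamp>,
    pub removed_at: Option<Timestamp>,
    pub removed_by: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PostState {
    Visible,
    DeletedByAuthor,
    RemovedByModeration { removed_by: Option<u64> },
}

/// Classifies a post from its soft-delete columns.
pub fn post_state(row: &PostRow) -> PostState {
    if row.removed_at.is_some() {
        // Assumed precedence: a moderation removal wins over an author deletion.
        PostState::RemovedByModeration { removed_by: row.removed_by }
    } else if row.deleted_at.is_some() {
        PostState::DeletedByAuthor
    } else {
        PostState::Visible
    }
}
```

Keeping the two markers separate means the row itself records who hid the content, which is what makes audit trails and appeals possible.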

### Internal API for cross-service coordination

Rather than sharing a database between MNW and MT, the two services communicate via a signed internal API. MNW can create communities, post threads (e.g., release notes), and query thread stats. This keeps the services independently deployable and the databases isolated.

### Security headers

Every response includes: Content-Security-Policy (default-src 'self', no frame-ancestors), X-Content-Type-Options (nosniff), X-Frame-Options (DENY), and Cache-Control (private, no-cache by default). CSP is strict -- no inline scripts, no external resources.

## 8. Scaling Considerations

### Thread listing

Thread lists are sorted by `last_activity_at` (or reply count), with pinned threads first. The `last_activity_at` column is denormalized on the threads table and updated on every new post, avoiding a JOIN/subquery on every listing page load. Pagination uses LIMIT/OFFSET with 25 threads per page.

### Search indexing

Search uses a two-layer approach:

- Full-text search: PostgreSQL `tsvector` columns (generated, stored) on `threads.title` and `posts.body_markdown`, with GIN indexes. Queries use `websearch_to_tsquery` for natural language input.
- Fuzzy matching: `pg_trgm` extension with GIN trigram indexes on thread titles and post bodies. Trigram similarity is combined with full-text ranking (`ts_rank * 2.0 + similarity`) to blend exact and fuzzy results.

Search queries union thread title matches with post body matches (deduplicated), ordered by combined rank and capped at 20 results. Community-scoped search is supported via an optional `scope` parameter.
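
The rank blending and merge step can be modelled outside SQL. In this sketch, `Hit`, `combined_rank`, and `merge_hits` are hypothetical names, and deduplicating by thread while keeping the best rank is an assumption, since the source only says that results are deduplicated.

```rust
use std::collections::HashMap;

/// One match from either the title query or the body query (hypothetical shape).
pub struct Hit {
    pub thread_id: u64,
    pub ts_rank: f32,
    pub similarity: f32,
}

/// Blends exact and fuzzy scores: `ts_rank * 2.0 + similarity`.
pub fn combined_rank(hit: &Hit) -> f32 {
    hit.ts_rank * 2.0 + hit.similarity
}

/// Unions title and body hits, keeps the best rank per thread, and caps the list.
pub fn merge_hits(title_hits: Vec<Hit>, body_hits: Vec<Hit>, limit: usize) -> Vec<(u64, f32)> {
    let mut best: HashMap<u64, f32> = HashMap::new();
    for hit in title_hits.iter().chain(body_hits.iter()) {
        let rank = combined_rank(hit);
        let entry = best.entry(hit.thread_id).or_insert(rank);
        if rank > *entry {
            *entry = rank;
        }
    }
    let mut ranked: Vec<(u64, f32)> = best.into_iter().collect();
    // Highest rank first; ties broken by thread id so the order is deterministic.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked.truncate(limit);
    ranked
}
```

Doubling `ts_rank` lets an exact full-text match outweigh a purely fuzzy one, while the similarity term still surfaces near-misses such as typos.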

### Image uploads

Images are stored in S3 (not in PostgreSQL), with metadata tracked in the database. Uploads are validated for type (png, jpg, gif, webp), size (5 MB max), and extension/content-type consistency. JPEG EXIF metadata is stripped server-side before upload. Image keys use the format `mt/{community_slug}/{uuid}.{ext}`.
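
The validation rules and key format can be sketched as a single function. Everything named here (`extension_for`, `UploadError`, `image_key`) is hypothetical; interpreting 5 MB as 5 * 1024 * 1024 bytes and accepting only `.jpg` for JPEG are assumptions, and EXIF stripping is out of scope for the sketch.

```rust
/// Upload size cap (assumes "5 MB" means 5 * 1024 * 1024 bytes).
const MAX_IMAGE_BYTES: usize = 5 * 1024 * 1024;

/// Maps an accepted content type to the extension it must carry.
fn extension_for(content_type: &str) -> Option<&'static str> {
    match content_type {
        "image/png" => Some("png"),
        "image/jpeg" => Some("jpg"),
        "image/gif" => Some("gif"),
        "image/webp" => Some("webp"),
        _ => None,
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum UploadError {
    UnsupportedType,
    ExtensionMismatch,
    TooLarge,
}

/// Validates an upload and returns its object key, `mt/{community_slug}/{uuid}.{ext}`.
pub fn image_key(
    community_slug: &str,
    uuid: &str,
    filename: &str,
    content_type: &str,
    size_bytes: usize,
) -> Result<String, UploadError> {
    let ext = extension_for(content_type).ok_or(UploadError::UnsupportedType)?;
    // The filename extension (case-insensitive) must agree with the content type.
    let file_ext = filename.rsplit_once('.').map(|(_, e)| e.to_ascii_lowercase());
    if file_ext.as_deref() != Some(ext) {
        return Err(UploadError::ExtensionMismatch);
    }
    if size_bytes > MAX_IMAGE_BYTES {
        return Err(UploadError::TooLarge);
    }
    Ok(format!("mt/{community_slug}/{uuid}.{ext}"))
}
```

Deriving the stored extension from the validated content type, rather than from the user-supplied filename, keeps object keys normalized.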

### Connection pooling

MT uses sqlx's built-in connection pool (`PgPool`). Sessions, forum data, and search all share the same pool. The session store has its own cleanup task but no separate connection pool.