# Troubleshooting: MNW Server

## Service Won't Start

Check the logs first:

```bash
journalctl -u makenotwork -n 50 --no-pager
```

| Symptom | Cause | Fix |
|---------|-------|-----|
| "DATABASE_URL environment variable is required" | Missing env var | Check that `/opt/makenotwork/.env` sets `DATABASE_URL` |
| "SIGNING_SECRET is required in production" | `HOST=0.0.0.0` or an HTTPS `HOST_URL` without a secret | Set `SIGNING_SECRET` to a random string in `.env` |
| "Invalid HOST address" / "Invalid PORT number" | Malformed `HOST` or `PORT` | `HOST` must be a valid IP address (default `127.0.0.1`); `PORT` must be an integer (default `3000`) |
| Startup hangs, then fails | PostgreSQL unreachable | Run `systemctl status postgresql` and verify the `DATABASE_URL` connection string |
| "Failed to run migrations" | Migration error or DB permissions | Connect manually with `psql "$DATABASE_URL" -c "SELECT 1"`, then check the migration SQL |
| "Failed to migrate session store" | `tower_sessions` table issue | Usually resolves on retry. If it persists, check that the DB user has CREATE TABLE permission |
| Startup succeeds but features are missing | Optional services not configured | Stripe, Postmark, S3, Git browser, and SyncKit all degrade gracefully when their env vars are missing |

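Several rows above come down to a key that is absent or empty in `.env`. A minimal sketch of a pre-flight check follows; the function name `missing_env_keys` is hypothetical, and it assumes `.env` uses plain `KEY=value` lines with no quoting tricks.

```bash
#!/usr/bin/env bash
# Sketch: print every requested key that has no non-empty KEY=value line in an env file.
# missing_env_keys is an illustrative helper, not part of MNW.
missing_env_keys() {
  local file="$1"; shift
  local key
  for key in "$@"; do
    # A key counts as present only if "KEY=" is followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$file" 2>/dev/null; then
      echo "$key"
    fi
  done
}

# Example (run on the server):
#   missing_env_keys /opt/makenotwork/.env DATABASE_URL SIGNING_SECRET
```

An empty result means every listed key is set. An unreadable file reports all keys as missing, which is usually the safer failure mode for a start-up check.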
## 502 Errors

Caddy serves `/opt/makenotwork/error-pages/502.html` when the app is unreachable.

1. **Is the process running?**
   ```bash
   systemctl status makenotwork --no-pager
   ```
   - Not running → `systemctl restart makenotwork`, then check the logs for the crash cause
   - Running but not responding → check the port: `curl -s http://127.0.0.1:3000`

2. **Is Caddy running?**
   ```bash
   systemctl status caddy --no-pager
   ```
   - Not running → `systemctl restart caddy`

3. **Is PostgreSQL running?**
   ```bash
   systemctl status postgresql --no-pager
   sudo -u makenotwork psql makenotwork -c "SELECT 1"
   ```
   - Not running → `systemctl restart postgresql`, then `systemctl restart makenotwork`

4. **Port conflict?**
   ```bash
   lsof -i :3000
   ```
   - Another process → kill it or change `PORT` in `.env`

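The checklist above can be condensed into a small decision helper. This is a sketch only: `next_502_step` is a hypothetical name, and it takes the three states as printed by `systemctl is-active` (for example `active`, `inactive`, `failed`) rather than querying systemd itself.

```bash
#!/usr/bin/env bash
# Sketch: map service states to the next action from the checklist above.
# Arguments: state of makenotwork, caddy, postgresql (output of `systemctl is-active`).
next_502_step() {
  local app="$1" caddy="$2" pg="$3"
  if [ "$pg" != "active" ]; then
    # The app needs the database, so PostgreSQL is handled first.
    echo "systemctl restart postgresql && systemctl restart makenotwork"
  elif [ "$app" != "active" ]; then
    echo "systemctl restart makenotwork"
  elif [ "$caddy" != "active" ]; then
    echo "systemctl restart caddy"
  else
    echo "all services active: check the port with curl -s http://127.0.0.1:3000 and lsof -i :3000"
  fi
}

# Example (run on the server):
#   next_502_step "$(systemctl is-active makenotwork)" \
#                 "$(systemctl is-active caddy)" \
#                 "$(systemctl is-active postgresql)"
```

The sketch checks PostgreSQL before the app, which differs from the order of the checklist, because restarting the app while the database is down would only fail again.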
## Slow Queries

**Symptoms:** Pages load slowly, "timeout acquiring connection" in logs, high PostgreSQL CPU.

**Diagnostics:**
```bash
# Enable slow query logging (statements slower than 1000 ms)
sudo -u postgres psql -c "ALTER SYSTEM SET log_min_duration_statement = 1000;"
sudo -u postgres psql -c "SELECT pg_reload_conf();"

# Check active queries
sudo -u postgres psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC LIMIT 5;"
```

**Known patterns:**
- Discover search with very short terms → triggers a trigram scan. The `pg_trgm` extension and GIN index mitigate this.
- Tag hierarchy queries → EXISTS subqueries on items with many tags.
- Connection pool exhaustion → the default is 25 connections with a 3s acquire timeout. If all connections are busy, new requests fail after 3s.

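To judge whether pool exhaustion is the likely cause, it can help to compare the number of open connections with the pool size. The sketch below assumes the database is named `makenotwork` (as in the `psql makenotwork` command earlier) and that the counts come from `psql -At`, which prints `state|count` rows; `pool_report` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: sum "state|count" rows from stdin and compare the total with the pool size.
# pool_report is illustrative; 25 is the default pool size quoted above.
pool_report() {
  local max="${1:-25}" total=0 state count
  while IFS='|' read -r state count; do
    [ -n "$count" ] || continue   # skip blank or malformed lines
    total=$((total + count))
  done
  echo "connections=$total max=$max"
  [ "$total" -lt "$max" ]         # non-zero exit status when the pool may be exhausted
}

# Example (run on the server; the datname filter is an assumption):
#   sudo -u postgres psql -At -c \
#     "SELECT state, count(*) FROM pg_stat_activity WHERE datname = 'makenotwork' GROUP BY state;" \
#     | pool_report 25
```

A total at or near the limit, combined with "timeout acquiring connection" in the logs, points at the pool rather than at a single slow query.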
## Stripe Webhook Failures

**Symptoms:** Purchases not completing, subscriptions not updating.

1. **Check Stripe Dashboard → Webhooks** for failed deliveries
2. **Check server logs:**
   ```bash
   journalctl -u makenotwork --since "1 hour ago" | grep -i stripe
   ```

| Log Message | Cause | Fix |
|-------------|-------|-----|
| "Missing Stripe signature" | Request missing `Stripe-Signature` header | Webhook URL misconfigured in Stripe Dashboard |
| "Invalid payload encoding" | Non-UTF-8 body | Stripe endpoint URL is wrong (hitting the wrong service) |
| Signature verification error | `STRIPE_WEBHOOK_SECRET` mismatch | Copy the exact secret from Stripe Dashboard → Webhooks, update `.env`, restart |
| Event type not handled (debug log) | Unhandled event type | Expected: only specific events are processed |
| "Stripe not configured" | Missing `STRIPE_SECRET_KEY` | Set the env var in `.env`, restart |

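When the log is noisy, a tally of the quoted messages from the table above shows which failure dominates. This is a rough sketch: `stripe_log_summary` and its short labels are hypothetical, and it assumes the quoted strings appear verbatim in the log lines.

```bash
#!/usr/bin/env bash
# Sketch: count occurrences of the known Stripe log messages read from stdin.
# The labels on the left of each count are made up for this example.
stripe_log_summary() {
  awk '
    /Missing Stripe signature/ { n["missing-signature"]++ }
    /Invalid payload encoding/ { n["invalid-encoding"]++ }
    /Stripe not configured/    { n["not-configured"]++ }
    END { for (k in n) print k, n[k] }
  ' | sort   # awk iterates in unspecified order, so sort for stable output
}

# Example (run on the server):
#   journalctl -u makenotwork --since "1 hour ago" | stripe_log_summary
```

No output means none of the three quoted messages occurred in the window, which leaves signature mismatches and unhandled event types to check by hand.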
**Test locally:**
```bash
stripe listen --forward-to localhost:3000/stripe/webhook
stripe trigger checkout.session.completed
```

## Email Not Sending

**Symptoms:** Password resets, purchase receipts, or verification emails not arriving.

1. **Is Postmark configured?**
   - Check `.env` for `POSTMARK_TOKEN`. If it is missing, emails are logged to stdout (dev mode).

2. **Is the recipient suppressed?**
   ```sql
   SELECT * FROM email_suppressions WHERE email = 'user@example.com';
   ```
   - If found, remove: `DELETE FROM email_suppressions WHERE email = 'user@example.com';`

3. **Check Postmark Dashboard → Activity** for delivery status

4. **Check server logs:**
   ```bash
   journalctl -u makenotwork --since "1 hour ago" | grep -i email
   ```

| Log Message | Cause | Fix |
|-------------|-------|-----|
| "email skipped (suppressed)" | Recipient on suppression list | Remove from `email_suppressions` table |
| "Failed to send email" | Postmark API error (timeout, auth, invalid address) | Check Postmark Dashboard for details, verify token |
| Emails logged to console | `POSTMARK_TOKEN` not set | Set the env var, restart |

## Sync Failures (SyncKit)

**Symptoms:** Desktop apps can't push/pull data.

1. **Is SyncKit configured?**
   - Check `.env` for `SYNCKIT_JWT_SECRET`. If it is missing, endpoints return 503.

2. **JWT issues:**

| Error | Cause | Fix |
|-------|-------|-----|
| 401 Unauthorized | Token expired (7-day max) or bad signature | Client should re-authenticate via `/api/synckit/auth` |
| "Unknown app" | API key invalid or app inactive | Check `sync_apps` table: `SELECT * FROM sync_apps WHERE api_key = '...'` |
| "Unknown device" | Device not registered | Client should call `POST /api/sync/devices` first |

3. **Push failures:**

| Error | Cause | Fix |
|-------|-------|-----|
| "Maximum 500 changes per push" | Batch too large | Client should split into batches of at most 500 changes |
| "Table name validation failed" | Invalid chars in table name | Use alphanumeric characters and underscores only, max 100 chars |
| "DELETE operations should not include data" | Data payload on DELETE op | Client bug: set `data: null` for DELETEs |

4. **Blob storage:**
   - Check the `SYNCKIT_S3_*` env vars for the separate SyncKit bucket
   - If S3 is unreachable, blob upload and download fail but changelog sync still works

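The push limits in step 3 are easy to check on the client side before a request is sent. The sketch below mirrors the rules as documented in that table only; both function names are hypothetical, and the server's actual validation may differ in detail.

```bash
#!/usr/bin/env bash
# Sketch of the documented table-name rule: letters, digits, underscores, 1 to 100 characters.
# valid_sync_table_name and pushes_needed are illustrative helpers, not SyncKit APIs.
valid_sync_table_name() {
  case "$1" in
    '' | *[!A-Za-z0-9_]*) return 1 ;;   # empty, or contains a disallowed character
  esac
  [ "${#1}" -le 100 ]
}

# Number of pushes needed for N changes at 500 per push (ceiling division).
pushes_needed() {
  echo $(( ($1 + 499) / 500 ))
}

# Example:
#   valid_sync_table_name "notes_v2" && echo "name ok"
#   pushes_needed 1200    # three pushes: 500 + 500 + 200
```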
## Git Browser Errors

**Symptoms:** Source browser pages return 404 or 500.

1. **Is the git browser configured?**
   - Check `.env` for `GIT_REPOS_PATH`. If it is missing, all git routes return 404.

2. **Repository not found:**
   ```bash
   ls /opt/git/   # Check that the bare repos exist
   ```
   - Repos must be bare (`git init --bare`)
   - Path structure: `$GIT_REPOS_PATH/{owner}/{repo}.git/`

3. **File too large (>1MB):** Intentional limit. Large files show a truncation message.

4. **Repo corruption:**
   ```bash
   cd /opt/git/owner/repo.git && git fsck --full
   ```

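The path layout and the bare-repository requirement from step 2 can be verified together. This is a minimal sketch; `check_bare_repo` is a hypothetical helper, and it should be run as the user that owns the repositories so that git's ownership checks do not interfere.

```bash
#!/usr/bin/env bash
# Sketch: check that {root}/{owner}/{repo}.git exists and is a bare repository.
# check_bare_repo is illustrative; the path layout is the one described above.
check_bare_repo() {
  local path="$1/$2/$3.git"
  [ -d "$path" ] || { echo "missing: $path"; return 1; }
  # `git rev-parse --is-bare-repository` prints "true" for a bare repository.
  if [ "$(git -C "$path" rev-parse --is-bare-repository 2>/dev/null)" = "true" ]; then
    echo "ok: $path"
  else
    echo "not bare: $path"
    return 1
  fi
}

# Example (run on the server, with GIT_REPOS_PATH exported from .env):
#   check_bare_repo "$GIT_REPOS_PATH" owner repo
```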
## Resource Limits

| Resource | Limit | What Happens |
|----------|-------|--------------|
| DB connections | 25 max | "timeout acquiring connection" after 3s wait |
| Memory | 512M (systemd `MemoryMax`) | Process is OOM-killed, then auto-restarts |
| File descriptors | 65535 (`LimitNOFILE`) | "too many open files" |
| File upload: audio | 500 MB | 413 Payload Too Large |
| File upload: image | 10 MB | 413 Payload Too Large |
| File upload: video | 20 GB | 413 Payload Too Large |
| Login rate limit | 2/sec, burst 5 | 429 Too Many Requests |
| API rate limit | 2/sec, burst 10 | 429 Too Many Requests |
| SyncKit rate limit | 10/sec, burst 30 | 429 Too Many Requests |