max / makenotwork
-- Record when a tier is left in a partial / mixed-version state.
--
-- A canary promote that fails mid-rollout tries to roll the touched nodes
-- back to the prior version (CF3 canary rollback). That compensation is
-- best-effort: a node whose rollback also fails, or a first-deploy failure
-- with no prior version to restore, leaves the tier genuinely inconsistent,
-- with some nodes on the new (broken) version and some on the old. The
-- rollback endpoint has the same exposure if a node deploy fails partway
-- through the fleet.
--
-- Before this column that condition was invisible: `tier_state` still showed
-- a single `current_version` (never advanced on failure), so `/state` and the
-- TUI reported a clean tier while the fleet was split-brain. The operator
-- only learned of it by reading `deploys` rows or SSHing into the nodes.
--
-- `partial_reason` is NULL when the tier is consistent and holds a
-- human-readable explanation when it is not. It is set by the promote/rollback
-- failure paths and cleared by any subsequent clean full promote or rollback
-- of the tier.
ALTER TABLE tier_state ADD COLUMN partial_reason TEXT;
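--
-- Illustrative usage only (commented out; not executed by this migration).
-- A minimal sketch of how the failure and success paths might write the
-- column and how an operator might query it. The `tier` key column, the
-- 'prod' value, and the reason text are assumptions, not taken from the
-- actual schema.
--
--   -- failure path: record why the tier is mixed-version
--   UPDATE tier_state
--      SET partial_reason = 'canary rollback failed on 1 of 3 nodes'
--    WHERE tier = 'prod';
--
--   -- clean full promote or rollback: clear the marker
--   UPDATE tier_state SET partial_reason = NULL WHERE tier = 'prod';
--
--   -- list tiers currently in a partial state
--   SELECT tier, current_version, partial_reason
--     FROM tier_state
--    WHERE partial_reason IS NOT NULL;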