# Ops / runbooks

## Deep dives

- [Deployment](deployment/README.md)
- [Production](production/README.md)
- [Troubleshooting](troubleshooting/README.md)

This page is the operator entrypoint. It points to the existing deep-dive runbooks and adds a short “first 30 minutes” checklist.

## First 30 minutes (incident checklist)

1. **Confirm impact**
   - What’s broken: UI, API, auth, or gateway integration?
   - All users or a subset?

2. **Check service health**
   - Backend: `/healthz` and `/readyz`
   - Frontend: does it load, and can it reach the API?

3. **Check auth (Clerk)**
   - Frontend: was Clerk enabled unintentionally (for example, because a publishable key is set)?
   - Backend: is `CLERK_SECRET_KEY` set, and does it belong to the same Clerk instance as the frontend’s publishable key?

4. **Check DB connectivity**
   - Can the backend connect to Postgres using `DATABASE_URL`?

5. **Check logs**
   - Backend logs: look for spikes in 5xx responses or auth failures.
   - Frontend logs: look for a misconfigured API URL or proxy.

6. **Stabilize**
   - Roll back the last change if you can.
   - Temporarily disable optional integrations (such as the gateway) to isolate the fault.
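
The backend health check in step 2 can be scripted so every responder runs the same probe. The following is a minimal Python sketch that uses only the standard library. It treats any 2xx response from `/healthz` or `/readyz` as healthy. That is an assumption, because these runbooks do not specify what the endpoints return. The function names are illustrative, not part of Mission Control.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 5.0) -> tuple[bool, str]:
    """GET base_url + path; ok is True only for an HTTP 2xx status."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # Non-2xx responses (e.g. 503 from a backend that is not ready) raise HTTPError.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, timeout: the backend is not reachable at all.
        return False, f"unreachable: {exc}"


def check_backend(base_url: str) -> dict[str, tuple[bool, str]]:
    """Probe both health endpoints named in the checklist."""
    return {path: probe(base_url, path) for path in ("/healthz", "/readyz")}
```

For example, `check_backend("http://localhost:8000")` returns one `(ok, detail)` pair per endpoint. The URL is a placeholder; use your backend’s real address. If the backend is unreachable, suspect networking or deployment. If `/healthz` passes but `/readyz` fails, the process is up but a dependency may not be. Under the usual convention, that dependency is often the database, which step 4 covers.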
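
The auth check in step 3 is mostly a comparison of keys, so it can be written down as a small function. This is a minimal sketch that relies on Clerk’s key prefixes (`pk_test_`/`pk_live_` for publishable keys, `sk_test_`/`sk_live_` for secret keys). It assumes the frontend turns Clerk on whenever a publishable key is set, as the checklist implies. The frontend’s environment variable name is not documented here and depends on your framework, so the key is passed in explicitly. `expect_clerk` is a hypothetical flag meaning “this deployment is supposed to use Clerk”.

```python
from typing import List, Mapping, Optional


def check_clerk_config(
    backend_env: Mapping[str, str],
    frontend_publishable_key: Optional[str],
    expect_clerk: bool,
) -> List[str]:
    """Return likely Clerk misconfigurations; an empty list means none were spotted."""
    problems: List[str] = []
    secret = (backend_env.get("CLERK_SECRET_KEY") or "").strip()
    pk = (frontend_publishable_key or "").strip()

    # Checklist item: was Clerk switched on by a publishable key nobody meant to set?
    if pk and not expect_clerk:
        problems.append("Clerk is enabled on the frontend unexpectedly (publishable key is set)")
    if expect_clerk and not pk:
        problems.append("Clerk is expected but the frontend has no publishable key")
    if pk and not secret:
        problems.append("frontend has a publishable key but backend CLERK_SECRET_KEY is unset")

    # Clerk keys are prefixed by type and mode.
    if secret and not secret.startswith(("sk_test_", "sk_live_")):
        problems.append("CLERK_SECRET_KEY does not look like a Clerk secret key")
    if pk and not pk.startswith(("pk_test_", "pk_live_")):
        problems.append("publishable key does not look like a Clerk publishable key")

    # Test-mode on one side and live-mode on the other usually means mixed environments.
    if secret and pk and secret.startswith("sk_live_") != pk.startswith("pk_live_"):
        problems.append("frontend and backend Clerk keys are from different modes (test vs live)")
    return problems
```

Run it with the backend’s environment, for example `os.environ` inside the backend container. Pass in the publishable key the frontend was built or deployed with. The prefix checks catch only pasted-wrong or mixed-mode keys. Two keys of the same mode can still belong to different Clerk instances.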
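
A quick first cut at step 4 is to check whether the host and port in `DATABASE_URL` are reachable over TCP at all. This sketch parses the URL and opens a plain socket. It does not test credentials or the database name. If the socket connects but the backend still cannot reach Postgres, look at auth or database settings rather than the network. The sketch assumes a `postgres://` or `postgresql://` URL, optionally with a driver suffix such as `postgresql+asyncpg://`. It defaults to port 5432 when none is given.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def db_endpoint(database_url: str) -> Tuple[str, int]:
    """Extract (host, port) from a Postgres URL; the port defaults to 5432."""
    parts = urlsplit(database_url)
    # Drop a driver suffix such as "+asyncpg" before checking the scheme.
    if parts.scheme.split("+")[0] not in ("postgres", "postgresql"):
        raise ValueError(f"not a Postgres URL: scheme {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("DATABASE_URL has no host (Unix-socket URLs are not handled here)")
    return parts.hostname, parts.port or 5432


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection opens; says nothing about credentials or the database name."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it from the same network location as the backend, for example inside its container. Running it from your laptop tests a different path.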
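
For the backend part of step 5, bucketing status codes by minute makes a 5xx spike or a burst of auth failures easy to spot. These runbooks do not document the backend’s log format. The regex below therefore assumes a hypothetical line shape: an ISO-8601 timestamp at the start and a `status=NNN` field somewhere later. Adapt the regex to your real logs before relying on the counts.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical format: "2026-02-11T06:15:54Z GET /api/... status=503 ..."
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2}T\d{2}:\d{2})\S*\s.*?\bstatus=(?P<status>\d{3})\b"
)


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count 5xx responses and auth failures (401/403) per minute."""
    server_errors: Counter = Counter()
    auth_failures: Counter = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # Skip lines that do not match the assumed format.
        status = int(m.group("status"))
        minute = m.group("minute")
        if 500 <= status <= 599:
            server_errors[minute] += 1
        elif status in (401, 403):
            auth_failures[minute] += 1
    return server_errors, auth_failures
```

You can feed it any iterable of lines, such as an open log file. Then compare the busiest minutes against the time of the last deploy.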

## Backups / restore

See [Production](production/README.md). If you run Mission Control in production, treat backup/restore as a regular drill, not a one-time setup.
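
One part of a regular drill that is easy to automate is confirming that the newest backup is recent enough. This sketch assumes a hypothetical layout where backups land as `*.dump` files in a single directory. It uses a 24-hour threshold purely as an example. Follow whatever layout and schedule the Production runbook defines. A freshness check does not replace an actual restore into a scratch database, which is the part of the drill that proves a backup is usable.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(backup_dir: Path, pattern: str = "*.dump") -> Optional[float]:
    """Age in hours of the most recently modified matching file, or None if there are none."""
    files = [p for p in backup_dir.glob(pattern) if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    return (time.time() - newest_mtime) / 3600.0


def backup_is_fresh(backup_dir: Path, max_age_hours: float = 24.0, pattern: str = "*.dump") -> bool:
    """True only if at least one backup exists and the newest is within max_age_hours."""
    age = newest_backup_age_hours(backup_dir, pattern)
    return age is not None and age <= max_age_hours
```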