Ops / runbooks
Deep dives
This page is the operator entrypoint. It points to the existing deep-dive runbooks and adds a short “first 30 minutes” checklist.
First 30 minutes (incident checklist)
- Confirm impact
  - What’s broken: UI, API, auth, or gateway integration?
  - All users or a subset?
- Check service health
  - Backend: /healthz and /readyz
  - Frontend: can it load? does it reach the API?
- Check auth (Clerk)
  - Frontend: did Clerk get enabled unintentionally? (publishable key set)
  - Backend: is CLERK_SECRET_KEY configured correctly?
- Check DB connectivity
  - Can backend connect to Postgres (DATABASE_URL)?
- Check logs
  - Backend logs for 5xx spikes or auth failures.
  - Frontend logs for API URL/proxy misconfig.
- Stabilize
  - Roll back the last change if you can.
  - Temporarily disable optional integrations (gateway) to isolate.
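The health, auth-config, and DB-connectivity checks in this checklist can be scripted for a quick first pass. The following is a minimal sketch, not part of Mission Control: `BACKEND_URL` and its default are placeholders, the helper names are hypothetical, and the script assumes it runs in an environment that carries the same `CLERK_SECRET_KEY` and `DATABASE_URL` values as the backend. It only confirms that the variables are non-empty and that the Postgres port accepts TCP connections; it does not validate the Clerk key or database credentials.

```python
"""First-pass triage sketch: health endpoints, required env vars, Postgres reachability."""
import os
import socket
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Assumption: the operator supplies the backend base URL; this default is a placeholder.
BACKEND_URL = os.environ.get("BACKEND_URL", "http://localhost:8000")
REQUIRED_ENV = ("CLERK_SECRET_KEY", "DATABASE_URL")


def probe(url, timeout=5.0):
    """Return (ok, detail) for a GET request; ok is True only for a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses land here
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:  # DNS failure, refused, timeout
        return False, f"unreachable: {exc}"


def missing_env(env, names=REQUIRED_ENV):
    """Return the names that are unset or blank in the given mapping."""
    return [name for name in names if not env.get(name, "").strip()]


def db_target(database_url):
    """Extract (host, port) from a Postgres URL, defaulting to port 5432."""
    parts = urlsplit(database_url)
    return parts.hostname, parts.port or 5432


def tcp_reachable(host, port, timeout=3.0):
    """Check that a TCP connection opens; this does not verify credentials."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def main():
    for path in ("/healthz", "/readyz"):
        ok, detail = probe(BACKEND_URL.rstrip("/") + path)
        print(f"{path}: {'ok' if ok else 'FAIL'} ({detail})")

    missing = missing_env(os.environ)
    print("missing env:", ", ".join(missing) or "none")

    db_url = os.environ.get("DATABASE_URL", "")
    if db_url:
        host, port = db_target(db_url)
        if host:
            state = "reachable" if tcp_reachable(host, port) else "UNREACHABLE"
            print(f"postgres {host}:{port}: {state}")


if __name__ == "__main__":
    main()
```

Run it from a shell that has the backend's environment loaded, for example with `BACKEND_URL` set to the deployed API address. A failing `/readyz` alongside a passing `/healthz` usually suggests the process is up but a dependency is not, which is a hint to look at the DB and auth lines next.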
Backups / restore
See Production. If you run Mission Control in production, treat backup/restore as a regular drill, not a one-time setup.
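As an illustration of what a recurring drill could look like, here is a minimal sketch that dumps the database and then checks that the archive is readable. It is not the procedure from Production, which remains authoritative. It assumes the standard PostgreSQL client tools `pg_dump` and `pg_restore` are on the PATH and that `DATABASE_URL` is a plain `postgresql://` URI they accept; the function names and the archive naming scheme are hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def dump_command(database_url, out_path):
    """Build a pg_dump invocation that writes a custom-format archive."""
    return ["pg_dump", "--format=custom", f"--file={out_path}", f"--dbname={database_url}"]


def verify_command(archive_path):
    """Build a pg_restore invocation that lists the archive contents without restoring."""
    return ["pg_restore", "--list", archive_path]


def run_drill(database_url, backup_dir="."):
    """Dump the database, then confirm the archive's table of contents can be read."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = f"{backup_dir}/mission-control-{stamp}.dump"  # hypothetical naming scheme
    subprocess.run(dump_command(database_url, archive), check=True)
    # Listing the contents shows the archive is readable, not that a full restore succeeds.
    subprocess.run(verify_command(archive), check=True, stdout=subprocess.DEVNULL)
    return archive
```

A complete drill would also restore the archive into a scratch database and run the application against it. Note that passing the URL on the command line can expose credentials in the process list, so a real setup may prefer a password file or environment-based authentication.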