# Ops / runbooks
## Deep dives
- [Deployment](deployment/README.md)
- [Production](production/README.md)
- [Troubleshooting](troubleshooting/README.md)

This page is the operator entrypoint. It points to the existing deep-dive runbooks and adds a short “first 30 minutes” checklist.
## First 30 minutes (incident checklist)
1. **Confirm impact**
   - What's broken: UI, API, auth, or gateway integration?
   - All users or a subset?
2. **Check service health**
   - Backend: `/healthz` and `/readyz`
   - Frontend: Can it load? Does it reach the API?
3. **Check auth (Clerk)**
   - Frontend: Was Clerk enabled unintentionally (publishable key set)?
   - Backend: Is `CLERK_SECRET_KEY` configured correctly?
4. **Check DB connectivity**
   - Can the backend connect to Postgres (`DATABASE_URL`)?
5. **Check logs**
   - Backend logs for 5xx spikes or auth failures.
   - Frontend logs for API URL/proxy misconfiguration.
6. **Stabilize**
   - Roll back the last change if you can.
   - Temporarily disable optional integrations (gateway) to isolate the fault.
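
Steps 2 through 4 can be partly scripted. The following is a minimal sketch, not part of Mission Control itself. It assumes the backend is reachable at a base URL taken from a hypothetical `MC_BACKEND_URL` variable (defaulting to `http://localhost:8000`), that a healthy endpoint answers with HTTP 200, and that the script runs in an environment that has the same `CLERK_SECRET_KEY` and `DATABASE_URL` values as the backend. All function names are illustrative.

```python
import os
import socket
import urllib.error
import urllib.request
from urllib.parse import urlparse

# Assumption: MC_BACKEND_URL is a hypothetical variable name, not defined by this runbook.
BASE_URL = os.environ.get("MC_BACKEND_URL", "http://localhost:8000")


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """GET url; return (status, error). status is None if no HTTP response arrived."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return exc.code, None
    except OSError as exc:  # URLError, refused connection, timeout
        return None, str(exc)


def check_backend(base_url=BASE_URL, probe_fn=probe):
    """Probe the two health endpoints named in step 2 (assumes 200 means healthy)."""
    results = {}
    for path in ("/healthz", "/readyz"):
        status, error = probe_fn(base_url.rstrip("/") + path)
        results[path] = {"ok": status == 200, "status": status, "error": error}
    return results


def missing_env(names=("CLERK_SECRET_KEY", "DATABASE_URL"), env=None):
    """Return the names that are unset or empty. Values are never printed."""
    env = os.environ if env is None else env
    return [n for n in names if not env.get(n)]


def db_endpoint(database_url):
    """Extract (host, port) from a Postgres URL, defaulting to port 5432."""
    parsed = urlparse(database_url)
    return parsed.hostname, parsed.port or 5432


def tcp_reachable(host, port, timeout=3.0):
    """True if a TCP connection opens. This does not test credentials."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for path, r in check_backend().items():
        print(path, "OK" if r["ok"] else f"FAIL status={r['status']} error={r['error']}")
    print("missing env:", missing_env() or "none")
    url = os.environ.get("DATABASE_URL")
    if url:
        host, port = db_endpoint(url)
        state = "reachable" if tcp_reachable(host, port) else "unreachable"
        print(f"postgres tcp {host}:{port}: {state}")
```

A passing TCP check only shows that the Postgres port accepts connections; it says nothing about credentials or whether the database exists, so backend logs remain the authoritative signal for step 4. Likewise, a set `CLERK_SECRET_KEY` is not necessarily a correct one.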
## Backups / restore
See [Production](production/README.md). If you run Mission Control in production, treat backup/restore as a regular drill, not a one-time setup.