1.3 KiB
1.3 KiB
Ops / runbooks
This page is the operator/SRE entry point. It intentionally links to existing deeper docs to minimize churn.
Where to start
- Deployment: docs/deployment/README.md
- Production checklist/notes: docs/production/README.md
- Troubleshooting: docs/troubleshooting/README.md
“First 30 minutes” incident checklist
-
Confirm user impact + scope
- What is broken: UI, API, auth, or gateway integration?
- Is it all users or a subset?
-
Check service health
- Backend:
/healthzand/readyz - Frontend: can it load? does it reach the API?
- Backend:
-
Check auth (Clerk) configuration
- Frontend: is Clerk enabled unexpectedly? (publishable key set)
- Backend: is
CLERK_JWKS_URLconfigured correctly?
-
Check DB connectivity
- Can backend connect to Postgres (
DATABASE_URL)?
- Can backend connect to Postgres (
-
Check logs
- Backend logs for 5xx spikes or auth failures.
- Frontend logs for proxy/API URL misconfig.
-
Stabilize
- Roll back the last change if available.
- Temporarily disable optional integrations (gateway) to isolate.
Backups / restore (placeholder)
- Define backup cadence and restore steps once production deployment is finalized.