diff --git a/docs/01-overview.md b/docs/01-overview.md index b776fefe..b4e6c046 100644 --- a/docs/01-overview.md +++ b/docs/01-overview.md @@ -1,45 +1,43 @@ -# Overview +# Mission Control Mission Control is the **web UI + HTTP API** for operating OpenClaw. -It’s where you manage **boards**, **tasks**, **agents**, **approvals**, and (optionally) **gateway connections**. +It’s the place you go to coordinate work across people and agents, keep an evidence trail, and operate the system safely. -## Problem statement +## What problem it solves -OpenClaw can execute work (tools/skills) and converse across channels, but real operations need a place to: +OpenClaw can run tools/skills and hold conversations across channels. What’s missing in practice is a control plane that makes this operational: -- **Coordinate** work across people + agents (what’s next, what’s blocked, who owns what) -- **Track evidence** of what happened (commands run, links, logs, artifacts) -- **Control risk** (approvals, guardrails, isolation) -- **Operate reliably** (deployment, configuration, troubleshooting) +- **Coordination**: boards + tasks make it explicit what’s being worked on, by whom, and what’s blocked. +- **Evidence**: task comments capture commands run, links, outputs, and decisions. +- **Risk control**: approvals provide a structured “allow/deny” gate for sensitive actions. +- **Operations**: deployment, configuration, and troubleshooting live in one navigable docs spine. -Mission Control provides that control plane. - -## Who uses it - -- **Maintainers / operators**: keep Mission Control + gateways healthy, deploy upgrades, respond to incidents. -- **Contributors**: develop backend/frontend changes, run tests, ship docs. -- **Automation authors**: define agent identities, skills, and task workflows. - -## Key concepts (glossary-lite) +## Core concepts - **Board**: a workspace containing tasks, memory, and agents. -- **Task**: a unit of work on a board (status + comments/evidence). 
-- **Agent**: an automated worker that can execute tasks and post evidence. -- **Approval**: a structured “allow/deny” checkpoint for risky actions. -- **Gateway**: the OpenClaw runtime host that executes tools/skills and runs heartbeats/cron. -- **Heartbeat**: periodic agent check-in loop for incremental work. -- **Cron job**: scheduled execution (recurring or one-shot), often isolated from conversational context. +- **Task**: a unit of work with a status and evidence (comments). +- **Agent**: an automated worker that executes tasks and posts evidence. +- **Approval**: a review gate for risky steps. +- **Gateway** (optional integration): an OpenClaw runtime host Mission Control can coordinate with. +- **Heartbeat**: periodic agent loop for incremental work. +- **Cron**: scheduled execution (recurring or one-shot). -## Out of scope +## What it is not -- Not a general-purpose project management suite (we optimize for AI-assisted operations, not every PM feature). -- Not a full observability platform (we integrate with logs/metrics rather than replacing them). -- Not a secrets manager (we reference secret sources; don’t store secrets in docs/tasks/comments). +- A general-purpose project management tool. +- An observability suite (use your existing logs/metrics/tracing; Mission Control links and operationalizes them). +- A secrets manager (keep secrets in your secret store; don’t paste them into tasks/docs). -## Where to go next +## How to navigate these docs -- Want it running? → [Quickstart](02-quickstart.md) -- Want to contribute? → [Development](03-development.md) -- Want to understand internals? → [Architecture](05-architecture.md) -- Operating it? → [Ops / runbooks](09-ops-runbooks.md) +This repo keeps a small “reader journey” spine under `docs/`: + +1. [Quickstart](02-quickstart.md) — run it locally/self-host. +2. [Development](03-development.md) — contributor workflow and CI parity. +3. 
[Configuration](06-configuration.md) — env vars, precedence, migrations, CORS. +4. [API reference](07-api-reference.md) — route groups + auth model. +5. [Ops / runbooks](09-ops-runbooks.md) — operational checklists. +6. [Troubleshooting](10-troubleshooting.md) — symptom → checks → fixes. + +For deeper references, see `docs/architecture/`, `docs/deployment/`, `docs/production/`, `docs/testing/`, and `docs/troubleshooting/`. diff --git a/docs/02-quickstart.md b/docs/02-quickstart.md index f49407d1..d299af3a 100644 --- a/docs/02-quickstart.md +++ b/docs/02-quickstart.md @@ -1,14 +1,61 @@ -# Quickstart (self-host with Docker Compose) +# Quickstart (Docker Compose) -This page is a pointer to the canonical quickstart in the repo root README. +This is the fastest way to run Mission Control locally or on a single host. -- Canonical quickstart: [`README.md#quick-start-self-host-with-docker-compose`](../README.md#quick-start-self-host-with-docker-compose) +## What you get -## Verify it works -After `docker compose up`: -- Backend health: `http://localhost:8000/healthz` returns `{ "ok": true }` -- Frontend: `http://localhost:3000` +From `compose.yml` you get three services: + +- Postgres (`db`) +- FastAPI backend (`backend`) on `http://localhost:8000` +- Next.js frontend (`frontend`) on `http://localhost:3000` + +## Prerequisites + +- Docker + Docker Compose v2 (`docker compose`) + +## Run + +From repo root: + +```bash +cp .env.example .env + +docker compose -f compose.yml --env-file .env up -d --build +``` + +Open: +- UI: http://localhost:3000 +- Backend health: http://localhost:8000/healthz + +## Verify + +```bash +curl -f http://localhost:8000/healthz +curl -I http://localhost:3000/ +``` ## Common gotchas -- `NEXT_PUBLIC_API_URL` must be reachable from your browser (host), not just from inside Docker. -- Clerk auth is required; ensure Clerk keys are configured (see [Deployment guide](deployment/README.md)). 
+ +- `NEXT_PUBLIC_API_URL` must be reachable from your **browser**. + - If it’s missing/blank/wrong, the UI may load but API calls will fail (e.g. Activity feed blank). +- If you are running locally without Clerk: + - keep `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` unset/blank so Clerk stays gated off in the frontend. + +## Useful commands + +```bash +# tail logs +docker compose -f compose.yml --env-file .env logs -f --tail=200 + +# stop (keeps data) +docker compose -f compose.yml --env-file .env down + +# reset data (DESTRUCTIVE) +docker compose -f compose.yml --env-file .env down -v +``` + +## Next + +- Want a faster contributor loop? See [Development](03-development.md) (DB via Compose, backend+frontend in dev mode). +- Need to change env vars/migrations/CORS? See [Configuration](06-configuration.md). diff --git a/docs/03-development.md b/docs/03-development.md index 4a5f785f..33e7ccd6 100644 --- a/docs/03-development.md +++ b/docs/03-development.md @@ -1,27 +1,23 @@ # Development -## Deep dives - -- [Testing guide](testing/README.md) -- [Troubleshooting deep dive](troubleshooting/README.md) - -How we develop Mission Control locally, with a workflow that stays close to CI. +This page is the contributor workflow for Mission Control. +It’s intentionally **CI-aligned**: if you can run these commands locally, you should not be surprised by CI. ## Prerequisites - Docker + Docker Compose v2 (`docker compose`) -- Python **3.12+** + `uv` +- Python **3.12+** + [`uv`](https://github.com/astral-sh/uv) - Node.js + npm - - CI pins **Node 20** via GitHub Actions (`actions/setup-node@v4` with `node-version: "20"`). + - CI pins **Node 20** via `.github/workflows/ci.yml` (`actions/setup-node@v4`, `node-version: "20"`). -## Repo structure (where to run commands) +## Repo layout -- Repo root: `Makefile` contains canonical targets. 
-- Backend code: `backend/` (FastAPI) -- Frontend code: `frontend/` (Next.js) +- Backend: `backend/` (FastAPI) +- Frontend: `frontend/` (Next.js) +- Canonical commands: `Makefile` -## “One command” setup +## Setup (one command) From repo root: @@ -29,104 +25,23 @@ From repo root: make setup ``` -What it does: -- Syncs backend deps with `uv`. -- Syncs frontend deps with `npm` via the node wrapper. +What it does (evidence: `Makefile`): +- `make backend-sync`: `cd backend && uv sync --extra dev` +- `make frontend-sync`: verifies node tooling (`scripts/with_node.sh --check`), then `npm install` in `frontend/` -## Canonical checks (CI parity) +## Run the stack (two recommended loops) -### Run everything locally (closest to CI) +### Loop A (recommended): DB via Compose, app in dev mode -From repo root: - -```bash -make check -``` - -CI runs two jobs: -- `check` (lint/typecheck/tests/coverage/build) -- `e2e` (Cypress) - -## Backend workflow - -### Install/sync deps - -```bash -cd backend -uv sync --extra dev -``` - -### Run the API (dev) - -```bash -cd backend -cp .env.example .env -uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000 -``` - -### Backend checks - -From repo root: - -```bash -make backend-lint # flake8 -make backend-typecheck # mypy --strict -make backend-test # pytest -make backend-coverage # pytest + scoped coverage gate -``` - -### DB migrations - -From repo root: - -```bash -make backend-migrate -``` - -## Frontend workflow - -### Install deps - -```bash -cd frontend -npm install -``` - -(or from repo root: `make frontend-sync`) - -### Run the UI (dev) - -```bash -cd frontend -cp .env.example .env.local -# Ensure NEXT_PUBLIC_API_URL is correct for the browser: -# NEXT_PUBLIC_API_URL=http://localhost:8000 -npm run dev -``` - -### Frontend checks - -From repo root: - -```bash -make frontend-lint # eslint -make frontend-typecheck # tsc -make frontend-test # vitest -make frontend-build # next build -``` - -## Local dev loops - -### Loop A 
(recommended): DB via Compose, backend + frontend in dev mode - -1) Start Postgres only: +1) Start Postgres: ```bash cp .env.example .env + docker compose -f compose.yml --env-file .env up -d db ``` -2) Backend (local): +2) Backend dev server: ```bash cd backend @@ -135,16 +50,18 @@ uv sync --extra dev uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000 ``` -3) Frontend (local): +3) Frontend dev server: ```bash cd frontend cp .env.example .env.local +# ensure this is correct for the browser: +# NEXT_PUBLIC_API_URL=http://localhost:8000 npm install npm run dev ``` -### Loop B: all-in-one Docker Compose +### Loop B: all-in-one Compose ```bash cp .env.example .env @@ -152,31 +69,53 @@ cp .env.example .env docker compose -f compose.yml --env-file .env up -d --build ``` -Useful ops: +## Checks (CI parity) + +### Run everything ```bash -docker compose -f compose.yml --env-file .env logs -f --tail=200 -docker compose -f compose.yml --env-file .env up -d --build backend -# destructive reset (drops Postgres volume): -docker compose -f compose.yml --env-file .env down -v +make check ``` -## Cypress E2E workflow (high level) +Evidence: `Makefile`. -See the deep dive: [docs/testing/README.md](testing/README.md). +### Common targeted commands -Notes: -- E2E uses Clerk (official `@clerk/testing` integration); CI injects Clerk env vars. +Backend: +```bash +make backend-lint # flake8 +make backend-typecheck # mypy --strict +make backend-test # pytest +make backend-coverage # pytest + scoped 100% coverage gate +make backend-migrate # alembic upgrade head +``` -## Tooling notes +Frontend: +```bash +make frontend-lint # eslint +make frontend-typecheck # tsc +make frontend-test # vitest +make frontend-build # next build +``` -### Node wrapper (`scripts/with_node.sh`) +## Cypress E2E -Many Make targets run frontend commands via `bash scripts/with_node.sh`. -It checks `node`/`npm`/`npx` and can use `nvm` if present. 
+Evidence: `docs/testing/README.md`, `.github/workflows/ci.yml`. -## Quick troubleshooting +- E2E uses Clerk’s official Cypress integration (`@clerk/testing`). +- Local run pattern: -- UI loads but API calls fail / activity feed blank: - - confirm `NEXT_PUBLIC_API_URL` is set and browser-reachable. - - see [Troubleshooting](troubleshooting/README.md). +```bash +# terminal 1 +cd frontend +npm run dev -- --hostname 0.0.0.0 --port 3000 + +# terminal 2 +cd frontend +npm run e2e -- --browser chrome +``` + +## Deep dives + +- [Testing guide](testing/README.md) +- [Troubleshooting deep dive](troubleshooting/README.md) diff --git a/docs/06-configuration.md b/docs/06-configuration.md index 855f0241..35237f96 100644 --- a/docs/06-configuration.md +++ b/docs/06-configuration.md @@ -1,105 +1,111 @@ # Configuration -This page documents how Mission Control is configured across local dev, self-host, and production. +This page documents **where configuration comes from**, the key **environment variables**, and a couple operational footguns (migrations, CORS). -## Deep dives +For deployment/production patterns, see: +- [Deployment](deployment/README.md) +- [Production](production/README.md) -- Deployment: [docs/deployment/README.md](deployment/README.md) -- Production notes: [docs/production/README.md](production/README.md) +## Configuration sources & precedence -## Config sources & precedence +Mission Control is a 3-service stack (`compose.yml`): Postgres (`db`), backend (`backend`), frontend (`frontend`). -Mission Control is a 3-service stack (`compose.yml`): Postgres (`db`), FastAPI backend (`backend`), and Next.js frontend (`frontend`). Configuration comes from a mix of **compose env files**, **service env vars**, and **app-specific env files**. 
+### Docker Compose (recommended for local/self-host) -### Docker Compose (recommended local/self-host) +Common pattern: -Precedence (highest → lowest): +```bash +cp .env.example .env -1) **Explicit runtime environment** passed to Compose -- `docker compose ... -e NAME=value` (or exported in your shell) +docker compose -f compose.yml --env-file .env up -d --build +``` -2) **Compose env-file** used for interpolation -- `docker compose -f compose.yml --env-file .env up ...` -- Suggested workflow: copy repo root `.env.example` → `.env` and edit. +Precedence (high → low): -3) **Compose defaults** embedded in `compose.yml` -- e.g. `${BACKEND_PORT:-8000}`. +1. Environment exported in your shell (or `-e NAME=value`) +2. Compose `--env-file .env` (variable interpolation) +3. Defaults in `compose.yml` (e.g. `${BACKEND_PORT:-8000}`) +4. Backend defaults via `env_file: ./backend/.env.example` +5. Frontend optional user-managed `frontend/.env` -4) **Backend container env** -- `compose.yml` sets backend `env_file: ./backend/.env.example` (defaults) -- plus overrides in `compose.yml: services.backend.environment`. +> Note: Compose intentionally does **not** load `frontend/.env.example` to avoid placeholder Clerk keys accidentally enabling Clerk. -5) **Frontend container env** -- `compose.yml` sets `NEXT_PUBLIC_API_URL` via `environment:` and also as a **build arg**. -- `compose.yml` optionally loads `frontend/.env` (user-managed), *not* `frontend/.env.example`. +### Backend env-file loading (non-Compose) -### Backend env-file loading behavior (non-Compose) +Evidence: `backend/app/core/config.py`. -When running the backend directly (e.g., `uvicorn`), settings load from env vars and from these files: +When running the backend directly (uvicorn), settings are loaded from: - `backend/.env` (always attempted) - `.env` (repo root; optional) +- plus process env vars -This is intentional so running from repo root still picks up backend config. 
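The Compose-level precedence listed above (shell environment, then `--env-file`, then the `${NAME:-default}` fallback in `compose.yml`) can be illustrated with a small sketch. This is not Compose's implementation and not the loader in `backend/app/core/config.py`; `parse_env_file` and `resolve` are hypothetical helpers written only to show the ordering, and the parser handles plain `NAME=value` lines only.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple NAME=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip()
    return values


def resolve(name: str, shell_env: dict[str, str], env_file: dict[str, str], default: str) -> str:
    """Illustrative model of ${NAME:-default}: shell beats env-file, default fills unset/empty."""
    merged = {**env_file, **shell_env}  # later mapping wins, so shell overrides the env-file
    value = merged.get(name)
    return value if value else default  # unset or empty falls back to the default


# Hypothetical contents of a root .env file (assumed for the example).
env_file = parse_env_file("# ports\nBACKEND_PORT=8100\nFRONTEND_PORT=\n")

print(resolve("BACKEND_PORT", {}, env_file, "8000"))
print(resolve("BACKEND_PORT", {"BACKEND_PORT": "9000"}, env_file, "8000"))
print(resolve("FRONTEND_PORT", {}, env_file, "3000"))
```

When debugging a real stack, `docker compose -f compose.yml --env-file .env config` prints the fully interpolated configuration, which is the authoritative view of what each service receives.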
+## Environment variables (grouped) -### Frontend env-file behavior (non-Compose) +### Root `.env` (Compose-level) -- Next.js uses `NEXT_PUBLIC_*` variables for browser-visible configuration. -- For local dev you typically create `frontend/.env.local` (Next.js convention) or `frontend/.env` (if you want Compose to read it). +Template: `.env.example`. -## Environment variables +- Ports: `FRONTEND_PORT`, `BACKEND_PORT`, `POSTGRES_PORT` +- Postgres defaults: `POSTGRES_DB`, `POSTGRES_USER`, `POSTGRES_PASSWORD` +- Backend knobs: `CORS_ORIGINS`, `DB_AUTO_MIGRATE` +- Frontend: `NEXT_PUBLIC_API_URL` (required) -This table is based on `backend/app/core/config.py`, `.env.example`, `backend/.env.example`, `frontend/.env.example`, and `compose.yml`. +### Backend -### Compose / shared (repo root `.env`) +Template: `backend/.env.example` + settings model `backend/app/core/config.py`. -| Variable | Used by | Purpose | Default / example | Footguns | -|---|---|---|---|---| -| `FRONTEND_PORT` | compose | Host port for frontend container | `3000` | Port conflicts on host are common | -| `BACKEND_PORT` | compose | Host port for backend container | `8000` | If changed, ensure frontend points at the new port | -| `POSTGRES_DB` | db/compose | Postgres database name | `mission_control` | Changing requires new DB or migration plan | -| `POSTGRES_USER` | db/compose | Postgres user | `postgres` | — | -| `POSTGRES_PASSWORD` | db/compose | Postgres password | `postgres` | Don’t use defaults in real deployments | -| `POSTGRES_PORT` | compose | Host port for Postgres | `5432` | Port conflicts on host are common | -| `CORS_ORIGINS` | backend/compose | Backend CORS allowlist | `http://localhost:3000` | Must include the real frontend origin | -| `DB_AUTO_MIGRATE` | backend/compose | Auto-run Alembic migrations at backend startup | `true` (in `.env.example`) | Can be risky in prod; see notes below | -| `NEXT_PUBLIC_API_URL` | frontend (build+runtime) | Browser-reachable backend URL | 
`http://localhost:8000` | Must be reachable from the **browser**, not just Docker | +- `ENVIRONMENT` +- `LOG_LEVEL` +- `DATABASE_URL` +- `CORS_ORIGINS` +- `DB_AUTO_MIGRATE` -### Backend (FastAPI) +Clerk: +- `CLERK_SECRET_KEY` (required; backend enforces non-empty) +- `CLERK_API_URL`, `CLERK_VERIFY_IAT`, `CLERK_LEEWAY` -> Settings are defined in `backend/app/core/config.py` and typically configured via `backend/.env`. +### Frontend -| Variable | Required? | Purpose | Default / example | Notes | -|---|---:|---|---|---| -| `ENVIRONMENT` | no | Environment name (drives defaults) | `dev` | In `dev`, `DB_AUTO_MIGRATE` defaults to true **if not explicitly set** | -| `DATABASE_URL` | no | Postgres connection string | `postgresql+psycopg://...@localhost:5432/...` | In Compose, overridden to use `db:5432` | -| `CORS_ORIGINS` | no | Comma-separated CORS origins | empty | Compose supplies a sane default | -| `BASE_URL` | no | External base URL for this service | empty | Used for absolute links/callbacks if needed | -| `CLERK_SECRET_KEY` | **yes** | Clerk secret key (backend auth) | (none) | `backend/app/core/config.py` enforces non-empty | -| `CLERK_API_URL` | no | Clerk API base | `https://api.clerk.com` | — | -| `CLERK_VERIFY_IAT` | no | Verify issued-at claims | `true` | — | -| `CLERK_LEEWAY` | no | JWT timing leeway seconds | `10.0` | — | -| `LOG_LEVEL` | no | Logging level | `INFO` | — | -| `LOG_FORMAT` | no | Log format | `text` | — | -| `LOG_USE_UTC` | no | Use UTC timestamps | `false` | — | -| `DB_AUTO_MIGRATE` | no | Auto-migrate DB on startup | `false` in backend `.env.example` | In `dev`, backend may flip this to true if unset | +Template: `frontend/.env.example`. -### Frontend (Next.js) +- `NEXT_PUBLIC_API_URL` (required) -| Variable | Required? 
| Purpose | Default / example | Footguns | -|---|---:|---|---|---| -| `NEXT_PUBLIC_API_URL` | **yes** | Backend base URL used by the browser | `http://localhost:8000` | Must be browser-reachable | -| `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` | **yes** | Enables Clerk in the frontend | (none) | Must be a real publishable key | -| `CLERK_SECRET_KEY` | **yes** | Clerk secret key used by the frontend (server-side) and E2E | (none) | Do not commit; required for Clerk-enabled operation | -| `NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL` | optional | Post-login redirect | `/boards` | — | -| `NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL` | optional | Post-signup redirect | `/boards` | — | -| `NEXT_PUBLIC_CLERK_SIGN_IN_FALLBACK_REDIRECT_URL` | optional | Fallback redirect | `/boards` | — | -| `NEXT_PUBLIC_CLERK_SIGN_UP_FALLBACK_REDIRECT_URL` | optional | Fallback redirect | `/boards` | — | -| `NEXT_PUBLIC_CLERK_AFTER_SIGN_OUT_URL` | optional | Post-logout redirect | `/` | — | +Clerk: +- `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` +- `CLERK_SECRET_KEY` +- redirect URLs (`NEXT_PUBLIC_CLERK_*`) -## Operational footguns +## Minimal dev configuration -- **Clerk placeholder keys**: `frontend/.env.example` contains non-empty Clerk placeholders. `compose.yml` intentionally does **not** load it, because it can accidentally flip Clerk “on”. Prefer user-managed `frontend/.env` (for Compose) or `frontend/.env.local` (for Next dev). -- **`DB_AUTO_MIGRATE`**: - - In `ENVIRONMENT=dev`, backend defaults `DB_AUTO_MIGRATE=true` if you didn’t set it explicitly. - - In production, consider disabling auto-migrate and running migrations as an explicit step. -- **`NEXT_PUBLIC_API_URL` reachability**: must work from the browser’s network context (host), not only from within the Docker network. +### Split-mode dev (fastest contributor loop) + +- Start DB via Compose. +- Run backend+frontend dev servers. + +See [Development](03-development.md). 
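As a rough preflight before starting the dev servers, the variables marked required in the lists above can be checked in a few lines. This is a sketch under assumptions: `REQUIRED` and `missing_required` are hypothetical names, the required set is taken only from this page (`CLERK_SECRET_KEY` for the backend, `NEXT_PUBLIC_API_URL` for the frontend), and the backend's real validation lives in `backend/app/core/config.py`, which may check more than this.

```python
import os
from collections.abc import Mapping

# Required variables per service, as listed on this page (assumed complete for the sketch).
REQUIRED: dict[str, tuple[str, ...]] = {
    "backend": ("CLERK_SECRET_KEY",),
    "frontend": ("NEXT_PUBLIC_API_URL",),
}


def missing_required(service: str, env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank for a service."""
    return [name for name in REQUIRED[service] if not env.get(name, "").strip()]


if __name__ == "__main__":
    for service in REQUIRED:
        missing = missing_required(service, os.environ)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{service}: {status}")
```

Treating a blank value as missing mirrors the note that the backend enforces a non-empty `CLERK_SECRET_KEY`.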
+ +## Migrations (`DB_AUTO_MIGRATE`) + +Evidence: `backend/app/db/session.py`. + +On backend startup: +- if `DB_AUTO_MIGRATE=true` and migrations exist under `backend/migrations/versions/`, backend runs `alembic upgrade head`. +- otherwise it falls back to `SQLModel.metadata.create_all`. + +Operational guidance: +- Auto-migrate is convenient on a single host. +- In multi-instance deployments, prefer running migrations as an explicit deploy step to avoid race conditions. + +## CORS (`CORS_ORIGINS`) + +Evidence: `backend/app/main.py`, `backend/app/core/config.py`. + +- `CORS_ORIGINS` is a comma-separated list. +- It must include the frontend origin (e.g. `http://localhost:3000`) or browser requests will fail. + +## Troubleshooting config issues + +- UI loads but API calls fail / Activity feed blank → `NEXT_PUBLIC_API_URL` is missing/incorrect. +- Backend fails at startup → check required env vars (notably `CLERK_SECRET_KEY`) and migrations. + +See also: `docs/troubleshooting/README.md`. diff --git a/docs/07-api-reference.md b/docs/07-api-reference.md index 346ffc2b..0c8ec4cb 100644 --- a/docs/07-api-reference.md +++ b/docs/07-api-reference.md @@ -1,290 +1,80 @@ -# API reference +# API / auth -## Deep dives +This page documents how Mission Control’s API surface is organized and how authentication works. +For deeper backend architecture context, see: - [Architecture](05-architecture.md) -- [Gateway protocol](openclaw_gateway_ws.md) -This page summarizes the **HTTP API surface** exposed by the FastAPI backend. -It is derived from `backend/app/main.py` (router registration) and `backend/app/api/*` (route modules). +## Base path -## Base -- API prefix: `/api/v1/*` (see `backend/app/main.py`) +Evidence: `backend/app/main.py`. -## Auth model (recap) -- **Clerk (user auth)**: used by the human web UI; frontend enables Clerk when `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` is set; backend verifies requests using `CLERK_SECRET_KEY` via the Clerk SDK (see `backend/app/core/auth.py`). 
-- **X-Agent-Token (agent auth)**: used by automation/agents; send header `X-Agent-Token: ` to `/api/v1/agent/*` endpoints (see `backend/app/core/agent_auth.py`). +- All API routes are mounted under: `/api/v1/*` + +## Auth model (two callers) + +Mission Control has two primary actor types: + +1) **User (Clerk)** — human UI/admin +2) **Agent (`X-Agent-Token`)** — automation + +### User auth (Clerk) + +Evidence: +- backend: `backend/app/core/auth.py` +- config: `backend/app/core/config.py` + +- Frontend calls backend using `Authorization: Bearer `. +- Backend validates requests using the Clerk Backend API SDK with `CLERK_SECRET_KEY`. + +### Agent auth (`X-Agent-Token`) + +Evidence: +- `backend/app/core/agent_auth.py` +- agent API surface: `backend/app/api/agent.py` + +- Agents authenticate with `X-Agent-Token: `. +- Token is verified against the agent’s stored `agent_token_hash`. ## Route groups (modules) +Evidence: `backend/app/main.py` includes routers from `backend/app/api/*`. + | Module | Prefix (under `/api/v1`) | Purpose | |---|---|---| | `activity.py` | `/activity` | Activity listing and task-comment feed endpoints. | | `agent.py` | `/agent` | Agent-scoped API routes for board operations and gateway coordination. | -| `agents.py` | `/agents` | Thin API wrappers for async agent lifecycle operations. | -| `approvals.py` | `/boards/{board_id}/approvals` | Approval listing, streaming, creation, and update endpoints. | -| `auth.py` | `/auth` | Authentication bootstrap endpoints for the Mission Control API. | -| `board_group_memory.py` | `` | Board-group memory CRUD and streaming endpoints. | -| `board_groups.py` | `/board-groups` | Board group CRUD, snapshot, and heartbeat endpoints. | -| `board_memory.py` | `/boards/{board_id}/memory` | Board memory CRUD and streaming endpoints. | -| `board_onboarding.py` | `/boards/{board_id}/onboarding` | Board onboarding endpoints for user/agent collaboration. | -| `boards.py` | `/boards` | Board CRUD and snapshot endpoints. 
| -| `gateway.py` | `/gateways` | Thin gateway session-inspection API wrappers. | -| `gateways.py` | `/gateways` | Thin API wrappers for gateway CRUD and template synchronization. | -| `metrics.py` | `/metrics` | Dashboard metric aggregation endpoints. | -| `organizations.py` | `/organizations` | Organization management endpoints and membership/invite flows. | -| `souls_directory.py` | `/souls-directory` | API routes for searching and fetching souls-directory markdown entries. | -| `tasks.py` | `/boards/{board_id}/tasks` | Task API routes for listing, streaming, and mutating board tasks. | -| `users.py` | `/users` | User self-service API endpoints for profile retrieval and updates. | +| `agents.py` | `/agents` | Agent lifecycle and streaming endpoints. | +| `approvals.py` | `/boards/{board_id}/approvals` | Approval list/create/update + streaming. | +| `auth.py` | `/auth` | Auth bootstrap endpoints. | +| `board_group_memory.py` | `/board-groups/{group_id}/memory` and `/boards/{board_id}/group-memory` | Board-group memory CRUD + streaming. | +| `board_groups.py` | `/board-groups` | Board group CRUD + snapshot + heartbeat apply. | +| `board_memory.py` | `/boards/{board_id}/memory` | Board memory CRUD + streaming. | +| `board_onboarding.py` | `/boards/{board_id}/onboarding` | Onboarding flows (user+agent). | +| `boards.py` | `/boards` | Board CRUD + snapshots. | +| `gateway.py` | `/gateways` | Gateway session inspection APIs (org admin). | +| `gateways.py` | `/gateways` | Gateway CRUD + templates sync (org admin). | +| `metrics.py` | `/metrics` | Dashboard metrics. | +| `organizations.py` | `/organizations` | Org + invites/membership flows. | +| `souls_directory.py` | `/souls-directory` | Search/fetch souls directory entries. | +| `tasks.py` | `/boards/{board_id}/tasks` | Task CRUD + comments + streaming. | +| `users.py` | `/users` | User self-service profile endpoints. 
| -## `/activity` — `activity.py` -*Activity listing and task-comment feed endpoints.* +## Where authorization is enforced -### router (prefix `/activity`) +Evidence: `backend/app/api/deps.py`. -| Method | Path | Handler | Notes | -|---|---|---|---| -| `GET` | `/api/v1/activity` | `list_activity()` | List activity events visible to the calling actor. | -| `GET` | `/api/v1/activity/task-comments` | `list_task_comment_feed()` | List task-comment feed items for accessible boards. | -| `GET` | `/api/v1/activity/task-comments/stream` | `stream_task_comment_feed()` | Stream task-comment events for accessible boards. | +Most route modules don’t “hand roll” access checks; they declare dependencies: -## `/agent` — `agent.py` -*Agent-scoped API routes for board operations and gateway coordination.* +- `require_admin_auth` — admin user only. +- `require_admin_or_agent` — admin user OR authenticated agent. +- `get_board_for_actor_read` / `get_board_for_actor_write` — board access for user/agent. +- `require_org_member` / `require_org_admin` — org membership/admin for user callers. -### Agent automation API (`/api/v1/agent/*`) +## “Start here” pointers for maintainers -Auth: -- Header: `X-Agent-Token: ` -- See: `backend/app/core/agent_auth.py` and `backend/app/api/deps.py` - -High-signal endpoint index (from `backend/app/api/agent.py`): - -| Method | Path | Purpose | -|---|---|---| -| `POST` | `/api/v1/agent/heartbeat` | Agent check-in / heartbeat status | -| `GET` | `/api/v1/agent/boards` | List boards visible to the agent | -| `GET` | `/api/v1/agent/boards/{board_id}/tasks` | List tasks with filters (status, assignment, etc.) | -| `PATCH` | `/api/v1/agent/boards/{board_id}/tasks/{task_id}` | Update task fields (status/assignment/etc.) 
|
-| `GET` | `/api/v1/agent/boards/{board_id}/tasks/{task_id}/comments` | List task comments |
-| `POST` | `/api/v1/agent/boards/{board_id}/tasks/{task_id}/comments` | Create task comment (note: request body uses `message`) |
-| `GET` | `/api/v1/agent/boards/{board_id}/memory` | List board memory entries |
-| `POST` | `/api/v1/agent/boards/{board_id}/memory` | Create board memory entry |
-| `POST` | `/api/v1/agent/boards/{board_id}/gateway/main/ask-user` | Route an “ask user” message through gateway-main |
-| `POST` | `/api/v1/agent/gateway/leads/broadcast` | Broadcast a gateway-main message to multiple board leads |
-
-### router (prefix `/agent`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/agent/agents` | `list_agents()` | List agents, optionally filtered to a board. |
-| `POST` | `/api/v1/agent/agents` | `create_agent()` | Create an agent on the caller's board. |
-| `GET` | `/api/v1/agent/boards` | `list_boards()` | List boards visible to the authenticated agent. |
-| `POST` | `/api/v1/agent/heartbeat` | `agent_heartbeat()` | Record heartbeat status for the authenticated agent. |
-| `GET` | `/api/v1/agent/boards/{board_id}` | `get_board()` | Return a board if the authenticated agent can access it. |
-| `GET` | `/api/v1/agent/boards/{board_id}/tasks` | `list_tasks()` | List tasks on a board with optional status and assignment filters. |
-| `POST` | `/api/v1/agent/boards/{board_id}/tasks` | `create_task()` | Create a task on the board as the lead agent. |
-| `POST` | `/api/v1/agent/gateway/leads/broadcast` | `broadcast_gateway_lead_message()` | Broadcast a gateway-main message to multiple board leads. |
-| `GET` | `/api/v1/agent/boards/{board_id}/memory` | `list_board_memory()` | List board memory entries with optional chat filtering. |
-| `POST` | `/api/v1/agent/boards/{board_id}/memory` | `create_board_memory()` | Create a board memory entry. |
-
-## `/agents` — `agents.py`
-*Thin API wrappers for async agent lifecycle operations.*
-
-### router (prefix `/agents`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/agents` | `list_agents()` | List agents visible to the active organization admin. |
-| `POST` | `/api/v1/agents` | `create_agent()` | Create and provision an agent. |
-| `GET` | `/api/v1/agents/stream` | `stream_agents()` | Stream agent updates as SSE events. |
-| `POST` | `/api/v1/agents/heartbeat` | `heartbeat_or_create_agent()` | Heartbeat an existing agent or create/provision one if needed. |
-| `DELETE` | `/api/v1/agents/{agent_id}` | `delete_agent()` | Delete an agent and clean related task state. |
-| `GET` | `/api/v1/agents/{agent_id}` | `get_agent()` | Get a single agent by id. |
-| `PATCH` | `/api/v1/agents/{agent_id}` | `update_agent()` | Update agent metadata and optionally reprovision. |
-| `POST` | `/api/v1/agents/{agent_id}/heartbeat` | `heartbeat_agent()` | Record a heartbeat for a specific agent. |
-
-## `/boards/{board_id}/approvals` — `approvals.py`
-*Approval listing, streaming, creation, and update endpoints.*
-
-### router (prefix `/boards/{board_id}/approvals`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/boards/{board_id}/approvals` | `list_approvals()` | List approvals for a board, optionally filtering by status. |
-| `POST` | `/api/v1/boards/{board_id}/approvals` | `create_approval()` | Create an approval for a board. |
-| `GET` | `/api/v1/boards/{board_id}/approvals/stream` | `stream_approvals()` | Stream approval updates for a board using server-sent events. |
-| `PATCH` | `/api/v1/boards/{board_id}/approvals/{approval_id}` | `update_approval()` | Update an approval's status and resolution timestamp. |
-
-## `/auth` — `auth.py`
-*Authentication bootstrap endpoints for the Mission Control API.*
-
-### router (prefix `/auth`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `POST` | `/api/v1/auth/bootstrap` | `bootstrap_user()` | Return the authenticated user profile from token claims. |
-
-## `` — `board_group_memory.py`
-*Board-group memory CRUD and streaming endpoints.*
-
-### board_router (prefix `/boards/{board_id}/group-memory`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/boards/{board_id}/group-memory` | `list_board_group_memory_for_board()` | List memory entries for the board's linked group. |
-| `POST` | `/api/v1/boards/{board_id}/group-memory` | `create_board_group_memory_for_board()` | Create a group memory entry from a board context and notify recipients. |
-| `GET` | `/api/v1/boards/{board_id}/group-memory/stream` | `stream_board_group_memory_for_board()` | Stream memory entries for the board's linked group. |
-
-### group_router (prefix `/board-groups/{group_id}/memory`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/board-groups/{group_id}/memory` | `list_board_group_memory()` | List board-group memory entries for a specific group. |
-| `POST` | `/api/v1/board-groups/{group_id}/memory` | `create_board_group_memory()` | Create a board-group memory entry and notify chat recipients. |
-| `GET` | `/api/v1/board-groups/{group_id}/memory/stream` | `stream_board_group_memory()` | Stream memory entries for a board group via server-sent events. |
-
-## `/board-groups` — `board_groups.py`
-*Board group CRUD, snapshot, and heartbeat endpoints.*
-
-### router (prefix `/board-groups`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/board-groups` | `list_board_groups()` | List board groups in the active organization. |
-| `POST` | `/api/v1/board-groups` | `create_board_group()` | Create a board group in the active organization. |
-| `DELETE` | `/api/v1/board-groups/{group_id}` | `delete_board_group()` | Delete a board group. |
-| `GET` | `/api/v1/board-groups/{group_id}` | `get_board_group()` | Get a board group by id. |
-| `PATCH` | `/api/v1/board-groups/{group_id}` | `update_board_group()` | Update a board group. |
-| `GET` | `/api/v1/board-groups/{group_id}/snapshot` | `get_board_group_snapshot()` | Get a snapshot across boards in a group. |
-| `POST` | `/api/v1/board-groups/{group_id}/heartbeat` | `apply_board_group_heartbeat()` | Apply heartbeat settings to agents in a board group. |
-
-## `/boards/{board_id}/memory` — `board_memory.py`
-*Board memory CRUD and streaming endpoints.*
-
-### router (prefix `/boards/{board_id}/memory`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/boards/{board_id}/memory` | `list_board_memory()` | List board memory entries, optionally filtering chat entries. |
-| `POST` | `/api/v1/boards/{board_id}/memory` | `create_board_memory()` | Create a board memory entry and notify chat targets when needed. |
-| `GET` | `/api/v1/boards/{board_id}/memory/stream` | `stream_board_memory()` | Stream board memory events over server-sent events. |
-
-## `/boards/{board_id}/onboarding` — `board_onboarding.py`
-*Board onboarding endpoints for user/agent collaboration.*
-
-### router (prefix `/boards/{board_id}/onboarding`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/boards/{board_id}/onboarding` | `get_onboarding()` | Get the latest onboarding session for a board. |
-| `POST` | `/api/v1/boards/{board_id}/onboarding/agent` | `agent_onboarding_update()` | Store onboarding updates submitted by the gateway agent. |
-| `POST` | `/api/v1/boards/{board_id}/onboarding/start` | `start_onboarding()` | Start onboarding and send instructions to the gateway agent. |
-| `POST` | `/api/v1/boards/{board_id}/onboarding/answer` | `answer_onboarding()` | Send a user onboarding answer to the gateway agent. |
-| `POST` | `/api/v1/boards/{board_id}/onboarding/confirm` | `confirm_onboarding()` | Confirm onboarding results and provision the board lead agent. |
-
-## `/boards` — `boards.py`
-*Board CRUD and snapshot endpoints.*
-
-### router (prefix `/boards`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/boards` | `list_boards()` | List boards visible to the current organization member. |
-| `POST` | `/api/v1/boards` | `create_board()` | Create a board in the active organization. |
-| `DELETE` | `/api/v1/boards/{board_id}` | `delete_board()` | Delete a board and all dependent records. |
-| `GET` | `/api/v1/boards/{board_id}` | `get_board()` | Get a board by id. |
-| `PATCH` | `/api/v1/boards/{board_id}` | `update_board()` | Update mutable board properties. |
-| `GET` | `/api/v1/boards/{board_id}/snapshot` | `get_board_snapshot()` | Get a board snapshot view model. |
-| `GET` | `/api/v1/boards/{board_id}/group-snapshot` | `get_board_group_snapshot()` | Get a grouped snapshot across related boards. |
-
-## `/gateways` — `gateway.py`
-*Thin gateway session-inspection API wrappers.*
-
-### router (prefix `/gateways`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/gateways/status` | `gateways_status()` | Return gateway connectivity and session status. |
-| `GET` | `/api/v1/gateways/commands` | `gateway_commands()` | Return supported gateway protocol methods and events. |
-| `GET` | `/api/v1/gateways/sessions` | `list_gateway_sessions()` | List sessions for a gateway associated with a board. |
-| `GET` | `/api/v1/gateways/sessions/{session_id}` | `get_gateway_session()` | Get a specific gateway session by key. |
-| `GET` | `/api/v1/gateways/sessions/{session_id}/history` | `get_session_history()` | Fetch chat history for a gateway session. |
-| `POST` | `/api/v1/gateways/sessions/{session_id}/message` | `send_gateway_session_message()` | Send a message into a specific gateway session. |
-
-## `/gateways` — `gateways.py`
-*Thin API wrappers for gateway CRUD and template synchronization.*
-
-### router (prefix `/gateways`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/gateways` | `list_gateways()` | List gateways for the caller's organization. |
-| `POST` | `/api/v1/gateways` | `create_gateway()` | Create a gateway and provision or refresh its main agent. |
-| `DELETE` | `/api/v1/gateways/{gateway_id}` | `delete_gateway()` | Delete a gateway in the caller's organization. |
-| `GET` | `/api/v1/gateways/{gateway_id}` | `get_gateway()` | Return one gateway by id for the caller's organization. |
-| `PATCH` | `/api/v1/gateways/{gateway_id}` | `update_gateway()` | Patch a gateway and refresh the main-agent provisioning state. |
-| `POST` | `/api/v1/gateways/{gateway_id}/templates/sync` | `sync_gateway_templates()` | Sync templates for a gateway and optionally rotate runtime settings. |
-
-## `/metrics` — `metrics.py`
-*Dashboard metric aggregation endpoints.*
-
-### router (prefix `/metrics`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/metrics/dashboard` | `dashboard_metrics()` | Return dashboard KPIs and time-series data for accessible boards. |
-
-## `/organizations` — `organizations.py`
-*Organization management endpoints and membership/invite flows.*
-
-### router (prefix `/organizations`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `POST` | `/api/v1/organizations` | `create_organization()` | Create an organization and assign the caller as owner. |
-| `DELETE` | `/api/v1/organizations/me` | `delete_my_org()` | Delete the active organization and related entities. |
-| `GET` | `/api/v1/organizations/me` | `get_my_org()` | Return the caller's active organization. |
-| `GET` | `/api/v1/organizations/me/list` | `list_my_organizations()` | List organizations where the current user is a member. |
-| `PATCH` | `/api/v1/organizations/me/active` | `set_active_org()` | Set the caller's active organization. |
-| `GET` | `/api/v1/organizations/me/member` | `get_my_membership()` | Get the caller's membership record in the active organization. |
-| `GET` | `/api/v1/organizations/me/invites` | `list_org_invites()` | List pending invites for the active organization. |
-| `POST` | `/api/v1/organizations/me/invites` | `create_org_invite()` | Create an organization invite for an email address. |
-| `GET` | `/api/v1/organizations/me/members` | `list_org_members()` | List members for the active organization. |
-| `POST` | `/api/v1/organizations/invites/accept` | `accept_org_invite()` | Accept an invite and return resulting membership. |
-
-## `/souls-directory` — `souls_directory.py`
-*API routes for searching and fetching souls-directory markdown entries.*
-
-### router (prefix `/souls-directory`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/souls-directory/search` | `search()` | Search souls-directory entries by handle/slug query text. |
-| `GET` | `/api/v1/souls-directory/{handle}/{slug}` | `get_markdown()` | Fetch markdown content for a validated souls-directory handle and slug. |
-| `GET` | `/api/v1/souls-directory/{handle}/{slug}.md` | `get_markdown()` | Fetch markdown content for a validated souls-directory handle and slug. |
-
-## `/boards/{board_id}/tasks` — `tasks.py`
-*Task API routes for listing, streaming, and mutating board tasks.*
-
-### router (prefix `/boards/{board_id}/tasks`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `GET` | `/api/v1/boards/{board_id}/tasks` | `list_tasks()` | List board tasks with optional status and assignment filters. |
-| `POST` | `/api/v1/boards/{board_id}/tasks` | `create_task()` | Create a task and initialize dependency rows. |
-| `GET` | `/api/v1/boards/{board_id}/tasks/stream` | `stream_tasks()` | Stream task and task-comment events as SSE payloads. |
-| `DELETE` | `/api/v1/boards/{board_id}/tasks/{task_id}` | `delete_task()` | Delete a task and related records. |
-| `PATCH` | `/api/v1/boards/{board_id}/tasks/{task_id}` | `update_task()` | Update task status, assignment, comment, and dependency state. |
-| `GET` | `/api/v1/boards/{board_id}/tasks/{task_id}/comments` | `list_task_comments()` | List comments for a task in chronological order. |
-| `POST` | `/api/v1/boards/{board_id}/tasks/{task_id}/comments` | `create_task_comment()` | Create a task comment and notify relevant agents. |
-
-## `/users` — `users.py`
-*User self-service API endpoints for profile retrieval and updates.*
-
-### router (prefix `/users`)
-
-| Method | Path | Handler | Notes |
-|---|---|---|---|
-| `DELETE` | `/api/v1/users/me` | `delete_me()` | Delete the authenticated account and any personal-only organizations. |
-| `GET` | `/api/v1/users/me` | `get_me()` | Return the authenticated user's current profile payload. |
-| `PATCH` | `/api/v1/users/me` | `update_me()` | Apply partial profile updates for the authenticated user. |
+- Router wiring: `backend/app/main.py`
+- Access dependencies: `backend/app/api/deps.py`
+- User auth: `backend/app/core/auth.py`
+- Agent auth: `backend/app/core/agent_auth.py`
+- Agent automation surface: `backend/app/api/agent.py`
diff --git a/docs/09-ops-runbooks.md b/docs/09-ops-runbooks.md
index 1094e05f..1d6b11da 100644
--- a/docs/09-ops-runbooks.md
+++ b/docs/09-ops-runbooks.md
@@ -1,38 +1,81 @@
-# Ops / runbooks
+# Operations
-## Deep dives
+This is the ops/SRE entrypoint.
+It aims to answer, quickly:
+- “Is the system up?”
+- “What changed?”
+- “What should I check next?”
+
+Deep dives:
 - [Deployment](deployment/README.md)
 - [Production](production/README.md)
-- [Troubleshooting](troubleshooting/README.md)
-
-This page is the operator entrypoint. It points to the existing deep-dive runbooks and adds a short “first 30 minutes” checklist.
+- [Troubleshooting deep dive](troubleshooting/README.md)
 ## First 30 minutes (incident checklist)
-1. **Confirm impact**
-   - What’s broken: UI, API, auth, or gateway integration?
-   - All users or a subset?
+### 0) Stabilize communications
-2. **Check service health**
-   - Backend: `/healthz` and `/readyz`
-   - Frontend: can it load? does it reach the API?
+- Identify incident lead and comms channel.
+- Capture last deploy SHA/tag and time window.
+- Do not paste secrets into chat/tickets.
-3. **Check auth (Clerk)**
-   - Frontend: did Clerk get enabled unintentionally? (publishable key set)
-   - Backend: is `CLERK_SECRET_KEY` configured correctly?
+### 1) Confirm impact
-4. **Check DB connectivity**
-   - Can backend connect to Postgres (`DATABASE_URL`)?
+- UI broken vs API broken vs auth vs DB vs gateway integration.
+- All users or subset?
-5. **Check logs**
-   - Backend logs for 5xx spikes or auth failures.
-   - Frontend logs for API URL/proxy misconfig.
+### 2) Health checks
-6. **Stabilize**
-   - Roll back the last change if you can.
-   - Temporarily disable optional integrations (gateway) to isolate.
+- Backend:
+  - `curl -f http://:8000/healthz`
+  - `curl -f http://:8000/readyz`
+- Frontend:
+  - can the UI load?
+  - in browser devtools, are `/api/v1/*` requests failing?
-## Backups / restore
+### 3) Configuration sanity
-See [Production](production/README.md). If you run Mission Control in production, treat backup/restore as a regular drill, not a one-time setup.
+
+Common misconfigs that look like outages:
+
+- `NEXT_PUBLIC_API_URL` wrong → UI loads but API calls fail.
+- `CORS_ORIGINS` missing frontend origin → browser CORS errors.
+- Clerk misconfig → auth redirects/401s.
+
+### 4) Database
+
+- If backend is 5xx’ing broadly, DB is a top suspect.
+- Verify `DATABASE_URL` points at the correct host.
+
+### 5) Logs
+
+Compose:
+
+```bash
+docker compose -f compose.yml --env-file .env logs -f --tail=200
+```
+
+Targeted:
+
+```bash
+docker compose -f compose.yml --env-file .env logs -f --tail=200 backend
+```
+
+### 6) Rollback / isolate
+
+- If there was a recent deploy and symptoms align, rollback to last known good.
+- If gateway integration is implicated, isolate by disabling gateway-dependent flows.
+
+## Common failure modes
+
+- UI loads, Activity feed blank → `NEXT_PUBLIC_API_URL` wrong/unreachable.
+- Repeated auth redirects/errors → Clerk keys/redirects misconfigured.
+- Backend 5xx → DB outage/misconfig; migration failure.
+- Backend won’t start → config validation failure (e.g. empty `CLERK_SECRET_KEY`).
+
+## Backups
+
+Evidence: `docs/production/README.md`.
+
+- Minimum viable: periodic `pg_dump` to off-host storage.
+- Treat restore as a drill (quarterly), not a one-time checklist.
diff --git a/docs/10-troubleshooting.md b/docs/10-troubleshooting.md
index 6e837bdf..ee3b1096 100644
--- a/docs/10-troubleshooting.md
+++ b/docs/10-troubleshooting.md
@@ -1,24 +1,38 @@
 # Troubleshooting
-## Deep dives
+This is the “symptom → checks → likely fixes” page.
+For deeper playbooks, see:
 - [Troubleshooting deep dive](troubleshooting/README.md)
-This is the “quick triage” page. For detailed playbooks and diagnostics, use the deep dive.
+## Triage map
-## Quick triage
+| Symptom | Fast checks | Likely fix |
+|---|---|---|
+| UI loads but API calls fail / Activity feed blank | Browser devtools shows `/api/v1/*` failures; check backend `/healthz` | Fix `NEXT_PUBLIC_API_URL` (must be browser-reachable) |
+| UI redirects / Clerk errors | Is `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` set? Are Clerk redirects correct? | Unset keys for local dev without Clerk; configure real keys for prod |
+| Backend `/healthz` fails | Is backend container/process running? check backend logs | Fix crash loop: env vars, DB connectivity, migrations |
+| Backend returns 5xx | Check DB connectivity (`DATABASE_URL`), DB logs | Fix DB outage/misconfig; re-run migrations if needed |
+| Browser shows CORS errors | Compare `CORS_ORIGINS` vs frontend origin | Add frontend origin to `CORS_ORIGINS` |
-### Frontend loads but shows API errors
-- Confirm `NEXT_PUBLIC_API_URL` points to a backend your browser can reach.
-- Check backend `/healthz`.
+## Common checks
-### Frontend keeps redirecting / Clerk errors
-- Verify your Clerk keys are set correctly in the frontend environment.
-- See: [Deployment guide](deployment/README.md) (Clerk auth notes).
+### 1) Verify backend health
-### Backend returns 5xx
-- Check DB connectivity (`DATABASE_URL`) and migrations.
-- Check backend logs.
+```bash
+curl -f http://localhost:8000/healthz
+```
+
+### 2) Verify frontend can reach backend
+
+- Ensure `NEXT_PUBLIC_API_URL` matches the backend URL the browser can reach.
+
+### 3) Check logs
+
+```bash
+docker compose -f compose.yml --env-file .env logs -f --tail=200 backend
+```
 ## Next
-- Promote the most common issues from [Troubleshooting deep dive](troubleshooting/README.md) into this page once we see repeated incidents.
+
+If you hit a recurring incident, promote it from the deep-dive page into this triage map.
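
Two of the triage-map checks (compare `CORS_ORIGINS` against the frontend origin, and probe the backend `/healthz`) can be scripted. The following is a minimal Python sketch and is not part of this change. The function names are hypothetical, and it assumes `CORS_ORIGINS` holds a comma-separated list of origins, which should be confirmed against the backend configuration before relying on it.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return the scheme://host[:port] origin of a URL, lowercased."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def cors_allows(cors_origins: str, frontend_url: str) -> bool:
    """Check whether the frontend origin appears in a CORS_ORIGINS value.

    Assumption: the value is a comma-separated list of origins.
    """
    allowed = {
        origin_of(item.strip())
        for item in cors_origins.split(",")
        if item.strip()
    }
    return origin_of(frontend_url) in allowed


def backend_healthy(api_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <api_url>/healthz answers with HTTP 200."""
    health_url = api_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, non-2xx response, or malformed URL all count as unhealthy.
        return False
```

For example, `cors_allows("http://localhost:3000", "http://localhost:3000/boards")` returns `True`, because only the origin part of the frontend URL is compared. The frontend address `http://localhost:3000` is an illustrative assumption, not a value taken from these docs.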