docs: add missing context to overview/architecture/ops/troubleshooting

Abhimanyu Saharan
2026-02-11 09:45:29 +00:00
parent 04c6822ea8
commit 7be38140a1
4 changed files with 106 additions and 46 deletions


@@ -1,23 +1,45 @@
# Overview # Overview
Mission Control is the **web UI + HTTP API** for operating OpenClaw. It's where you manage boards, tasks, agents, approvals, and (optionally) gateway connections. Mission Control is the **web UI + HTTP API** for operating OpenClaw.
It's where you manage **boards**, **tasks**, **agents**, **approvals**, and (optionally) **gateway connections**.
## Problem statement ## Problem statement
- Provide a single place to coordinate work (boards/tasks) and execute automation (agents) safely.
## Non-goals (first pass) OpenClaw can execute work (tools/skills) and converse across channels, but real operations need a place to:
- Not a general-purpose project management suite.
- Not a full observability platform. - **Coordinate** work across people + agents (what's next, what's blocked, who owns what)
- **Track evidence** of what happened (commands run, links, logs, artifacts)
- **Control risk** (approvals, guardrails, isolation)
- **Operate reliably** (deployment, configuration, troubleshooting)
Mission Control provides that control plane.
## Who uses it
- **Maintainers / operators**: keep Mission Control + gateways healthy, deploy upgrades, respond to incidents.
- **Contributors**: develop backend/frontend changes, run tests, ship docs.
- **Automation authors**: define agent identities, skills, and task workflows.
## Key concepts (glossary-lite) ## Key concepts (glossary-lite)
- **Board**: a workspace containing tasks, memory, and agents. - **Board**: a workspace containing tasks, memory, and agents.
- **Task**: unit of work on a board; has status and comments. - **Task**: a unit of work on a board (status + comments/evidence).
- **Agent**: an automated worker that can execute tasks and post evidence. - **Agent**: an automated worker that can execute tasks and post evidence.
- **Gateway**: OpenClaw runtime host that executes tools/skills and runs heartbeats/cron. - **Approval**: a structured “allow/deny” checkpoint for risky actions.
- **Heartbeat**: periodic agent check-in loop for doing incremental work. - **Gateway**: the OpenClaw runtime host that executes tools/skills and runs heartbeats/cron.
- **Cron job**: scheduled execution (recurring or one-shot) isolated from conversational context. - **Heartbeat**: periodic agent check-in loop for incremental work.
- **Cron job**: scheduled execution (recurring or one-shot), often isolated from conversational context.
## Non-goals
- Not a general-purpose project management suite (we optimize for AI-assisted operations, not every PM feature).
- Not a full observability platform (we integrate with logs/metrics rather than replacing them).
- Not a secrets manager (we reference secret sources; don't store secrets in docs/tasks/comments).
## Where to go next ## Where to go next
- Want it running? → [Quickstart](02-quickstart.md) - Want it running? → [Quickstart](02-quickstart.md)
- Want to contribute? → [Development](03-development.md) - Want to contribute? → [Development](03-development.md)
- Want to understand internals? → [Architecture](05-architecture.md) - Want to understand internals? → [Architecture](05-architecture.md)
- Operating it? → [Ops / runbooks](09-ops-runbooks.md)


@@ -7,18 +7,21 @@
Mission Control is the **web UI + HTTP API** for operating OpenClaw. It's where you manage boards, tasks, agents, approvals, and (optionally) gateway connections. Mission Control is the **web UI + HTTP API** for operating OpenClaw. It's where you manage boards, tasks, agents, approvals, and (optionally) gateway connections.
> Auth note: **Clerk is required for now** (current product direction). The codebase includes gating so CI/local can run with placeholders, but real deployments should configure Clerk. > Auth note: **Clerk is required for production**. The codebase includes gating so CI/local can run without “real” keys, but real deployments should configure Clerk.
## Components ## Components
- **Frontend**: Next.js app used by humans - **Frontend**: Next.js app used by humans
- Location: `frontend/` - Location: `frontend/`
- Routes/pages: `frontend/src/app/*` (Next.js App Router) - Routes/pages: `frontend/src/app/*` (Next.js App Router)
- API client: generated + custom fetch (see `frontend/src/api/*`, `frontend/src/lib/api-base.ts`)
- **Backend**: FastAPI service exposing REST endpoints - **Backend**: FastAPI service exposing REST endpoints
- Location: `backend/` - Location: `backend/`
- App wiring: `backend/app/main.py` - Entrypoint: `backend/app/main.py`
- API prefix: `/api/v1/*` - API prefix: `/api/v1/*`
- **Database**: Postgres (from `compose.yml`) - **Database**: Postgres (see `compose.yml`)
- **Gateway integration (optional)**: backend may call into OpenClaw Gateways over WebSockets
- Client/protocol list: `backend/app/services/openclaw/gateway_rpc.py`
## Diagram (conceptual) ## Diagram (conceptual)
@@ -33,24 +36,58 @@ flowchart LR
GW --> OC[OpenClaw runtime] GW --> OC[OpenClaw runtime]
``` ```
## Key request/data flows ## How requests flow
### UI → API ### 1) A human uses the UI
1. Browser loads the Next.js frontend.
2. Frontend calls backend endpoints under `/api/v1/*`.
3. Backend reads/writes Postgres.
### Auth (Clerk) 1. Browser loads the Next.js frontend (`frontend/`).
- Frontend enables Clerk when a publishable key is present/valid. 2. Frontend calls backend endpoints using `NEXT_PUBLIC_API_URL`.
- Backend verifies Clerk JWTs using **`CLERK_JWKS_URL`**. 3. Backend routes under `/api/v1/*` (`backend/app/main.py`) and reads/writes Postgres.
See also: Common UI-driven data shapes:
- Frontend auth gating: `frontend/src/auth/*` (notably `frontend/src/auth/clerkKey.ts`). - “boards/tasks” views → board/task CRUD + streams.
- Backend auth: `backend/app/core/auth.py`. - “activity feed” → activity/events endpoints.
### Agent access (X-Agent-Token) ### 2) Authentication (Clerk)
Automation/agents can use the “agent API surface”:
- Endpoints under `/api/v1/agent/*`.
- Auth via `X-Agent-Token`.
See: `backend/app/api/agent.py`, `backend/app/core/agent_auth.py`. - **Frontend**: Clerk is enabled only when a publishable key is present/valid.
- Gating/wrappers: `frontend/src/auth/clerkKey.ts`, `frontend/src/auth/clerk.tsx`.
- **Frontend → backend**: API calls attach `Authorization: Bearer <token>` when available.
- Token injection: `frontend/src/api/mutator.ts` (uses `window.Clerk.session.getToken()`).
- **Backend**: validates inbound auth and resolves a user context.
- Implementation: `backend/app/core/auth.py` (uses `clerk_backend_api` SDK with `CLERK_SECRET_KEY`).
### 3) Agent automation surface (`/api/v1/agent/*`)
Agents can call a dedicated API surface:
- Router: `backend/app/api/agent.py` (prefix `/agent` → mounted under `/api/v1/agent/*`).
- Authentication: `X-Agent-Token` header (or agent-only Authorization bearer parsing).
- Implementation: `backend/app/core/agent_auth.py`.
Typical agent flows:
- Heartbeat/presence updates
- Task comment posting (evidence)
- Board memory updates
- Lead coordination actions (if board-lead agent)
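A minimal sketch of how an agent-side client might assemble a request for this surface, using only the Python standard library. The `/api/v1/agent/` prefix and the `X-Agent-Token` header come from the text above; the function names, the `example` sub-path, and the JSON payload shape are hypothetical placeholders, not the actual routes in `backend/app/api/agent.py`.

```python
import json
import urllib.request

# Prefix documented above; sub-paths beneath it are NOT assumed here.
AGENT_PREFIX = "/api/v1/agent/"


def agent_headers(token: str) -> dict[str, str]:
    """Headers for an agent call: the token header plus a JSON content type."""
    if not token:
        raise ValueError("agent token must be non-empty")
    return {"X-Agent-Token": token, "Content-Type": "application/json"}


def agent_url(base_url: str, path: str) -> str:
    """Join the backend base URL, the agent prefix, and a relative path."""
    return base_url.rstrip("/") + AGENT_PREFIX + path.lstrip("/")


def build_agent_request(
    base_url: str, path: str, token: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request against the agent surface."""
    return urllib.request.Request(
        agent_url(base_url, path),
        data=json.dumps(payload).encode("utf-8"),
        headers=agent_headers(token),
        method="POST",
    )
```

Sending the request is then a matter of passing it to `urllib.request.urlopen`. Building and sending are kept separate so the header and URL logic can be checked without a running backend.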
### 4) Streaming/feeds (server-sent events)
Some endpoints support streaming via SSE (`text/event-stream`).
Notes:
- Uses `sse-starlette` in backend routes (e.g. task/activity/memory routers).
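For readers unfamiliar with the `text/event-stream` wire format, here is a small sketch of a client-side parser. It follows the general SSE framing (fields separated by `:`, a blank line ends an event, lines starting with `:` are comments) in simplified form; the `parse_sse` name and the sample event names are illustrative assumptions, not taken from the Mission Control codebase.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per SSE event from an iterable of raw text lines."""
    event: dict[str, str] = {}
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the event; emit it only if it carried data.
            if data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data.append(value)  # multiple data lines are joined with newlines
        elif field in ("event", "id", "retry"):
            event[field] = value
```

In practice the lines would come from a streaming HTTP response; feeding the parser a plain list of strings is enough to see how events are framed.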
### 5) Gateway integration (optional)
Mission Control can coordinate with OpenClaw Gateways over WebSockets.
- Protocol methods/events list: `backend/app/services/openclaw/gateway_rpc.py`.
- Operator-facing protocol docs: `docs/openclaw_gateway_ws.md`.
## Where to start reading code
- Backend entrypoint + router wiring: `backend/app/main.py`
- Auth dependencies + access enforcement: `backend/app/api/deps.py`
- User auth: `backend/app/core/auth.py`
- Agent auth: `backend/app/core/agent_auth.py`
- Agent API surface: `backend/app/api/agent.py`


@@ -6,32 +6,33 @@
- [Production](production/README.md) - [Production](production/README.md)
- [Troubleshooting](troubleshooting/README.md) - [Troubleshooting](troubleshooting/README.md)
This page is the operator/SRE entry point. It intentionally links to existing deeper docs to minimize churn. This page is the operator entrypoint. It points to the existing deep-dive runbooks and adds a short “first 30 minutes” checklist.
## First 30 minutes incident checklist ## First 30 minutes (incident checklist)
1. **Confirm user impact + scope** 1. **Confirm impact**
- What is broken: UI, API, auth, or gateway integration? - What's broken: UI, API, auth, or gateway integration?
- Is it all users or a subset? - All users or a subset?
2. **Check service health** 2. **Check service health**
- Backend: `/healthz` and `/readyz` - Backend: `/healthz` and `/readyz`
- Frontend: can it load? does it reach the API? - Frontend: can it load? does it reach the API?
3. **Check auth (Clerk) configuration** 3. **Check auth (Clerk)**
- Frontend: is Clerk enabled unexpectedly? (publishable key set) - Frontend: did Clerk get enabled unintentionally? (publishable key set)
- Backend: is `CLERK_JWKS_URL` configured correctly? - Backend: is `CLERK_SECRET_KEY` configured correctly?
4. **Check DB connectivity** 4. **Check DB connectivity**
- Can backend connect to Postgres (`DATABASE_URL`)? - Can backend connect to Postgres (`DATABASE_URL`)?
5. **Check logs** 5. **Check logs**
- Backend logs for 5xx spikes or auth failures. - Backend logs for 5xx spikes or auth failures.
- Frontend logs for proxy/API URL misconfig. - Frontend logs for API URL/proxy misconfig.
6. **Stabilize** 6. **Stabilize**
- Roll back the last change if available. - Roll back the last change if you can.
- Temporarily disable optional integrations (gateway) to isolate. - Temporarily disable optional integrations (gateway) to isolate.
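Step 2 of the checklist can be scripted. The sketch below probes the two backend paths named above, `/healthz` and `/readyz`, with the standard library. Treating HTTP 200 as "healthy" is an assumption, as are the function names; adjust both to match how the backend actually reports health.

```python
import urllib.error
import urllib.request
from typing import Callable

# Paths taken from the checklist above.
HEALTH_PATHS = ("/healthz", "/readyz")


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status for a GET, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, ...


def check_backend(
    base_url: str, fetch: Callable[[str], int] = http_status
) -> dict[str, bool]:
    """Map each health path to True when it answers with 200 (assumed healthy)."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in HEALTH_PATHS}
```

Calling `check_backend("http://localhost:8000")` (the URL is a placeholder) returns a dict such as `{"/healthz": True, "/readyz": False}`, which separates "process is up" from "process is ready to serve". The `fetch` parameter exists so the logic can be exercised without a live backend.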
## Backups / restore (placeholder) ## Backups / restore
- Define backup cadence and restore steps once production deployment is finalized.
See [Production](production/README.md). If you run Mission Control in production, treat backup/restore as a regular drill, not a one-time setup.


@@ -4,19 +4,19 @@
- [Troubleshooting deep dive](troubleshooting/README.md) - [Troubleshooting deep dive](troubleshooting/README.md)
This is the high-level troubleshooting entry point (minimal churn). This is the “quick triage” page. For detailed playbooks and diagnostics, use the deep dive.
## Quick triage ## Quick triage
### Symptom: frontend loads but shows API errors ### Frontend loads but shows API errors
- Confirm `NEXT_PUBLIC_API_URL` points to a reachable backend. - Confirm `NEXT_PUBLIC_API_URL` points to a backend your browser can reach.
- Check backend `/healthz`. - Check backend `/healthz`.
### Symptom: frontend keeps redirecting / Clerk errors ### Frontend keeps redirecting / Clerk errors
- If you are running locally without Clerk, ensure `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` is **unset/blank**. - If you're running locally without Clerk, keep `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` unset/blank.
- See: [repo README Clerk note](../README.md#note-on-auth-clerk). - See: [repo README Clerk note](../README.md#note-on-auth-clerk).
### Symptom: backend 5xx ### Backend returns 5xx
- Check DB connectivity (`DATABASE_URL`) and migrations. - Check DB connectivity (`DATABASE_URL`) and migrations.
- Check backend logs. - Check backend logs.