docs: add missing context to overview/architecture/ops/troubleshooting

Abhimanyu Saharan
2026-02-11 09:45:29 +00:00
parent 04c6822ea8
commit 7be38140a1
4 changed files with 106 additions and 46 deletions

@@ -1,23 +1,45 @@
# Overview
Mission Control is the **web UI + HTTP API** for operating OpenClaw. It's where you manage boards, tasks, agents, approvals, and (optionally) gateway connections.
Mission Control is the **web UI + HTTP API** for operating OpenClaw.
It's where you manage **boards**, **tasks**, **agents**, **approvals**, and (optionally) **gateway connections**.
## Problem statement
- Provide a single place to coordinate work (boards/tasks) and execute automation (agents) safely.
## Non-goals (first pass)
- Not a general-purpose project management suite.
- Not a full observability platform.
OpenClaw can execute work (tools/skills) and converse across channels, but real operations need a place to:
- **Coordinate** work across people + agents (what's next, what's blocked, who owns what)
- **Track evidence** of what happened (commands run, links, logs, artifacts)
- **Control risk** (approvals, guardrails, isolation)
- **Operate reliably** (deployment, configuration, troubleshooting)
Mission Control provides that control plane.
## Who uses it
- **Maintainers / operators**: keep Mission Control + gateways healthy, deploy upgrades, respond to incidents.
- **Contributors**: develop backend/frontend changes, run tests, ship docs.
- **Automation authors**: define agent identities, skills, and task workflows.
## Key concepts (glossary-lite)
- **Board**: a workspace containing tasks, memory, and agents.
- **Task**: unit of work on a board; has status and comments.
- **Task**: a unit of work on a board (status + comments/evidence).
- **Agent**: an automated worker that can execute tasks and post evidence.
- **Gateway**: OpenClaw runtime host that executes tools/skills and runs heartbeats/cron.
- **Heartbeat**: periodic agent check-in loop for doing incremental work.
- **Cron job**: scheduled execution (recurring or one-shot) isolated from conversational context.
- **Approval**: a structured “allow/deny” checkpoint for risky actions.
- **Gateway**: the OpenClaw runtime host that executes tools/skills and runs heartbeats/cron.
- **Heartbeat**: periodic agent check-in loop for incremental work.
- **Cron job**: scheduled execution (recurring or one-shot), often isolated from conversational context.
## Non-goals
- Not a general-purpose project management suite (we optimize for AI-assisted operations, not every PM feature).
- Not a full observability platform (we integrate with logs/metrics rather than replacing them).
- Not a secrets manager (we reference secret sources; don't store secrets in docs/tasks/comments).
## Where to go next
- Want it running? → [Quickstart](02-quickstart.md)
- Want to contribute? → [Development](03-development.md)
- Want to understand internals? → [Architecture](05-architecture.md)
- Operating it? → [Ops / runbooks](09-ops-runbooks.md)

@@ -7,18 +7,21 @@
Mission Control is the **web UI + HTTP API** for operating OpenClaw. It's where you manage boards, tasks, agents, approvals, and (optionally) gateway connections.
> Auth note: **Clerk is required for now** (current product direction). The codebase includes gating so CI/local can run with placeholders, but real deployments should configure Clerk.
> Auth note: **Clerk is required for production**. The codebase includes gating so CI/local can run without “real” keys, but real deployments should configure Clerk.
## Components
- **Frontend**: Next.js app used by humans
  - Location: `frontend/`
  - Routes/pages: `frontend/src/app/*` (Next.js App Router)
  - API client: generated + custom fetch (see `frontend/src/api/*`, `frontend/src/lib/api-base.ts`)
- **Backend**: FastAPI service exposing REST endpoints
  - Location: `backend/`
  - App wiring: `backend/app/main.py`
  - Entrypoint: `backend/app/main.py`
  - API prefix: `/api/v1/*`
- **Database**: Postgres (from `compose.yml`)
- **Database**: Postgres (see `compose.yml`)
- **Gateway integration (optional)**: backend may call into OpenClaw Gateways over WebSockets
  - Client/protocol list: `backend/app/services/openclaw/gateway_rpc.py`
## Diagram (conceptual)
@@ -33,24 +36,58 @@ flowchart LR
GW --> OC[OpenClaw runtime]
```
## Key request/data flows
## How requests flow
### UI → API
1. Browser loads the Next.js frontend.
2. Frontend calls backend endpoints under `/api/v1/*`.
3. Backend reads/writes Postgres.
### 1) A human uses the UI
### Auth (Clerk)
- Frontend enables Clerk when a publishable key is present/valid.
- Backend verifies Clerk JWTs using **`CLERK_JWKS_URL`**.
1. Browser loads the Next.js frontend (`frontend/`).
2. Frontend calls backend endpoints using `NEXT_PUBLIC_API_URL`.
3. Backend routes under `/api/v1/*` (`backend/app/main.py`) and reads/writes Postgres.
See also:
- Frontend auth gating: `frontend/src/auth/*` (notably `frontend/src/auth/clerkKey.ts`).
- Backend auth: `backend/app/core/auth.py`.
Common UI-driven data shapes:
- “boards/tasks” views → board/task CRUD + streams.
- “activity feed” → activity/events endpoints.
### Agent access (X-Agent-Token)
Automation/agents can use the “agent API surface”:
- Endpoints under `/api/v1/agent/*`.
- Auth via `X-Agent-Token`.
### 2) Authentication (Clerk)
See: `backend/app/api/agent.py`, `backend/app/core/agent_auth.py`.
- **Frontend**: Clerk is enabled only when a publishable key is present/valid.
  - Gating/wrappers: `frontend/src/auth/clerkKey.ts`, `frontend/src/auth/clerk.tsx`.
- **Frontend → backend**: API calls attach `Authorization: Bearer <token>` when available.
  - Token injection: `frontend/src/api/mutator.ts` (uses `window.Clerk.session.getToken()`).
- **Backend**: validates inbound auth and resolves a user context.
  - Implementation: `backend/app/core/auth.py` (uses `clerk_backend_api` SDK with `CLERK_SECRET_KEY`).
### 3) Agent automation surface (`/api/v1/agent/*`)
Agents can call a dedicated API surface:
- Router: `backend/app/api/agent.py` (prefix `/agent` → mounted under `/api/v1/agent/*`).
- Authentication: `X-Agent-Token` header (or agent-only Authorization bearer parsing).
- Implementation: `backend/app/core/agent_auth.py`.
Typical agent flows:
- Heartbeat/presence updates
- Task comment posting (evidence)
- Board memory updates
- Lead coordination actions (if board-lead agent)
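As a rough illustration of the agent surface described above, the sketch below builds (but does not send) a request that carries `X-Agent-Token`. Only the `/api/v1/agent/*` prefix and the header name come from this doc; the base URL, the `heartbeat` path segment, and the JSON payload are hypothetical placeholders, so check `backend/app/api/agent.py` for the real routes and schemas.

```python
import json
import urllib.request


def build_agent_request(
    base_url: str, agent_path: str, token: str, payload: dict
) -> urllib.request.Request:
    """Build a POST request for the agent API surface without sending it."""
    # Documented: the /api/v1/agent/ prefix and the X-Agent-Token header.
    # Assumed for illustration: agent_path and the shape of payload.
    url = f"{base_url.rstrip('/')}/api/v1/agent/{agent_path.lstrip('/')}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Agent-Token": token, "Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the header wiring easy to inspect in a test.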
### 4) Streaming/feeds (server-sent events)
Some endpoints support streaming via SSE (`text/event-stream`).
Notes:
- Uses `sse-starlette` in backend routes (e.g. task/activity/memory routers).
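For readers unfamiliar with the `text/event-stream` wire format, here is a minimal, simplified parser sketch. It is not the project's client code: it handles only the `event`, `id`, `retry`, and `data` fields, and the event names in any sample input are made up.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event from an iterable of text/event-stream lines."""
    event: Dict[str, str] = {}
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; events without data are dropped.
            if data_parts:
                event["data"] = "\n".join(data_parts)
                yield event
            event, data_parts = {}, []
            continue
        if line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips a single leading space
        if field == "data":
            data_parts.append(value)  # multiple data lines are joined with "\n"
        elif field in ("event", "id", "retry"):
            event[field] = value
```

Feeding it the decoded lines of a streaming HTTP response yields one dictionary per event as each arrives.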
### 5) Gateway integration (optional)
Mission Control can coordinate with OpenClaw Gateways over WebSockets.
- Protocol methods/events list: `backend/app/services/openclaw/gateway_rpc.py`.
- Operator-facing protocol docs: `docs/openclaw_gateway_ws.md`.
## Where to start reading code
- Backend entrypoint + router wiring: `backend/app/main.py`
- Auth dependencies + access enforcement: `backend/app/api/deps.py`
- User auth: `backend/app/core/auth.py`
- Agent auth: `backend/app/core/agent_auth.py`
- Agent API surface: `backend/app/api/agent.py`

@@ -6,32 +6,33 @@
- [Production](production/README.md)
- [Troubleshooting](troubleshooting/README.md)
This page is the operator/SRE entry point. It intentionally links to existing deeper docs to minimize churn.
This page is the operator entrypoint. It points to the existing deep-dive runbooks and adds a short “first 30 minutes” checklist.
## First 30 minutes incident checklist
## First 30 minutes (incident checklist)
1. **Confirm user impact + scope**
- What is broken: UI, API, auth, or gateway integration?
- Is it all users or a subset?
1. **Confirm impact**
- What's broken: UI, API, auth, or gateway integration?
- All users or a subset?
2. **Check service health**
- Backend: `/healthz` and `/readyz`
- Frontend: can it load? does it reach the API?
3. **Check auth (Clerk) configuration**
- Frontend: is Clerk enabled unexpectedly? (publishable key set)
- Backend: is `CLERK_JWKS_URL` configured correctly?
3. **Check auth (Clerk)**
- Frontend: did Clerk get enabled unintentionally? (publishable key set)
- Backend: is `CLERK_SECRET_KEY` configured correctly?
4. **Check DB connectivity**
- Can backend connect to Postgres (`DATABASE_URL`)?
5. **Check logs**
- Backend logs for 5xx spikes or auth failures.
- Frontend logs for proxy/API URL misconfig.
- Frontend logs for API URL/proxy misconfig.
6. **Stabilize**
- Roll back the last change if available.
- Roll back the last change if you can.
- Temporarily disable optional integrations (gateway) to isolate.
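Step 2 of the checklist (service health) can be scripted roughly as follows. This is a sketch under assumptions: it treats `/healthz` and `/readyz` as living at the backend root and counts only HTTP 200 as healthy. Neither detail is confirmed by this page, so adjust to your deployment.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, ...


def probe_backend(
    base_url: str, fetch: Callable[[str], int] = http_status
) -> Dict[str, bool]:
    """Check /healthz and /readyz; a path counts as healthy on HTTP 200 (assumption)."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in ("/healthz", "/readyz")}
```

The `fetch` parameter exists so the probe can be exercised without a network. In an incident you would call `probe_backend("https://your-backend")` and read the two booleans.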
## Backups / restore (placeholder)
- Define backup cadence and restore steps once production deployment is finalized.
## Backups / restore
See [Production](production/README.md). If you run Mission Control in production, treat backup/restore as a regular drill, not a one-time setup.

@@ -4,19 +4,19 @@
- [Troubleshooting deep dive](troubleshooting/README.md)
This is the high-level troubleshooting entry point (minimal churn).
This is the “quick triage” page. For detailed playbooks and diagnostics, use the deep dive.
## Quick triage
### Symptom: frontend loads but shows API errors
- Confirm `NEXT_PUBLIC_API_URL` points to a reachable backend.
### Frontend loads but shows API errors
- Confirm `NEXT_PUBLIC_API_URL` points to a backend your browser can reach.
- Check backend `/healthz`.
### Symptom: frontend keeps redirecting / Clerk errors
- If you are running locally without Clerk, ensure `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` is **unset/blank**.
### Frontend keeps redirecting / Clerk errors
- If you're running locally without Clerk, keep `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` unset/blank.
- See: [repo README Clerk note](../README.md#note-on-auth-clerk).
### Symptom: backend 5xx
### Backend returns 5xx
- Check DB connectivity (`DATABASE_URL`) and migrations.
- Check backend logs.
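The configuration checks in the triage items above could be bundled into a small helper like the one below. It is only a sketch: the variable names come from this page, but the function name, the `clerk_expected` flag, and the idea of passing frontend and backend settings in one mapping are assumptions made for illustration.

```python
from typing import List, Mapping
from urllib.parse import urlparse


def triage_env(env: Mapping[str, str], clerk_expected: bool) -> List[str]:
    """Return human-readable findings for common misconfigurations."""
    findings: List[str] = []

    # Frontend: the API URL must be an absolute http(s) URL the browser can reach.
    parsed = urlparse(env.get("NEXT_PUBLIC_API_URL", "").strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        findings.append("NEXT_PUBLIC_API_URL is missing or not an http(s) URL")

    # Frontend: a publishable key enables Clerk, which causes redirects if unintended.
    clerk_key = env.get("NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY", "").strip()
    if clerk_key and not clerk_expected:
        findings.append("NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY is set but Clerk is not expected")

    # Backend: without a database URL, most endpoints will fail with 5xx.
    if not env.get("DATABASE_URL", "").strip():
        findings.append("DATABASE_URL is empty")

    return findings
```

Calling `triage_env(os.environ, clerk_expected=False)` on a local setup returns an empty list when nothing looks off. It checks only that values are present and well-formed, not that the backend or database is actually reachable.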