From 7be38140a1309b76f1026bfa16c06a008042cfb6 Mon Sep 17 00:00:00 2001
From: Abhimanyu Saharan
Date: Wed, 11 Feb 2026 09:45:29 +0000
Subject: [PATCH] docs: add missing context to overview/architecture/ops/troubleshooting

---
 docs/01-overview.md        | 40 +++++++++++++++-----
 docs/05-architecture.md    | 75 ++++++++++++++++++++++++++++----------
 docs/09-ops-runbooks.md    | 25 +++++++------
 docs/10-troubleshooting.md | 12 +++---
 4 files changed, 106 insertions(+), 46 deletions(-)

diff --git a/docs/01-overview.md b/docs/01-overview.md
index 5a78add6..94a85477 100644
--- a/docs/01-overview.md
+++ b/docs/01-overview.md
@@ -1,23 +1,45 @@
 # Overview
 
-Mission Control is the **web UI + HTTP API** for operating OpenClaw. It’s where you manage boards, tasks, agents, approvals, and (optionally) gateway connections.
+Mission Control is the **web UI + HTTP API** for operating OpenClaw.
+
+It’s where you manage **boards**, **tasks**, **agents**, **approvals**, and (optionally) **gateway connections**.
 
 ## Problem statement
-- Provide a single place to coordinate work (boards/tasks) and execute automation (agents) safely.
 
-## Non-goals (first pass)
-- Not a general-purpose project management suite.
-- Not a full observability platform.
+OpenClaw can execute work (tools/skills) and converse across channels, but real operations need a place to:
+
+- **Coordinate** work across people + agents (what’s next, what’s blocked, who owns what)
+- **Track evidence** of what happened (commands run, links, logs, artifacts)
+- **Control risk** (approvals, guardrails, isolation)
+- **Operate reliably** (deployment, configuration, troubleshooting)
+
+Mission Control provides that control plane.
+
+## Who uses it
+
+- **Maintainers / operators**: keep Mission Control + gateways healthy, deploy upgrades, respond to incidents.
+- **Contributors**: develop backend/frontend changes, run tests, ship docs.
+- **Automation authors**: define agent identities, skills, and task workflows.
 
 ## Key concepts (glossary-lite)
+
 - **Board**: a workspace containing tasks, memory, and agents.
-- **Task**: unit of work on a board; has status and comments.
+- **Task**: a unit of work on a board (status + comments/evidence).
 - **Agent**: an automated worker that can execute tasks and post evidence.
-- **Gateway**: OpenClaw runtime host that executes tools/skills and runs heartbeats/cron.
-- **Heartbeat**: periodic agent check-in loop for doing incremental work.
-- **Cron job**: scheduled execution (recurring or one-shot) isolated from conversational context.
+- **Approval**: a structured “allow/deny” checkpoint for risky actions.
+- **Gateway**: the OpenClaw runtime host that executes tools/skills and runs heartbeats/cron.
+- **Heartbeat**: periodic agent check-in loop for incremental work.
+- **Cron job**: scheduled execution (recurring or one-shot), often isolated from conversational context.
+
+## Non-goals
+
+- Not a general-purpose project management suite (we optimize for AI-assisted operations, not every PM feature).
+- Not a full observability platform (we integrate with logs/metrics rather than replacing them).
+- Not a secrets manager (we reference secret sources; don’t store secrets in docs/tasks/comments).
 
 ## Where to go next
+
 - Want it running? → [Quickstart](02-quickstart.md)
 - Want to contribute? → [Development](03-development.md)
 - Want to understand internals? → [Architecture](05-architecture.md)
+- Operating it? → [Ops / runbooks](09-ops-runbooks.md)
diff --git a/docs/05-architecture.md b/docs/05-architecture.md
index 99f2af80..2d035dd7 100644
--- a/docs/05-architecture.md
+++ b/docs/05-architecture.md
@@ -7,18 +7,21 @@
 
 Mission Control is the **web UI + HTTP API** for operating OpenClaw. It’s where you manage boards, tasks, agents, approvals, and (optionally) gateway connections.
 
-> Auth note: **Clerk is required for now** (current product direction). The codebase includes gating so CI/local can run with placeholders, but real deployments should configure Clerk.
+> Auth note: **Clerk is required for production**. The codebase includes gating so CI/local can run without “real” keys, but real deployments should configure Clerk.
 
 ## Components
 
 - **Frontend**: Next.js app used by humans
   - Location: `frontend/`
   - Routes/pages: `frontend/src/app/*` (Next.js App Router)
+  - API client: generated + custom fetch (see `frontend/src/api/*`, `frontend/src/lib/api-base.ts`)
 - **Backend**: FastAPI service exposing REST endpoints
   - Location: `backend/`
-  - App wiring: `backend/app/main.py`
+  - Entrypoint: `backend/app/main.py`
   - API prefix: `/api/v1/*`
-- **Database**: Postgres (from `compose.yml`)
+- **Database**: Postgres (see `compose.yml`)
+- **Gateway integration (optional)**: backend may call into OpenClaw Gateways over WebSockets
+  - Client/protocol list: `backend/app/services/openclaw/gateway_rpc.py`
 
 ## Diagram (conceptual)
 
@@ -33,24 +36,58 @@ flowchart LR
   GW --> OC[OpenClaw runtime]
 ```
 
-## Key request/data flows
+## How requests flow
 
-### UI → API
-1. Browser loads the Next.js frontend.
-2. Frontend calls backend endpoints under `/api/v1/*`.
-3. Backend reads/writes Postgres.
+### 1) A human uses the UI
 
-### Auth (Clerk)
-- Frontend enables Clerk when a publishable key is present/valid.
-- Backend verifies Clerk JWTs using **`CLERK_JWKS_URL`**.
+1. Browser loads the Next.js frontend (`frontend/`).
+2. Frontend calls backend endpoints using `NEXT_PUBLIC_API_URL`.
+3. Backend routes under `/api/v1/*` (`backend/app/main.py`) and reads/writes Postgres.
 
-See also:
-- Frontend auth gating: `frontend/src/auth/*` (notably `frontend/src/auth/clerkKey.ts`).
-- Backend auth: `backend/app/core/auth.py`.
+Common UI-driven data shapes:
+- “boards/tasks” views → board/task CRUD + streams.
+- “activity feed” → activity/events endpoints.
 
-### Agent access (X-Agent-Token)
-Automation/agents can use the “agent API surface”:
-- Endpoints under `/api/v1/agent/*`.
-- Auth via `X-Agent-Token`.
+### 2) Authentication (Clerk)
 
-See: `backend/app/api/agent.py`, `backend/app/core/agent_auth.py`.
+- **Frontend**: Clerk is enabled only when a publishable key is present/valid.
+  - Gating/wrappers: `frontend/src/auth/clerkKey.ts`, `frontend/src/auth/clerk.tsx`.
+- **Frontend → backend**: API calls attach `Authorization: Bearer <token>` when available.
+  - Token injection: `frontend/src/api/mutator.ts` (uses `window.Clerk.session.getToken()`).
+- **Backend**: validates inbound auth and resolves a user context.
+  - Implementation: `backend/app/core/auth.py` (uses `clerk_backend_api` SDK with `CLERK_SECRET_KEY`).
+
+### 3) Agent automation surface (`/api/v1/agent/*`)
+
+Agents can call a dedicated API surface:
+
+- Router: `backend/app/api/agent.py` (prefix `/agent` → mounted under `/api/v1/agent/*`).
+- Authentication: `X-Agent-Token` header (or agent-only Authorization bearer parsing).
+  - Implementation: `backend/app/core/agent_auth.py`.
+
+Typical agent flows:
+- Heartbeat/presence updates
+- Task comment posting (evidence)
+- Board memory updates
+- Lead coordination actions (if board-lead agent)
+
+### 4) Streaming/feeds (server-sent events)
+
+Some endpoints support streaming via SSE (`text/event-stream`).
+Notes:
+- Uses `sse-starlette` in backend routes (e.g. task/activity/memory routers).
+
+### 5) Gateway integration (optional)
+
+Mission Control can coordinate with OpenClaw Gateways over WebSockets.
+
+- Protocol methods/events list: `backend/app/services/openclaw/gateway_rpc.py`.
+- Operator-facing protocol docs: `docs/openclaw_gateway_ws.md`.
+
+## Where to start reading code
+
+- Backend entrypoint + router wiring: `backend/app/main.py`
+- Auth dependencies + access enforcement: `backend/app/api/deps.py`
+- User auth: `backend/app/core/auth.py`
+- Agent auth: `backend/app/core/agent_auth.py`
+- Agent API surface: `backend/app/api/agent.py`
diff --git a/docs/09-ops-runbooks.md b/docs/09-ops-runbooks.md
index 6e29fddc..1094e05f 100644
--- a/docs/09-ops-runbooks.md
+++ b/docs/09-ops-runbooks.md
@@ -6,32 +6,33 @@
 - [Production](production/README.md)
 - [Troubleshooting](troubleshooting/README.md)
 
-This page is the operator/SRE entry point. It intentionally links to existing deeper docs to minimize churn.
+This page is the operator entrypoint. It points to the existing deep-dive runbooks and adds a short “first 30 minutes” checklist.
 
-## “First 30 minutes” incident checklist
+## First 30 minutes (incident checklist)
 
-1. **Confirm user impact + scope**
-   - What is broken: UI, API, auth, or gateway integration?
-   - Is it all users or a subset?
+1. **Confirm impact**
+   - What’s broken: UI, API, auth, or gateway integration?
+   - All users or a subset?
 
 2. **Check service health**
    - Backend: `/healthz` and `/readyz`
    - Frontend: can it load? does it reach the API?
 
-3. **Check auth (Clerk) configuration**
-   - Frontend: is Clerk enabled unexpectedly? (publishable key set)
-   - Backend: is `CLERK_JWKS_URL` configured correctly?
+3. **Check auth (Clerk)**
+   - Frontend: did Clerk get enabled unintentionally? (publishable key set)
+   - Backend: is `CLERK_SECRET_KEY` configured correctly?
 
 4. **Check DB connectivity**
    - Can backend connect to Postgres (`DATABASE_URL`)?
 
 5. **Check logs**
    - Backend logs for 5xx spikes or auth failures.
-   - Frontend logs for proxy/API URL misconfig.
+   - Frontend logs for API URL/proxy misconfig.
 
 6. **Stabilize**
-   - Roll back the last change if available.
+   - Roll back the last change if you can.
    - Temporarily disable optional integrations (gateway) to isolate.
 
-## Backups / restore (placeholder)
-- Define backup cadence and restore steps once production deployment is finalized.
+## Backups / restore
+
+See [Production](production/README.md). If you run Mission Control in production, treat backup/restore as a regular drill, not a one-time setup.
diff --git a/docs/10-troubleshooting.md b/docs/10-troubleshooting.md
index 8f13590e..c3f59459 100644
--- a/docs/10-troubleshooting.md
+++ b/docs/10-troubleshooting.md
@@ -4,19 +4,19 @@
 
 - [Troubleshooting deep dive](troubleshooting/README.md)
 
-This is the high-level troubleshooting entry point (minimal churn).
+This is the “quick triage” page. For detailed playbooks and diagnostics, use the deep dive.
 
 ## Quick triage
 
-### Symptom: frontend loads but shows API errors
-- Confirm `NEXT_PUBLIC_API_URL` points to a reachable backend.
+### Frontend loads but shows API errors
+- Confirm `NEXT_PUBLIC_API_URL` points to a backend your browser can reach.
 - Check backend `/healthz`.
 
-### Symptom: frontend keeps redirecting / Clerk errors
-- If you are running locally without Clerk, ensure `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` is **unset/blank**.
+### Frontend keeps redirecting / Clerk errors
+- If you’re running locally without Clerk, keep `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` unset/blank.
 - See: [repo README Clerk note](../README.md#note-on-auth-clerk).
 
-### Symptom: backend 5xx
+### Backend returns 5xx
 - Check DB connectivity (`DATABASE_URL`) and migrations.
 - Check backend logs.