docs: add missing context to overview/architecture/ops/troubleshooting
@@ -1,23 +1,45 @@
|
|||||||
# Overview
|
# Overview
|
||||||
|
|
||||||
Mission Control is the **web UI + HTTP API** for operating OpenClaw. It’s where you manage boards, tasks, agents, approvals, and (optionally) gateway connections.
|
Mission Control is the **web UI + HTTP API** for operating OpenClaw.
|
||||||
|
|
||||||
|
It’s where you manage **boards**, **tasks**, **agents**, **approvals**, and (optionally) **gateway connections**.
|
||||||
|
|
||||||
## Problem statement
|
## Problem statement
|
||||||
- Provide a single place to coordinate work (boards/tasks) and execute automation (agents) safely.
|
|
||||||
|
|
||||||
## Non-goals (first pass)
|
OpenClaw can execute work (tools/skills) and converse across channels, but real operations need a place to:
|
||||||
- Not a general-purpose project management suite.
|
|
||||||
- Not a full observability platform.
|
- **Coordinate** work across people + agents (what’s next, what’s blocked, who owns what)
|
||||||
|
- **Track evidence** of what happened (commands run, links, logs, artifacts)
|
||||||
|
- **Control risk** (approvals, guardrails, isolation)
|
||||||
|
- **Operate reliably** (deployment, configuration, troubleshooting)
|
||||||
|
|
||||||
|
Mission Control provides that control plane.
|
||||||
|
|
||||||
|
## Who uses it
|
||||||
|
|
||||||
|
- **Maintainers / operators**: keep Mission Control + gateways healthy, deploy upgrades, respond to incidents.
|
||||||
|
- **Contributors**: develop backend/frontend changes, run tests, ship docs.
|
||||||
|
- **Automation authors**: define agent identities, skills, and task workflows.
|
||||||
|
|
||||||
## Key concepts (glossary-lite)
|
## Key concepts (glossary-lite)
|
||||||
|
|
||||||
- **Board**: a workspace containing tasks, memory, and agents.
|
- **Board**: a workspace containing tasks, memory, and agents.
|
||||||
- **Task**: unit of work on a board; has status and comments.
|
- **Task**: a unit of work on a board (status + comments/evidence).
|
||||||
- **Agent**: an automated worker that can execute tasks and post evidence.
|
- **Agent**: an automated worker that can execute tasks and post evidence.
|
||||||
- **Gateway**: OpenClaw runtime host that executes tools/skills and runs heartbeats/cron.
|
- **Approval**: a structured “allow/deny” checkpoint for risky actions.
|
||||||
- **Heartbeat**: periodic agent check-in loop for doing incremental work.
|
- **Gateway**: the OpenClaw runtime host that executes tools/skills and runs heartbeats/cron.
|
||||||
- **Cron job**: scheduled execution (recurring or one-shot) isolated from conversational context.
|
- **Heartbeat**: periodic agent check-in loop for incremental work.
|
||||||
|
- **Cron job**: scheduled execution (recurring or one-shot), often isolated from conversational context.
|
||||||
|
|
||||||
|
## Non-goals
|
||||||
|
|
||||||
|
- Not a general-purpose project management suite (we optimize for AI-assisted operations, not every PM feature).
|
||||||
|
- Not a full observability platform (we integrate with logs/metrics rather than replacing them).
|
||||||
|
- Not a secrets manager (we reference secret sources; don’t store secrets in docs/tasks/comments).
|
||||||
|
|
||||||
## Where to go next
|
## Where to go next
|
||||||
|
|
||||||
- Want it running? → [Quickstart](02-quickstart.md)
|
- Want it running? → [Quickstart](02-quickstart.md)
|
||||||
- Want to contribute? → [Development](03-development.md)
|
- Want to contribute? → [Development](03-development.md)
|
||||||
- Want to understand internals? → [Architecture](05-architecture.md)
|
- Want to understand internals? → [Architecture](05-architecture.md)
|
||||||
|
- Operating it? → [Ops / runbooks](09-ops-runbooks.md)
|
||||||
|
|||||||
@@ -7,18 +7,21 @@
|
|||||||
|
|
||||||
Mission Control is the **web UI + HTTP API** for operating OpenClaw. It’s where you manage boards, tasks, agents, approvals, and (optionally) gateway connections.
|
Mission Control is the **web UI + HTTP API** for operating OpenClaw. It’s where you manage boards, tasks, agents, approvals, and (optionally) gateway connections.
|
||||||
|
|
||||||
> Auth note: **Clerk is required for now** (current product direction). The codebase includes gating so CI/local can run with placeholders, but real deployments should configure Clerk.
|
> Auth note: **Clerk is required for production**. The codebase includes gating so CI/local can run without “real” keys, but real deployments should configure Clerk.
|
||||||
|
|
||||||
## Components
|
## Components
|
||||||
|
|
||||||
- **Frontend**: Next.js app used by humans
|
- **Frontend**: Next.js app used by humans
|
||||||
- Location: `frontend/`
|
- Location: `frontend/`
|
||||||
- Routes/pages: `frontend/src/app/*` (Next.js App Router)
|
- Routes/pages: `frontend/src/app/*` (Next.js App Router)
|
||||||
|
- API client: generated + custom fetch (see `frontend/src/api/*`, `frontend/src/lib/api-base.ts`)
|
||||||
- **Backend**: FastAPI service exposing REST endpoints
|
- **Backend**: FastAPI service exposing REST endpoints
|
||||||
- Location: `backend/`
|
- Location: `backend/`
|
||||||
- App wiring: `backend/app/main.py`
|
- Entrypoint: `backend/app/main.py`
|
||||||
- API prefix: `/api/v1/*`
|
- API prefix: `/api/v1/*`
|
||||||
- **Database**: Postgres (from `compose.yml`)
|
- **Database**: Postgres (see `compose.yml`)
|
||||||
|
- **Gateway integration (optional)**: backend may call into OpenClaw Gateways over WebSockets
|
||||||
|
- Client/protocol list: `backend/app/services/openclaw/gateway_rpc.py`
|
||||||
|
|
||||||
## Diagram (conceptual)
|
## Diagram (conceptual)
|
||||||
|
|
||||||
@@ -33,24 +36,58 @@ flowchart LR
|
|||||||
GW --> OC[OpenClaw runtime]
|
GW --> OC[OpenClaw runtime]
|
||||||
```
|
```
|
||||||
|
|
||||||
## Key request/data flows
|
## How requests flow
|
||||||
|
|
||||||
### UI → API
|
### 1) A human uses the UI
|
||||||
1. Browser loads the Next.js frontend.
|
|
||||||
2. Frontend calls backend endpoints under `/api/v1/*`.
|
|
||||||
3. Backend reads/writes Postgres.
|
|
||||||
|
|
||||||
### Auth (Clerk)
|
1. Browser loads the Next.js frontend (`frontend/`).
|
||||||
- Frontend enables Clerk when a publishable key is present/valid.
|
2. Frontend calls backend endpoints using `NEXT_PUBLIC_API_URL`.
|
||||||
- Backend verifies Clerk JWTs using **`CLERK_JWKS_URL`**.
|
3. Backend routes under `/api/v1/*` (`backend/app/main.py`) and reads/writes Postgres.
|
||||||
|
|
||||||
See also:
|
Common UI-driven data shapes:
|
||||||
- Frontend auth gating: `frontend/src/auth/*` (notably `frontend/src/auth/clerkKey.ts`).
|
- “boards/tasks” views → board/task CRUD + streams.
|
||||||
- Backend auth: `backend/app/core/auth.py`.
|
- “activity feed” → activity/events endpoints.
|
||||||
|
|
||||||
### Agent access (X-Agent-Token)
|
### 2) Authentication (Clerk)
|
||||||
Automation/agents can use the “agent API surface”:
|
|
||||||
- Endpoints under `/api/v1/agent/*`.
|
|
||||||
- Auth via `X-Agent-Token`.
|
|
||||||
|
|
||||||
See: `backend/app/api/agent.py`, `backend/app/core/agent_auth.py`.
|
- **Frontend**: Clerk is enabled only when a publishable key is present/valid.
|
||||||
|
- Gating/wrappers: `frontend/src/auth/clerkKey.ts`, `frontend/src/auth/clerk.tsx`.
|
||||||
|
- **Frontend → backend**: API calls attach `Authorization: Bearer <token>` when available.
|
||||||
|
- Token injection: `frontend/src/api/mutator.ts` (uses `window.Clerk.session.getToken()`).
|
||||||
|
- **Backend**: validates inbound auth and resolves a user context.
|
||||||
|
- Implementation: `backend/app/core/auth.py` (uses `clerk_backend_api` SDK with `CLERK_SECRET_KEY`).
|
||||||
|
|
||||||
|
### 3) Agent automation surface (`/api/v1/agent/*`)
|
||||||
|
|
||||||
|
Agents can call a dedicated API surface:
|
||||||
|
|
||||||
|
- Router: `backend/app/api/agent.py` (prefix `/agent` → mounted under `/api/v1/agent/*`).
|
||||||
|
- Authentication: `X-Agent-Token` header (or an `Authorization: Bearer` token, parsed by the agent-only auth path).
|
||||||
|
- Implementation: `backend/app/core/agent_auth.py`.
|
||||||
|
|
||||||
|
Typical agent flows:
|
||||||
|
- Heartbeat/presence updates
|
||||||
|
- Task comment posting (evidence)
|
||||||
|
- Board memory updates
|
||||||
|
- Lead coordination actions (if board-lead agent)
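
A minimal sketch of how an agent-side client might assemble a call to this surface. Only the `/api/v1/agent` prefix and the `X-Agent-Token` header come from the docs above; the function name, the `example` path, and the `localhost:8000` base URL are hypothetical placeholders, not real endpoints.

```python
from typing import Dict, Tuple

# Prefix documented above; everything below it is a placeholder in this sketch.
API_PREFIX = "/api/v1/agent"


def build_agent_request(base_url: str, path: str, agent_token: str) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a call to the agent API surface.

    `path` is whatever route the agent router exposes; this sketch does not
    assume any particular route exists.
    """
    if not agent_token:
        raise ValueError("agent token must not be empty")
    # Normalize slashes so callers can pass "example" or "/example".
    url = base_url.rstrip("/") + API_PREFIX + "/" + path.lstrip("/")
    headers = {
        "X-Agent-Token": agent_token,  # agent auth header named in the docs
        "Accept": "application/json",
    }
    return url, headers


if __name__ == "__main__":
    # Hypothetical base URL and path, for illustration only.
    url, headers = build_agent_request("http://localhost:8000", "example", "secret-token")
    print(url)
    print(sorted(headers))
```

The returned pair can be handed to any HTTP client. Keeping the token in a header (rather than the URL) avoids leaking it into access logs.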
|
||||||
|
|
||||||
|
### 4) Streaming/feeds (server-sent events)
|
||||||
|
|
||||||
|
Some endpoints support streaming via SSE (`text/event-stream`).
|
||||||
|
Notes:
|
||||||
|
- Backend routes use `sse-starlette` to emit these streams (e.g. the task, activity, and memory routers).
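
For readers unfamiliar with the wire format, here is a simplified sketch of how a client could parse a `text/event-stream` body. It illustrates the general SSE framing (fields separated by blank lines), not Mission Control's own client code; the `update` event name and the payloads are made up for the example.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of text lines.

    Simplified: handles `event`, `id`, `retry`, and multi-line `data`;
    ignores comment lines (those starting with ':').
    """
    fields: Dict[str, str] = {}
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                fields["data"] = "\n".join(data_parts)
                yield fields
            fields, data_parts = {}, []
            continue
        if line.startswith(":"):
            continue  # comment, commonly used as a keep-alive ping
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if name == "data":
            data_parts.append(value)
        elif name in ("event", "id", "retry"):
            fields[name] = value


if __name__ == "__main__":
    sample = ["event: update", 'data: {"a": 1}', "", ": ping", "data: x", "data: y", ""]
    for event in parse_sse(sample):
        print(event)
```

Consecutive `data:` lines are joined with newlines, which is why the parser buffers them until the blank line arrives.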
|
||||||
|
|
||||||
|
### 5) Gateway integration (optional)
|
||||||
|
|
||||||
|
Mission Control can coordinate with OpenClaw Gateways over WebSockets.
|
||||||
|
|
||||||
|
- Protocol methods/events list: `backend/app/services/openclaw/gateway_rpc.py`.
|
||||||
|
- Operator-facing protocol docs: `docs/openclaw_gateway_ws.md`.
|
||||||
|
|
||||||
|
## Where to start reading code
|
||||||
|
|
||||||
|
- Backend entrypoint + router wiring: `backend/app/main.py`
|
||||||
|
- Auth dependencies + access enforcement: `backend/app/api/deps.py`
|
||||||
|
- User auth: `backend/app/core/auth.py`
|
||||||
|
- Agent auth: `backend/app/core/agent_auth.py`
|
||||||
|
- Agent API surface: `backend/app/api/agent.py`
|
||||||
|
|||||||
@@ -6,32 +6,33 @@
|
|||||||
- [Production](production/README.md)
|
- [Production](production/README.md)
|
||||||
- [Troubleshooting](troubleshooting/README.md)
|
- [Troubleshooting](troubleshooting/README.md)
|
||||||
|
|
||||||
This page is the operator/SRE entry point. It intentionally links to existing deeper docs to minimize churn.
|
This page is the operator entrypoint. It points to the existing deep-dive runbooks and adds a short “first 30 minutes” checklist.
|
||||||
|
|
||||||
## “First 30 minutes” incident checklist
|
## First 30 minutes (incident checklist)
|
||||||
|
|
||||||
1. **Confirm user impact + scope**
|
1. **Confirm impact**
|
||||||
- What is broken: UI, API, auth, or gateway integration?
|
- What’s broken: UI, API, auth, or gateway integration?
|
||||||
- Is it all users or a subset?
|
- All users or a subset?
|
||||||
|
|
||||||
2. **Check service health**
|
2. **Check service health**
|
||||||
- Backend: `/healthz` and `/readyz`
|
- Backend: `/healthz` and `/readyz`
|
||||||
- Frontend: can it load? does it reach the API?
|
- Frontend: can it load? does it reach the API?
|
||||||
|
|
||||||
3. **Check auth (Clerk) configuration**
|
3. **Check auth (Clerk)**
|
||||||
- Frontend: is Clerk enabled unexpectedly? (publishable key set)
|
- Frontend: did Clerk get enabled unintentionally? (publishable key set)
|
||||||
- Backend: is `CLERK_JWKS_URL` configured correctly?
|
- Backend: is `CLERK_SECRET_KEY` configured correctly?
|
||||||
|
|
||||||
4. **Check DB connectivity**
|
4. **Check DB connectivity**
|
||||||
- Can backend connect to Postgres (`DATABASE_URL`)?
|
- Can backend connect to Postgres (`DATABASE_URL`)?
|
||||||
|
|
||||||
5. **Check logs**
|
5. **Check logs**
|
||||||
- Backend logs for 5xx spikes or auth failures.
|
- Backend logs for 5xx spikes or auth failures.
|
||||||
- Frontend logs for proxy/API URL misconfig.
|
- Frontend logs for API URL/proxy misconfig.
|
||||||
|
|
||||||
6. **Stabilize**
|
6. **Stabilize**
|
||||||
- Roll back the last change if available.
|
- Roll back the last change if you can.
|
||||||
- Temporarily disable optional integrations (gateway) to isolate.
|
- Temporarily disable optional integrations (gateway) to isolate.
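
Step 2 of the checklist can be scripted. The sketch below probes `/healthz` and `/readyz` using only the standard library. It assumes that HTTP 200 means healthy, which the docs do not state explicitly; the function names and the base URL in the usage line are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 3.0) -> int:
    """GET the URL and return its status code, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, just not with a 2xx


def check_health(base_url: str, fetch: Callable[[str], int] = http_status) -> Dict[str, str]:
    """Probe the two backend health endpoints named in the checklist."""
    results: Dict[str, str] = {}
    for path in ("/healthz", "/readyz"):
        url = base_url.rstrip("/") + path
        try:
            code = fetch(url)
        except OSError as exc:  # covers URLError, timeouts, refused connections
            results[path] = f"unreachable ({exc})"
            continue
        # Assumption: 200 is the healthy response for both endpoints.
        results[path] = "ok" if code == 200 else f"unhealthy (HTTP {code})"
    return results


if __name__ == "__main__":
    # Hypothetical local backend address.
    for path, status in check_health("http://localhost:8000").items():
        print(path, status)
```

The `fetch` parameter is injectable so the check can be exercised without a running backend.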
|
||||||
|
|
||||||
## Backups / restore (placeholder)
|
## Backups / restore
|
||||||
- Define backup cadence and restore steps once production deployment is finalized.
|
|
||||||
|
See [Production](production/README.md). If you run Mission Control in production, treat backup/restore as a regular drill, not a one-time setup.
|
||||||
|
|||||||
@@ -4,19 +4,19 @@
|
|||||||
|
|
||||||
- [Troubleshooting deep dive](troubleshooting/README.md)
|
- [Troubleshooting deep dive](troubleshooting/README.md)
|
||||||
|
|
||||||
This is the high-level troubleshooting entry point (minimal churn).
|
This is the “quick triage” page. For detailed playbooks and diagnostics, use the deep dive.
|
||||||
|
|
||||||
## Quick triage
|
## Quick triage
|
||||||
|
|
||||||
### Symptom: frontend loads but shows API errors
|
### Frontend loads but shows API errors
|
||||||
- Confirm `NEXT_PUBLIC_API_URL` points to a reachable backend.
|
- Confirm `NEXT_PUBLIC_API_URL` points to a backend your browser can reach.
|
||||||
- Check backend `/healthz`.
|
- Check backend `/healthz`.
|
||||||
|
|
||||||
### Symptom: frontend keeps redirecting / Clerk errors
|
### Frontend keeps redirecting / Clerk errors
|
||||||
- If you are running locally without Clerk, ensure `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` is **unset/blank**.
|
- If you’re running locally without Clerk, keep `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` unset/blank.
|
||||||
- See: [repo README Clerk note](../README.md#note-on-auth-clerk).
|
- See: [repo README Clerk note](../README.md#note-on-auth-clerk).
|
||||||
|
|
||||||
### Symptom: backend 5xx
|
### Backend returns 5xx
|
||||||
- Check DB connectivity (`DATABASE_URL`) and migrations.
|
- Check DB connectivity (`DATABASE_URL`) and migrations.
|
||||||
- Check backend logs.
|
- Check backend logs.
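
When chasing 5xx errors, a quick way to rule out basic network problems is to test TCP reachability of the host in `DATABASE_URL`. This is a sketch under stated assumptions: it falls back to the default Postgres port 5432 when the URL omits one, the helper names are hypothetical, and a successful connection says nothing about credentials or migrations.

```python
import os
import socket
from typing import Tuple
from urllib.parse import urlsplit


def db_endpoint(database_url: str) -> Tuple[str, int]:
    """Extract (host, port) from a Postgres-style URL."""
    parts = urlsplit(database_url)
    if not parts.hostname:
        raise ValueError("DATABASE_URL has no host component")
    # 5432 is the standard Postgres port when none is given.
    return parts.hostname, parts.port or 5432


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = os.environ.get("DATABASE_URL", "")
    if not url:
        print("DATABASE_URL is not set")
    else:
        host, port = db_endpoint(url)
        print(f"{host}:{port} reachable: {can_connect(host, port)}")
```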
|
||||||
|
|
||||||
|
|||||||