From efee334843af3a26cf293adb981dde91dfbc7ae1 Mon Sep 17 00:00:00 2001
From: Claude Thebot
Date: Mon, 9 Mar 2026 22:25:31 -0700
Subject: [PATCH] feat: run at boot (systemd/launchd) and auth token re-sync docs

- Add systemd unit examples and README for local install (Linux)
- Extend deployment README with Run at boot (local install) and merge upstream
- Add Re-syncing auth tokens subsection to gateway provisioning troubleshooting
- install.sh: add --install-service to install systemd user units (Linux)
- DOCUMENTATION.md: session notes

Made-with: Cursor
---
 DOCUMENTATION.md                              | 33 ++++++++++
 docs/deployment/README.md                     | 64 +++++++++++++++++++
 docs/deployment/systemd/README.md             | 58 +++++++++++++++++
 .../openclaw-mission-control-backend.service  | 23 +++++++
 .../openclaw-mission-control-frontend.service | 23 +++++++
 ...openclaw-mission-control-rq-worker.service | 24 +++++++
 .../gateway-agent-provisioning.md             | 26 ++++++++
 install.sh                                    | 52 +++++++++++++++
 8 files changed, 303 insertions(+)
 create mode 100644 DOCUMENTATION.md
 create mode 100644 docs/deployment/systemd/README.md
 create mode 100644 docs/deployment/systemd/openclaw-mission-control-backend.service
 create mode 100644 docs/deployment/systemd/openclaw-mission-control-frontend.service
 create mode 100644 docs/deployment/systemd/openclaw-mission-control-rq-worker.service

diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md
new file mode 100644
index 00000000..e9dcfce8
--- /dev/null
+++ b/DOCUMENTATION.md
@@ -0,0 +1,33 @@
+# Session documentation
+
+Decisions and changes made during development.
+
+## 2026-03-09: Run at boot and auth token re-sync
+
+### Goal
+
+- Allow Mission Control (local install, no Docker, e.g. in a VM) to run at boot via systemd (Linux) or launchd (macOS).
+- Document how to re-sync auth tokens between Mission Control and OpenClaw when they have drifted.
+
+### Implemented
+
+1. **Systemd unit files** (`docs/deployment/systemd/`)
+   - Added example units: `openclaw-mission-control-backend.service`, `openclaw-mission-control-frontend.service`, `openclaw-mission-control-rq-worker.service`.
+   - Units use placeholders `REPO_ROOT`, `BACKEND_PORT`, `FRONTEND_PORT`; install instructions and a small README explain substitution and install to `~/.config/systemd/user/` or `/etc/systemd/system/`.
+   - RQ worker is required for gateway lifecycle and webhooks; it is a separate unit.
+
+2. **Deployment docs** (`docs/deployment/README.md`)
+   - Replaced placeholder with a short deployment guide.
+   - "Run at boot (local install)": Linux (systemd) with link to `systemd/README.md`; macOS (launchd) with example plist and `launchctl load`; Docker Compose note for `restart: unless-stopped`.
+
+3. **Troubleshooting** (`docs/troubleshooting/gateway-agent-provisioning.md`)
+   - New subsection "Re-syncing auth tokens when Mission Control and OpenClaw have drifted": when tokens drift, run template sync with `rotate_tokens=true` via API (curl) or CLI (`scripts/sync_gateway_templates.py --rotate-tokens`); after sync, wake/update gateway if needed.
+
+4. **install.sh** (`install.sh`)
+   - New optional flag `--install-service` (local mode only): on Linux, copies the three systemd unit files from `docs/deployment/systemd/`, substitutes `REPO_ROOT`/ports, installs to `$XDG_CONFIG_HOME/systemd/user` (or `~/.config/systemd/user`), runs `systemctl --user daemon-reload` and `systemctl --user enable`. On non-Linux, prints a note pointing to `docs/deployment/README.md` for launchd. Not prompted by default; only when the user passes `--install-service`.
+
+### Rationale
+
+- **No Docker in VM**: User runs local install in a VM and does not want Docker there; run-at-boot is provided by the OS (systemd/launchd).
+- **Units as examples**: Units are in `docs/deployment/systemd/` so they can be versioned and copied; install.sh only installs when `--install-service` is given to avoid touching system/LaunchAgents without explicit opt-in.
+- **Auth re-sync**: Token drift is a common failure mode; documenting the API and CLI with `rotate_tokens=true` in the provisioning troubleshooting doc makes recovery easy to find.
diff --git a/docs/deployment/README.md b/docs/deployment/README.md
index 63b96947..951adce9 100644
--- a/docs/deployment/README.md
+++ b/docs/deployment/README.md
@@ -50,6 +50,8 @@ Open:
 - Frontend: `http://localhost:${FRONTEND_PORT:-3000}`
 - Backend health: `http://localhost:${BACKEND_PORT:-8000}/healthz`
 
+To have containers restart on failure and after host reboot, add `restart: unless-stopped` to the `db`, `redis`, `backend`, and `frontend` services in `compose.yml`, and ensure Docker is configured to start at boot.
+
 ### 3) Verify
 
 ```bash
@@ -112,3 +114,65 @@ Typical setup (outline):
 - Ensure the frontend can reach the backend over the configured `NEXT_PUBLIC_API_URL`
 
 This section is intentionally minimal until we standardize a recommended proxy (Caddy/Nginx/Traefik).
+
+## Run at boot (local install)
+
+If you installed Mission Control **without Docker** (e.g. using `install.sh` with "local" mode, or inside a VM where Docker is not used), the installer does not configure run-at-boot unless you pass `--install-service` (Linux only). You can start the stack manually after each reboot, or configure the OS to start it for you.
+
+### Linux (systemd)
+
+Use the example systemd units and instructions in [systemd/README.md](./systemd/README.md). In short:
+
+1. Copy the unit files from `docs/deployment/systemd/` and replace `REPO_ROOT`, `BACKEND_PORT`, and `FRONTEND_PORT` with your paths and ports.
+2. Install the units under `~/.config/systemd/user/` (user) or `/etc/systemd/system/` (system).
+3. Enable and start the backend, frontend, and RQ worker services.
+
+The RQ queue worker is required for gateway lifecycle (wake/check-in) and webhook delivery; run it as a separate unit.
+
+### macOS (launchd)
+
+Use LaunchAgents so the backend, frontend, and worker run under your user and restart on failure.
+
+1. Create a plist for each process under `~/Library/LaunchAgents/`, e.g. `com.openclaw.mission-control.backend.plist`:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+  <key>Label</key>
+  <string>com.openclaw.mission-control.backend</string>
+  <key>ProgramArguments</key>
+  <array>
+    <string>/usr/bin/env</string>
+    <string>uv</string>
+    <string>run</string>
+    <string>uvicorn</string>
+    <string>app.main:app</string>
+    <string>--host</string>
+    <string>0.0.0.0</string>
+    <string>--port</string>
+    <string>8000</string>
+  </array>
+  <key>WorkingDirectory</key>
+  <string>REPO_ROOT/backend</string>
+  <key>EnvironmentVariables</key>
+  <dict>
+    <key>PATH</key>
+    <string>/usr/local/bin:/opt/homebrew/bin:REPO_ROOT/backend/.venv/bin:/usr/bin:/bin</string>
+  </dict>
+  <key>KeepAlive</key>
+  <true/>
+  <key>RunAtLoad</key>
+  <true/>
+</dict>
+</plist>
+```
+
+Replace `REPO_ROOT` with the actual repo path. Ensure `uv` is on the plist's `PATH`. launchd does not expand `~`, so add the absolute path of `~/.local/bin` (e.g. `/Users/you/.local/bin`) if `uv` is installed there. Load with:
+
+```bash
+launchctl load ~/Library/LaunchAgents/com.openclaw.mission-control.backend.plist
+```
+
+2. Add similar plists, each with its own `Label`, for the frontend (`npm run start -- --hostname 0.0.0.0 --port 3000` in `REPO_ROOT/frontend`) and for the RQ worker (`uv run python ../scripts/rq worker`, with `WorkingDirectory` set to `REPO_ROOT/backend` and `ProgramArguments` of `/usr/bin/env`, `uv`, `run`, `python`, `../scripts/rq`, `worker`).
diff --git a/docs/deployment/systemd/README.md b/docs/deployment/systemd/README.md
new file mode 100644
index 00000000..07f83f3a
--- /dev/null
+++ b/docs/deployment/systemd/README.md
@@ -0,0 +1,58 @@
+# Systemd unit files (local install, run at boot)
+
+Example systemd units for running Mission Control at boot when installed **without Docker** (e.g. local install in a VM).
+
+## Prerequisites
+
+- **Backend**: `uv`, Python 3.12+, and `backend/.env` configured (including `DATABASE_URL`, `RQ_REDIS_URL` if using the queue worker).
+- **Frontend**: Node.js 22+ and `frontend/.env` (e.g. `NEXT_PUBLIC_API_URL`).
+- **RQ worker**: Redis must be running and reachable; `backend/.env` must set `RQ_REDIS_URL` and `RQ_QUEUE_NAME` to match the backend API.
+
+If you use Docker only for Postgres and/or Redis, start those containers first (e.g. `docker compose up -d db` and optionally Redis). For system units, you can instead add `After=docker.service` and start the containers via a separate unit or script. User units cannot be ordered after system services such as `docker.service`.
+
+## Placeholders
+
+Before installing, replace in each unit file:
+
+- `REPO_ROOT` — absolute path to the Mission Control repo (e.g. `/home/user/openclaw-mission-control`).
+- `BACKEND_PORT` — backend port (default `8000`).
+- `FRONTEND_PORT` — frontend port (default `3000`).
+
+Example (from repo root):
+
+```bash
+REPO_ROOT="$(pwd)"
+for f in docs/deployment/systemd/openclaw-mission-control-*.service; do
+  sed -e "s|REPO_ROOT|$REPO_ROOT|g" -e "s|BACKEND_PORT|8000|g" -e "s|FRONTEND_PORT|3000|g" "$f" \
+    > "$(basename "$f")"
+done
+# Then copy the generated .service files to ~/.config/systemd/user/ or /etc/systemd/system/
+```
+
+## Install and enable
+
+**User units** (recommended for single-user / VM; user units start only when you log in, so run `loginctl enable-linger "$USER"` once to start them at boot):
+
+```bash
+cp openclaw-mission-control-backend.service openclaw-mission-control-frontend.service openclaw-mission-control-rq-worker.service ~/.config/systemd/user/
+systemctl --user daemon-reload
+systemctl --user enable openclaw-mission-control-backend openclaw-mission-control-frontend openclaw-mission-control-rq-worker
+systemctl --user start openclaw-mission-control-backend openclaw-mission-control-frontend openclaw-mission-control-rq-worker
+```
+
+**System-wide** (e.g. under `/etc/systemd/system/`; add a `User=` line to each `[Service]` section, otherwise the services run as root):
+
+```bash
+sudo cp openclaw-mission-control-*.service /etc/systemd/system/
+sudo systemctl daemon-reload
+sudo systemctl enable --now openclaw-mission-control-backend openclaw-mission-control-frontend openclaw-mission-control-rq-worker
+```
+
+## Order
+
+Start order is not strict between the backend, frontend, and worker. All three use `After=network-online.target`, but this ordering only applies to system units, because the user manager has no `network-online.target`. Ensure Postgres (and Redis, if used) are running before or together with the backend and worker. For example, start the Docker services first, or run Postgres and Redis as system units that the Mission Control units depend on.
+
+## Logs
+
+- `journalctl --user -u openclaw-mission-control-backend -f` (or `sudo journalctl -u openclaw-mission-control-backend -f` for system units)
+- Same for `openclaw-mission-control-frontend` and `openclaw-mission-control-rq-worker`.
diff --git a/docs/deployment/systemd/openclaw-mission-control-backend.service b/docs/deployment/systemd/openclaw-mission-control-backend.service
new file mode 100644
index 00000000..7ffcc5be
--- /dev/null
+++ b/docs/deployment/systemd/openclaw-mission-control-backend.service
@@ -0,0 +1,23 @@
+# Mission Control backend (FastAPI) — example systemd unit for local install.
+# Copy to ~/.config/systemd/user/ or /etc/systemd/system/, then:
+#   sed -e 's|REPO_ROOT|/path/to/openclaw-mission-control|g' -e 's|BACKEND_PORT|8000|g' -i openclaw-mission-control-backend.service
+#   systemctl --user daemon-reload   # or sudo systemctl daemon-reload
+#   systemctl --user enable --now openclaw-mission-control-backend   # or sudo systemctl enable --now ...
+#
+# Requires: backend/.env present, and uv where systemd can find it. systemd resolves ExecStart with a fixed search path that covers /usr/local/bin and /usr/bin but not ~/.local/bin; otherwise use uv's absolute path in ExecStart.
+
+[Unit]
+Description=Mission Control backend (FastAPI)
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+WorkingDirectory=REPO_ROOT/backend
+EnvironmentFile=-REPO_ROOT/backend/.env
+ExecStart=uv run uvicorn app.main:app --host 0.0.0.0 --port BACKEND_PORT
+Restart=on-failure
+RestartSec=5
+
+[Install]
+WantedBy=default.target
diff --git a/docs/deployment/systemd/openclaw-mission-control-frontend.service b/docs/deployment/systemd/openclaw-mission-control-frontend.service
new file mode 100644
index 00000000..3830c303
--- /dev/null
+++ b/docs/deployment/systemd/openclaw-mission-control-frontend.service
@@ -0,0 +1,23 @@
+# Mission Control frontend (Next.js) — example systemd unit for local install.
+# Copy to ~/.config/systemd/user/ or /etc/systemd/system/, then:
+#   sed -e 's|REPO_ROOT|/path/to/openclaw-mission-control|g' -e 's|FRONTEND_PORT|3000|g' -i openclaw-mission-control-frontend.service
+#   systemctl --user daemon-reload   # or sudo systemctl daemon-reload
+#   systemctl --user enable --now openclaw-mission-control-frontend   # or sudo systemctl enable --now ...
+#
+# Requires: frontend/.env present, and npm and node where systemd can find them (e.g. a system install in /usr/bin). For nvm, use npm's absolute path in ExecStart and set Environment=PATH to include node's directory.
+
+[Unit]
+Description=Mission Control frontend (Next.js)
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+WorkingDirectory=REPO_ROOT/frontend
+EnvironmentFile=-REPO_ROOT/frontend/.env
+ExecStart=npm run start -- --hostname 0.0.0.0 --port FRONTEND_PORT
+Restart=on-failure
+RestartSec=5
+
+[Install]
+WantedBy=default.target
diff --git a/docs/deployment/systemd/openclaw-mission-control-rq-worker.service b/docs/deployment/systemd/openclaw-mission-control-rq-worker.service
new file mode 100644
index 00000000..90705e02
--- /dev/null
+++ b/docs/deployment/systemd/openclaw-mission-control-rq-worker.service
@@ -0,0 +1,24 @@
+# Mission Control RQ queue worker — example systemd unit for local install.
+# Processes lifecycle and webhook queue tasks; required for gateway wake/check-in and webhooks.
+# Copy to ~/.config/systemd/user/ or /etc/systemd/system/, then:
+#   sed -e 's|REPO_ROOT|/path/to/openclaw-mission-control|g' -i openclaw-mission-control-rq-worker.service
+#   systemctl --user daemon-reload   # or sudo systemctl daemon-reload
+#   systemctl --user enable --now openclaw-mission-control-rq-worker   # or sudo systemctl enable --now ...
+#
+# Requires: backend/.env present, Redis reachable (RQ_REDIS_URL in backend/.env), and uv in /usr/local/bin or /usr/bin (systemd does not use your shell PATH; otherwise use uv's absolute path in ExecStart).
+
+[Unit]
+Description=Mission Control RQ queue worker
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+WorkingDirectory=REPO_ROOT/backend
+EnvironmentFile=-REPO_ROOT/backend/.env
+ExecStart=uv run python ../scripts/rq worker
+Restart=on-failure
+RestartSec=5
+
+[Install]
+WantedBy=default.target
diff --git a/docs/troubleshooting/gateway-agent-provisioning.md b/docs/troubleshooting/gateway-agent-provisioning.md
index 695b35ab..1f39eac9 100644
--- a/docs/troubleshooting/gateway-agent-provisioning.md
+++ b/docs/troubleshooting/gateway-agent-provisioning.md
@@ -104,3 +104,29 @@ Actions:
 - gateway logs around bootstrap
 - worker logs around lifecycle events
 - agent `last_provision_error`, `wake_attempts`, `last_seen_at`
+
+## Re-syncing auth tokens when Mission Control and OpenClaw have drifted
+
+Mission Control stores a hash of each agent’s token and provisions OpenClaw by writing templates (e.g. `TOOLS.md`) that include `AUTH_TOKEN`. The token in the gateway’s agent files can stop matching the hash stored by the backend, for example after a reinstall, a token change, or a manual edit. When that happens, heartbeats can fail with 401 and the agent may appear offline.
+
+To re-sync:
+
+1. Ensure Mission Control is running (API and queue worker).
+2. Run **template sync with token rotation** so the backend issues new agent tokens and rewrites `AUTH_TOKEN` into the gateway’s agent files.
+
+**Via API (curl):**
+
+```bash
+curl -X POST "http://localhost:8000/api/v1/gateways/GATEWAY_ID/templates/sync?rotate_tokens=true" \
+  -H "Authorization: Bearer YOUR_LOCAL_AUTH_TOKEN"
+```
+
+Replace `GATEWAY_ID` with the gateway’s ID (from the Gateways list or the gateway URL in the UI) and `YOUR_LOCAL_AUTH_TOKEN` with your local auth token. Adjust the port if you changed `BACKEND_PORT`.
+
+**Via CLI (from repo root):**
+
+```bash
+cd backend && uv run python scripts/sync_gateway_templates.py --gateway-id GATEWAY_ID --rotate-tokens
+```
+
+After a successful sync, OpenClaw agents will have new `AUTH_TOKEN` values in their workspace files, and the next heartbeat or bootstrap will use the new token. If the gateway was offline, trigger a wake/update from Mission Control so agents restart and pick up the new token.
diff --git a/install.sh b/install.sh
index 7291a3ac..838b441d 100755
--- a/install.sh
+++ b/install.sh
@@ -30,6 +30,7 @@ FORCE_LOCAL_AUTH_TOKEN=""
 FORCE_DB_MODE=""
 FORCE_DATABASE_URL=""
 FORCE_START_SERVICES=""
+FORCE_INSTALL_SERVICE=""
 
 if [[ -t 0 ]]; then
   INTERACTIVE=1
@@ -131,6 +132,7 @@ Options:
   --db-mode                 Local mode only
   --database-url            Required when --db-mode external
   --start-services          Local mode only
+  --install-service         Local mode only: install systemd user units for run at boot (Linux)
   -h, --help
 
 If an option is omitted, the script prompts in interactive mode and uses defaults in non-interactive mode.
@@ -220,6 +222,10 @@ parse_args() {
       FORCE_START_SERVICES="$2"
       shift 2
       ;;
+    --install-service)
+      FORCE_INSTALL_SERVICE="yes"
+      shift
+      ;;
     -h|--help)
       usage
       exit 0
@@ -733,6 +739,45 @@ start_local_services() {
   )
 }
 
+install_systemd_services() {
+  local backend_port="$1"
+  local frontend_port="$2"
+  local systemd_user_dir
+  systemd_user_dir="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user"
+  local units_dir="$REPO_ROOT/docs/deployment/systemd"
+
+  if [[ "$PLATFORM" != "linux" ]]; then
+    info "Skipping systemd install (not Linux). For macOS run-at-boot see docs/deployment/README.md (launchd)."
+    return 0
+  fi
+  if [[ ! -d "$units_dir" ]]; then
+    warn "Systemd units dir not found: $units_dir"
+    return 1
+  fi
+  for name in openclaw-mission-control-backend openclaw-mission-control-frontend openclaw-mission-control-rq-worker; do
+    if [[ ! -f "$units_dir/$name.service" ]]; then
+      warn "Unit file not found: $units_dir/$name.service"
+      return 1
+    fi
+  done
+
+  mkdir -p "$systemd_user_dir"
+  for name in openclaw-mission-control-backend openclaw-mission-control-frontend openclaw-mission-control-rq-worker; do
+    sed -e "s|REPO_ROOT|$REPO_ROOT|g" \
+      -e "s|BACKEND_PORT|$backend_port|g" \
+      -e "s|FRONTEND_PORT|$frontend_port|g" \
+      "$units_dir/$name.service" > "$systemd_user_dir/$name.service"
+    info "Installed $systemd_user_dir/$name.service"
+  done
+  if command_exists systemctl; then
+    systemctl --user daemon-reload
+    systemctl --user enable openclaw-mission-control-backend openclaw-mission-control-frontend openclaw-mission-control-rq-worker
+    info "Systemd user units enabled. Start with: systemctl --user start openclaw-mission-control-backend openclaw-mission-control-frontend openclaw-mission-control-rq-worker. To start them at boot without logging in, run: loginctl enable-linger $USER"
+  else
+    warn "systemctl not found; units were copied but not enabled."
+  fi
+}
+
 ensure_repo_layout() {
   [[ -f "$REPO_ROOT/Makefile" ]] || die "Missing Makefile in expected repository root: $REPO_ROOT"
   [[ -f "$REPO_ROOT/compose.yml" ]] || die "Missing compose.yml in expected repository root: $REPO_ROOT"
@@ -954,6 +999,10 @@ SUMMARY
     wait_for_http "http://127.0.0.1:$frontend_port" "Frontend" 120 || true
   fi
 
+  if [[ -n "$FORCE_INSTALL_SERVICE" ]]; then
+    install_systemd_services "$backend_port" "$frontend_port" || true
+  fi
+
   cat <