OpenClaw has a common kind of false green: openclaw gateway status shows RPC checks as healthy, yet the client throws gateway closed (1000) or, after an upgrade, tools suddenly stop working. This article walks from symptom to likely root cause to verification commands to fix, covering four frequent failure classes (tokens, scopes, workspace paths, and model backends, including CLI-only routes that disable tools), and gives a six-step recovery runbook plus the log lines that matter under systemd and Docker. Alongside this site's articles on production observability, security hardening, and cross-platform installation, this one focuses on connection and session consistency: restore the path first, then hand long-term monitoring and change control to the observability guide.
Gateway control-plane probes often answer only whether the process is up and the port responds. A client error like gateway closed (1000) usually means the server closed the WebSocket or session, or that auth or policy has drifted. Use the seven checks below on the front line: the more that apply, the less you should rely on refreshing the UI. Run the ordered restart and config validation in section 3 instead.
Treating probe green as end-to-end green: RPC OK in status is a narrow check; device-class commands and tool execution channels can still fail when scopes are missing or the session expired.
Token drift: environment variables, config files, and the token loaded by the Gateway process are not the same copy; rotating secrets on only one side yields intermittent success and bulk failure.
Workspace path mismatch: when agents.defaults.workspace points at an old directory or container bind mounts are wrong, the tool layer may refuse work or disconnect quickly.
CLI-only model backends: some *-cli/... routes intentionally disable file-class tools, which looks like “Gateway online but tools unavailable” and is easy to confuse with closed(1000).
Dual process after upgrade: the package updated but an old Gateway still holds the port or PID files were not cleaned; the new process is half-started and probes hit the old listener.
Tightened security policy: after enabling dmPolicy / networkPolicy, the handshake may succeed and the first payload is dropped by policy; compare allowlists in the security hardening article.
No minimal repro bundle: tickets with half a line of error and no CLI version, config snippet, or recent change force tier-two guesswork and stretch recovery time.
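The token-drift check in the list above can be done without printing a secret: hash each copy and compare short fingerprints. This is a minimal sketch, not an OpenClaw feature. The variable names in the commented example are hypothetical, and it assumes GNU coreutils sha256sum is installed (on macOS, shasum -a 256 is the usual substitute).

```shell
# token_fp: print a short, non-reversible fingerprint of the secret in $1.
token_fp() {
  printf '%s' "$1" | sha256sum | cut -c1-12
}

# same_token: succeed only when the two secrets are byte-identical.
same_token() {
  [ "$(token_fp "$1")" = "$(token_fp "$2")" ]
}

# Example with hypothetical variable names: compare the token exported in
# this shell with the one the service manager hands to the Gateway process.
# same_token "$OPENCLAW_GATEWAY_TOKEN" "$token_from_service_env" || echo "token drift"
```

Fingerprints are safe to paste into a ticket, so two people can confirm they hold the same token without exchanging it.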
The shared root cause is compressing multi-layer health in a distributed system into a single boolean. Next, a table maps what you see to the commands you should run first so you are not swimming blindly in logs.
Pin this table at the top of the on-call runbook: align on the exact string you see, then pick the shortest verification path. Exact subcommands depend on your OpenClaw build; the names below are illustrative of intent.
| What you see | Likely root cause | Checks to run first |
|---|---|---|
| RPC OK, but device or channel ops report closed(1000) | Session scope does not match the action, or token differs from the Gateway runtime | openclaw status --all; trace token sources; review allowlists in security config |
| After upgrade, “all tools grayed out” | Model routing on a CLI-only backend, or Gateway not restarted to load new config | openclaw models list; switch off CLI-only routes, then openclaw gateway restart |
| Intermittent success, bulk failure | Multiple terminals with different tokens, or a reverse proxy caching stale connections | Unify env exports from one shell; clear client sessions; check proxy idle timeout |
| Path-class tools refuse to run | Workspace config does not match the real repo path | Diff openclaw config get agents.defaults.workspace against disk |
| Disconnects right after policy change | dmPolicy / networkPolicy tightened; first packet rejected | Re-read the security hardening section; temporarily relax for a known session to validate |
Probe green only proves the control plane is alive; to prove you can work reliably you must align token, workspace, model backend, and policy.
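The workspace row in the table can be checked mechanically. The sketch below assumes that openclaw config get agents.defaults.workspace prints only the path on standard output; confirm that on your build before relying on it. The function itself depends only on the path it is given.

```shell
# check_workspace PATH: succeed when PATH is an existing directory that the
# current user can read and enter; otherwise explain why on stderr and fail.
check_workspace() {
  ws=$1
  if [ -z "$ws" ]; then
    echo "workspace is empty or unset" >&2; return 1
  elif [ ! -d "$ws" ]; then
    echo "workspace is not a directory: $ws" >&2; return 1
  elif [ ! -r "$ws" ] || [ ! -x "$ws" ]; then
    echo "workspace is not accessible to $(id -un): $ws" >&2; return 1
  fi
}

# Example (assumes the command prints just the path):
# check_workspace "$(openclaw config get agents.defaults.workspace)"
```

Run it as the same user, and inside the same container, as the Gateway process. A path that exists on the host can still be missing behind a bind mount.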
For fuller logging and rollback cadence, see the production observability article. Here the goal is to decide in about ten minutes whether you are on the “restart + validate” path or the “config rollback” path.
The order deliberately places low-cost steps first and config rollback later, so you do not open firewalls or reinstall immediately. In production, note in the ticket whether impact is “this CLI only” or “multi-user sessions.”
Freeze concurrent work: ask teammates to pause new sessions and batch jobs so a Gateway restart is not drowned in reconnect storms.
Capture a state snapshot: run openclaw --version, openclaw status --all (if available), and save output; record recent token rotation or openclaw.json edits.
Validate workspace and model routing: confirm workspace points at a real directory; use openclaw models list to ensure you did not select a CLI-only backend by mistake.
Run doctor / validate: use the CLI’s openclaw doctor, config:validate, or equivalent to fix obvious mismatches.
Restart Gateway in order: openclaw gateway restart (or restart the systemd unit / container) so the old process exits before the new one listens.
Minimal acceptance tests: one read-only tool call and one write call; only then reopen for others. If it still fails, go to section 4 for system logs.
    openclaw --version
    openclaw status --all 2>&1 | tee /tmp/openclaw-status.txt
    openclaw config get agents.defaults.workspace
    openclaw models list
    openclaw doctor
    openclaw gateway restart
    # Then run one minimal read tool call and one write call to verify session and scope
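To keep the runbook order and stop at the first failing step, the commands can be wrapped in a small helper that records each step's output in one file. This is a sketch under assumptions: run_step is a hypothetical helper, not part of the OpenClaw CLI, and the subcommands in the commented usage are the ones named in this article, which may differ on your build.

```shell
# run_step LOGFILE LABEL CMD...: append a labeled header and the command's
# combined output to LOGFILE, then return the command's exit status.
run_step() {
  _log=$1; _label=$2; shift 2
  printf '== %s ==\n' "$_label" >>"$_log"
  "$@" >>"$_log" 2>&1
}

# Example: each step runs only if the previous one succeeded.
# log=/tmp/openclaw-runbook.txt
# run_step "$log" version   openclaw --version &&
# run_step "$log" status    openclaw status --all &&
# run_step "$log" workspace openclaw config get agents.defaults.workspace &&
# run_step "$log" models    openclaw models list &&
# run_step "$log" doctor    openclaw doctor &&
# run_step "$log" restart   openclaw gateway restart
```

The resulting file doubles as the state snapshot for the ticket, with the failing step as its last section.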
Note: when the Gateway runs on a dedicated remote Mac, long SSH sessions and GUI prompts can still interrupt the toolchain; for stable unattended execution, pair with the directory and session isolation checklist in the agent node article.
If section 3’s restart still yields closed(1000), first suspect a process that never exited, or bind mounts that have drifted inside a container. As in the observability article, establish who is listening and which user started the process before debating config.
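One way to establish who is listening on Linux is to filter the output of ss from iproute2. The sketch below assumes the local address is the fourth column, as in ss -ltn output, and uses 18789 purely as a placeholder port; substitute the port your Gateway is configured to use.

```shell
# listeners_on_port PORT: read `ss -H -ltnp` output on stdin and print only
# the lines whose local address (4th column) ends in :PORT.
listeners_on_port() {
  awk -v port="$1" '$4 ~ (":" port "$")'
}

# Example (placeholder port; run as root to see processes owned by others):
# ss -H -ltnp | listeners_on_port 18789
```

More than one line, or a PID that predates the upgrade, points to the dual-process case: stop the old listener before restarting again.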
systemd (bare-metal Linux): use systemctl status to see whether the main process is crash-looping, and journalctl -u <unit> -n 200 --no-pager to find close codes and policy keywords.
Docker Compose: align timestamps between docker compose ps and docker compose logs --tail=200 gateway.
If you deployed with the Linux systemd + Tunnel guide, also confirm that the tunnel and loopback binding are not pointing at a stale port.
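Two hundred log lines are easier to read once narrowed to the ones that mention closes, tokens, scopes, or policy. The keyword list below is an assumption about what the Gateway logs, not a documented format; adjust it after looking at a few real lines from your deployment.

```shell
# gateway_log_hits: read log text on stdin and print, with line numbers,
# only lines that mention a close or an auth/policy keyword.
# Exits non-zero when nothing matches, as grep does.
gateway_log_hits() {
  grep -n -i -E 'closed|\(1000\)|token|scope|policy|unauthori[sz]ed'
}

# Examples (replace <unit> with your systemd unit name):
# journalctl -u <unit> -n 200 --no-pager | gateway_log_hits
# docker compose logs --tail=200 gateway | gateway_log_hits
```

The line numbers make it easy to return to the surrounding context in the full log.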
Warning: do not temporarily expose the Gateway to the public internet to “test connectivity” before you know which interface is listening; validate inside the constraints of the security hardening article so troubleshooting does not become an incident.
These fields shorten the second round of diagnosis; redact before sharing externally.
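Redaction can be partly automated before a snapshot leaves the team. The filter below is a best-effort sketch: it masks values that follow keys containing token, secret, or password in key=value, key: value, or JSON-style text. The key names are assumptions, and secrets stored in any other shape pass through unchanged, so still review the output by eye.

```shell
# redact: read text on stdin and replace the value after any key whose name
# contains token, secret or password (lower or upper case) with [REDACTED].
redact() {
  sed -E 's/((token|TOKEN|secret|SECRET|password|PASSWORD)[A-Za-z_]*"?[[:space:]]*[:=][[:space:]]*"?)[^"[:space:],]+/\1[REDACTED]/g'
}

# Example:
# redact </tmp/openclaw-status.txt >/tmp/openclaw-status.redacted.txt
```

It deliberately over-matches, so a harmless field such as a token count is masked too. For a ticket attachment that is the safer failure mode.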
Running the Gateway only on a laptop is fragile to sleep, OS updates, and multi-user desktop sessions; a small Linux box often lacks the macOS toolchain and graphical edge cases you need. When OpenClaw must sit on a long-lived, contract-friendly execution tier, a dedicated remote Mac is usually steadier than repeatedly borrowing hardware. Compared to building your own Mac rack, NodeMini Mac Mini cloud rental makes it easier to define a repeatable node profile so “Gateway + toolchain” hands off like a VPS estate.
Probes cover a narrow path; session, scope, token, and model backend can still be out of sync. Follow section 3 for an ordered restart and run doctor/validate. To plan execution-tier capacity, see Mac Mini rental rates and the help center.
Check model routing for a CLI-only backend; confirm the Gateway restarted and loaded new config; then verify workspace paths and complete token rotation across CLI and service.
Open the blog OpenClaw category for install, systemd, Docker, security, and observability posts; cross-check connectivity baselines in the help center.