After upgrading OpenClaw, which kind of “bad” are you seeing: a gateway that fails to start, RPC probes that fail, or a CLI that reports a new build while a user-level systemd service still points at an old path? This article is for production readers: first use seven implicit assumptions to expose split brain (old vs new binaries), then a symptom matrix to separate “config stamped by version” issues from token or remote-URL problems, then a six-step safe recovery runbook (PATH → gateway install --force → gateway restart → doctor) with guidance on when a destructive-environment gate is warranted. Inline links branch to our posts on cron and upgrade regressions, remote mode, and production observability.
Official troubleshooting describes the failure mode as follows. A newer OpenClaw writes openclaw.json and updates stamps such as meta.lastTouchedVersion, yet your shell still resolves an old openclaw binary on PATH. Read paths may keep working, but once destructive mutations such as gateway service install, restart, or remove are involved, the CLI refuses rather than persist half-new metadata to disk. In production this is commonly called split brain.
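The stamp-versus-binary comparison can be sketched in shell. This is a minimal illustration under stated assumptions, not the CLI's own guardrail: it assumes `openclaw --version` prints a dotted numeric version somewhere in its output, that `jq` is installed, and that you pass the path to your own openclaw.json. The helper names (`extract_version`, `version_lt`, `check_split_brain`) are hypothetical.

```shell
# Sketch: detect "newer config stamp, older binary on PATH".
# Helper names are hypothetical; they are not part of the OpenClaw CLI.

# Pull the first dotted numeric version (for example 1.4.2) out of text on stdin.
extract_version() {
  grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Succeed (exit 0) when version $1 is strictly older than version $2.
# Relies on `sort -V` (version sort).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Compare the binary on PATH with the stamp in a config file.
# $1 = path to openclaw.json (supply the location used by your install).
check_split_brain() {
  local cfg="$1" bin_ver stamp_ver
  bin_ver="$(openclaw --version 2>/dev/null | extract_version)"
  stamp_ver="$(jq -r '.meta.lastTouchedVersion // empty' "$cfg" | extract_version)"
  if [ -z "$bin_ver" ] || [ -z "$stamp_ver" ]; then
    echo "unknown: could not read one of the versions"
    return 2
  fi
  if version_lt "$bin_ver" "$stamp_ver"; then
    echo "split brain suspected: binary $bin_ver is older than config stamp $stamp_ver"
    return 1
  fi
  echo "ok: binary $bin_ver, config stamp $stamp_ver"
}
```

Calling `check_split_brain /path/to/openclaw.json` from the same user account that owns the service gives a quick yes/no before any destructive gateway command is attempted.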
1. Equating successful npm installs with the service switching to the new binary: npm install -g updates executables under the global prefix only; if launchd or systemd --user units still point at old absolute paths, restarts behave the same as before.
2. Mixing login-shell PATH with the service environment: interactive which openclaw succeeding does not mean the daemon environment matches.
3. Ignoring multiple installers side by side: Homebrew, official installers, and npm global can each ship a binary; order depends on PATH prefix precedence.
4. Skipping gateway install --force after upgrade: official guidance reinstalls service packaging when binaries drift; relying on manually starting gateway once leaves the fork ready to recur on next reboot.
5. Treating every doctor failure as malformed config: sometimes guardrails detect a binary vs config mismatch; align versions before rewriting keys.
6. Switching blindly between remote and local mode without pinning config snapshots: follow the discipline in our remote mode guide and run openclaw config get gateway.mode before deciding where probes should land.
7. Checking channels after upgrades but ignoring scheduling surfaces: cron and Gateway share lifecycle; regressions belong on the checklist in our cron article.
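For the side-by-side installer case, a small PATH scan makes the precedence visible. The sketch below is a hypothetical helper (`list_on_path` is not an OpenClaw command); it only walks the directories in PATH and prints every executable with the given name, in the order the shell would consider them.

```shell
# Print every executable called $1 found on PATH, in precedence order.
# $2 optionally overrides the PATH value to scan (useful for testing).
# The body runs in a subshell so the IFS and globbing changes stay local.
list_on_path() (
  name="$1"
  IFS=':'
  set -f   # disable globbing while PATH is split on ':'
  for dir in ${2:-$PATH}; do
    [ -n "$dir" ] || continue
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
)
```

Running `list_on_path openclaw` and seeing more than one line suggests several install channels are present; the first line is the one a plain `openclaw` invocation resolves to. In bash, `type -a openclaw` gives a similar view and also reveals aliases.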
The shared flaw is confusing “config still readable” with “execution plane coherent.” The right mental model: a config stamp reflects who last wrote the file; which binary runs the service demands its own proof.
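One way to collect that proof on a systemd host is to read the binary out of the unit text and compare it with what the shell resolves. This is a sketch under assumptions: the unit name is whatever your install created (shown as a placeholder), the first token of ExecStart is assumed to be the openclaw executable (if your unit launches through an interpreter, compare the script path instead), and both helper names are hypothetical.

```shell
# Read systemd unit text on stdin and print the first token of ExecStart=,
# skipping the optional prefix characters systemd allows (-, @, :, +, !).
execstart_binary() {
  sed -n 's/^ExecStart=[-@:+!]*\([^[:space:]]*\).*/\1/p' | head -n 1
}

# Succeed when two paths refer to the same file (symlinks are followed).
# $1 = path taken from the unit, $2 = path the shell resolves.
same_binary() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" -ef "$2" ]
}

# Example usage (unit name is a placeholder, not a documented value):
#   unit_bin="$(systemctl --user cat <your-gateway-unit> | execstart_binary)"
#   same_binary "$unit_bin" "$(command -v openclaw)" || echo "service and shell disagree"
```

The `-ef` comparison is deliberate: two different-looking paths that resolve to the same file are not a fork, while two files with the same name under different prefixes are.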
Use the table below to move on-call notes from “feels like the upgrade broke” to a specific, signed-off branch:
| Signal | More like split brain | More like auth/session | More like remote URL / topology |
|---|---|---|---|
| Doctor keywords | Mentions forked old/new binaries; blocks destructive gateway actions | Token/device error codes unrelated to binary version | RPC probe failures while local gateway status --deep points at an unintended host |
| gateway status | Runtime behavior disagrees sharply with CLI --version | Runtime OK but unauthorized | Stopped locally while the remote actually runs Gateway |
| First action | Align PATH → gateway install --force → restart | Rotate or realign tokens/device handshakes | Verify gateway.remote.url and environment variables stay consistent with our remote mode guide |
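The matrix can also be read as a small decision function. The sketch below is an illustration only: the three inputs are observations an operator supplies by hand (nothing here parses real CLI output), the value names `local` and `remote` for gateway.mode are assumed from the wording in this article, and `triage_branch` is a hypothetical name.

```shell
# Turn three on-call observations into the triage branch from the table.
#   $1 = "match" | "mismatch"    service runtime version vs CLI --version
#   $2 = "ok" | "unauthorized"   result of an authenticated probe
#   $3 = "local" | "remote"      value of gateway.mode (assumed value names)
triage_branch() {
  local versions="$1" auth="$2" mode="$3"
  if [ "$versions" = "mismatch" ]; then
    echo "split-brain"        # align binaries before anything else
  elif [ "$auth" = "unauthorized" ]; then
    echo "auth-session"       # rotate or realign tokens and device handshakes
  elif [ "$mode" = "remote" ]; then
    echo "remote-topology"    # verify gateway.remote.url and environment
  else
    echo "undetermined"
  fi
}
```

The version check comes first on purpose: while binaries disagree, auth and topology symptoms are not trustworthy. The third argument would typically come from `openclaw config get gateway.mode`.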
Upgrade-night golden questions: (A) Which binary runs? (B) Which stamp wrote the config—and when? Align those before debating channels versus cron.
When you combine Tailscale or private tunnels with deployments, never conflate tunnel reachability with healthy RPC; acceptance-test both legs using our post on Tailscale private exposure.
The sequence below is order-sensitive; if any step finds old vs new still diverging, rewind to the previous step rather than editing config while swapping binaries ad hoc.
1. Freeze evidence: capture openclaw --version, explicit binary paths surfaced in unit files where visible, and doctor screenshots.
2. Fix PATH and aliases: ensure non-interactive which openclaw resolves the intended upgraded build; remove aliases that hide real paths.
3. Pick one install channel: choose a single durable feed (documented npm, installer, and so on) and avoid long-term brew vs npm mixing.
4. Reinstall service packaging: after PATH is correct, run openclaw gateway install --force as the same user to refresh launchd/systemd metadata.
5. Cold-start Gateway: openclaw gateway restart, then gateway status and RPC probes.
6. Regression passes: openclaw doctor → channels status --probe → confirm cron list still registers expected jobs.
openclaw --version
command -v openclaw
openclaw gateway status
openclaw doctor
openclaw gateway install --force
openclaw gateway restart
openclaw channels status --probe
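The order-sensitive part of the runbook can be wrapped so that it stops at the first failing step. This is a sketch, not an official script: it uses only the commands listed in this article, arranges them in the runbook order (evidence, reinstall, restart, then checks), and assumes each command exits non-zero on failure. The wrapper name `run_recovery` is hypothetical, and the cron check stays a manual step.

```shell
# Run the recovery commands in runbook order and stop at the first failure.
run_recovery() {
  while IFS= read -r step; do
    echo ">>> $step"
    # $step is word-split on purpose so the command receives its arguments;
    # stdin is redirected so the command cannot consume the step list.
    if ! $step </dev/null; then
      echo "FAILED at: $step (rewind to the previous step)" >&2
      return 1
    fi
  done <<'EOF'
openclaw --version
command -v openclaw
openclaw gateway install --force
openclaw gateway restart
openclaw gateway status
openclaw doctor
openclaw channels status --probe
EOF
  echo "recovery sequence completed"
}
```

Run it as the same user that owns the service, and only after PATH and the install channel have been fixed by hand; the wrapper does not attempt those two steps.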
Note: When logs mention port clashes, memory spikes, or compose startup sequencing, pair this with our Gateway not ready playbook and closed(1000) RPC so resource-class failures are not mistaken for split brain.
Official troubleshooting treats “newer config + older binary” as dangerous: stale processes gaining rights to mutate gateway packaging can leave disk state unrecoverably mixed. Newer builds may expose hard gates on destructive gateway operations—specific OPENCLAW_* variables (exact names depend on current docs) belong only where you knowingly need legacy binaries for a single emergency repair.
Caution: these knobs are not a universal bypass—they cover narrow scenarios where you fully understand risks and accept possible damage to service metadata. Defaults should remain unset unless a change ticket cites rollback documented by upstream.
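To confirm that no such knob has been left set, a generic environment scan is enough. The sketch below does not name any specific variable, since the exact names depend on current docs; it lists whatever `OPENCLAW_*` variables exist in the current shell. Both helper names are hypothetical, and the check covers only the shell it runs in, not the service environment (on systemd hosts, `systemctl --user show-environment` shows the user manager's view).

```shell
# Print the names of any OPENCLAW_* variables in the current environment.
list_openclaw_overrides() {
  env | sed -n 's/^\(OPENCLAW_[A-Za-z0-9_]*\)=.*/\1/p' | sort
}

# Succeed only when no OPENCLAW_* variable is set; otherwise list them on stderr.
assert_no_overrides() {
  local found
  found="$(list_openclaw_overrides)"
  if [ -n "$found" ]; then
    echo "override variables set:" >&2
    # Unquoted on purpose: one name per printf line.
    printf '  %s\n' $found >&2
    return 1
  fi
}
```

A non-empty result is not automatically wrong, but each listed name should trace back to a change ticket.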
Maintainable engineering usually means fix PATH → reinstall service → finish upgrade under the fresh binary; only rare cases, such as blocked package downloads, justify a pinned downgrade tracked in a ticket with audited artifacts.
Operational anchors teams can quantify internally:
Run openclaw --version from both interactive shells and systemd/launchd units; they must match after recovery.
doctor cleanup.
Gateways on laptops or shared dev machines fight sleep, OS updates, and multi-user friction; parking OpenClaw on a dedicated remote Mac with 24/7 uptime, SSH access, and contractual disk/network terms often beats repeated split-brain upgrades. NodeMini cloud Mac Mini rental offers fixed SSH and dedicated compute suited to AI gateways and internal automation; specs and onboarding live in Mac Mini rental rates and the help center.
More OpenClaw walkthroughs: filter the blog by OpenClaw and read observability → cron → remote mode → this upgrade split-brain guide.
Treat it as the stamp of which OpenClaw build last finished writing the config; if PATH still resolves an older binary, guardrails may refuse to keep mutating service metadata. Fix PATH, follow the runbook to reinstall services, then revisit doctor—usually before hand-editing JSON line by line.
Until split brain is ruled out, cron lists can look healthy while execution still refuses; finish section three here first, then follow our cron guide for cycle validation. More posts: OpenClaw filter.
Compare gateway.mode, gateway.remote.url, and gateway status on both ends; details in remote mode triage. For capacity decisions start with Mac Mini rental rates.