How is split brain different from simply having a "version too old"?

Split brain means a newer OpenClaw has written the config and stamped meta.lastTouchedVersion and related fields, while gateway install/restart still invokes an older binary. Upgrading only the npm package without reinstalling service metadata yields a half-working state: doctor can surface the conflict, but gateway subcommands refuse destructive steps.

When should I consider destructive environment variables such as OPENCLAW_ALLOW_OLDER_BINARY?

Only in emergency recovery or intentional downgrade scenarios described in official troubleshooting, and only when you explicitly accept the risk of corrupting service metadata. The normal upgrade path is to fix PATH, reinstall the gateway service from the same install source, then run doctor again.

After an upgrade, channels look fine but cron does not fire. Which post should I read first?

Rule out split brain first, then work through built-in cron and gateway lifecycle in that order. Depending on topology, continue with our OpenClaw cron production guide or the remote-mode triage guide.

2026 OpenClaw upgrade troubleshooting: binary split brain, PATH alignment, and gateway install --force / doctor recovery

Seven implicit assumptions that make “the upgrade” the scapegoat in postmortems

Official troubleshooting describes the failure this way: a newer OpenClaw writes openclaw.json and updates stamps such as meta.lastTouchedVersion, yet your shell still resolves an old openclaw binary on PATH. Read paths may keep working, but once gateway service install/restart/remove or a similar destructive mutation is involved, the CLI refuses rather than persist half-new metadata to disk. In production this is commonly called split brain.

01. Equating a successful npm install with the service switching to the new binary: npm install -g only updates executables under the global prefix; if launchd or systemd --user units still point at old absolute paths, restarts behave exactly as before.
02. Mixing up login-shell PATH with the service environment: which openclaw succeeding in an interactive shell does not mean the daemon environment matches.
03. Ignoring side-by-side installers: Homebrew, official installers, and npm global can each ship a binary; which one wins depends on PATH prefix precedence.
04. Skipping gateway install --force after an upgrade: official guidance reinstalls service packaging when binaries drift; manually starting the gateway once leaves the fork ready to recur on the next reboot.
05. Treating every doctor failure as malformed config: sometimes the guardrails have detected a binary vs config mismatch; align versions before rewriting keys.
06. Switching blindly between remote and local mode without pinning config snapshots: follow the discipline in our remote mode guide and run openclaw config get gateway.mode before deciding where probes should land.
07. Checking channels after an upgrade but ignoring scheduling surfaces: cron and Gateway share a lifecycle, so cron regressions belong on the checklist in our cron article.

The shared flaw is confusing “config still readable” with “execution plane coherent.” The right mental model: a config stamp reflects who last wrote the file; which binary runs the service demands its own proof.
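To make the multiple-installer case concrete, here is a minimal sketch that lists every executable of a given name along a PATH-style string, in precedence order. The helper name path_candidates is hypothetical and not part of OpenClaw; the first line printed is the binary an interactive shell would run.

```shell
#!/usr/bin/env bash
# List every executable named $1 found on a PATH-style string
# ($2, defaulting to the current $PATH), in precedence order.
# path_candidates is a hypothetical helper, not an OpenClaw command.
path_candidates() {
  local name="$1" search="${2:-$PATH}" dir
  local IFS=':'
  for dir in $search; do
    # An empty PATH entry means the current directory.
    [ -n "$dir" ] || dir="."
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
}

# More than one line of output means several installs are competing.
path_candidates openclaw
```

In bash, type -a openclaw gives a similar listing and also reveals aliases and functions. Either way the result only describes the shell you ran it in; the service environment still needs its own check.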

Symptom matrix: split brain, auth drift, ports, and remote URL cross-wiring

Use the table below to move on-call notes from “feels like the upgrade broke” to a specific, named branch:

Signal	More like split brain	More like auth/session	More like remote URL / topology
Doctor keywords	Mentions forked old/new binaries; blocks destructive gateway actions	Token/device error codes unrelated to binary version	RPC probe failures while local `gateway status --deep` points at an unintended host
gateway status	Runtime behavior disagrees sharply with CLI `--version`	Runtime OK but unauthorized	Stopped locally while the remote actually runs Gateway
First action	Align PATH → `gateway install --force` → restart	Rotate or realign tokens/device handshakes	Verify `gateway.remote.url` and environment variables stay consistent with our remote mode guide

Upgrade-night golden questions: (A) Which binary runs the service? (B) Which version stamped the config, and when? Align those before debating channels versus cron.
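Once both answers are in hand, the comparison itself is mechanical. The sketch below assumes both versions are dotted numeric strings (an assumption about the format, not something confirmed here) and uses a hypothetical helper, version_older. The two sample values are illustrative only; in practice the first would come from openclaw --version and the second from meta.lastTouchedVersion in openclaw.json.

```shell
#!/usr/bin/env bash
# Return 0 when dotted version $1 is strictly older than $2.
# Handles numeric fields only; suffixes such as "-beta" are not supported.
# version_older is a hypothetical helper, not an OpenClaw command.
version_older() {
  local a="$1" b="$2" x y
  while [ -n "$a" ] || [ -n "$b" ]; do
    # Peel off the leading field of each version.
    x="${a%%.*}"; y="${b%%.*}"
    if [ "$a" = "$x" ]; then a=""; else a="${a#*.}"; fi
    if [ "$b" = "$y" ]; then b=""; else b="${b#*.}"; fi
    # Missing fields compare as zero, so 1.2 is older than 1.2.1.
    x="${x:-0}"; y="${y:-0}"
    if [ "$x" -lt "$y" ]; then return 0; fi
    if [ "$x" -gt "$y" ]; then return 1; fi
  done
  return 1
}

# Illustrative values only, not real release numbers.
cli_version="2026.1.10"
stamp_version="2026.2.3"
if version_older "$cli_version" "$stamp_version"; then
  echo "split-brain candidate: binary $cli_version is older than config stamp $stamp_version"
fi
```

A binary older than the stamp is the pattern the guardrails react to; equal or newer binaries point toward the auth or topology columns of the table instead.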

When you combine Tailscale or private tunnels with deployments, never conflate tunnel reachability with healthy RPC. Acceptance-test both legs separately, following our post on Tailscale private exposure.

Six-step recovery runbook (follow in order—avoid improvised one-offs)

The sequence below is order-sensitive; if any step finds old vs new still diverging, rewind to the previous step rather than editing config while swapping binaries ad hoc.

01. Freeze evidence: capture openclaw --version, the explicit binary paths in unit files where visible, and doctor screenshots.
02. Fix PATH and aliases: ensure a non-interactive which openclaw resolves to the intended upgraded build; remove aliases that hide real paths.
03. Pick one install channel: choose a single durable feed (documented npm, installer, and so on) and avoid long-term brew vs npm mixing.
04. Reinstall service packaging: after PATH is correct, run openclaw gateway install --force as the same user to refresh launchd/systemd metadata.
05. Cold-start Gateway: run openclaw gateway restart, then check gateway status and the RPC probes.
06. Regression passes: openclaw doctor → channels status --probe → confirm cron list still registers the expected jobs.
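The freeze-evidence step asks for the binary path recorded in the unit file. For systemd user units, a minimal sketch is to print the command from the first ExecStart= line and set it beside what the shell resolves. The unit file name below is an assumption, not taken from OpenClaw documentation, and unit_exec_line is a hypothetical helper; check where gateway install actually wrote the unit on your host.

```shell
#!/usr/bin/env bash
# Print the command line from the first ExecStart= line of a systemd unit file,
# with the optional systemd prefix characters (-, @, :, +, !) stripped.
# unit_exec_line is a hypothetical helper, not an OpenClaw command.
unit_exec_line() {
  sed -n 's/^ExecStart=[-@:+!]*//p' "$1" | head -n 1
}

# ASSUMPTION: this unit name and location are illustrative only.
unit_file="${UNIT_FILE:-$HOME/.config/systemd/user/openclaw-gateway.service}"
if [ -f "$unit_file" ]; then
  echo "service runs: $(unit_exec_line "$unit_file")"
  echo "shell finds:  $(command -v openclaw || echo 'not on PATH')"
fi
```

If the two lines point into different install prefixes, that is the drift the reinstall step is meant to repair. On macOS the equivalent information sits under ProgramArguments in the launchd plist, which plutil -p can print.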

bash · diagnostic order (example)

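# 1. Evidence: which build does the shell resolve, and from where?
openclaw --version
command -v openclaw
# 2. Current service state and guardrail findings, before changing anything
openclaw gateway status
openclaw doctor
# 3. Refresh service metadata from the aligned binary, then cold-start
openclaw gateway install --force
openclaw gateway restart
# 4. Regression pass
openclaw channels status --probe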
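For the change record, the evidence-gathering commands in the sequence above can be captured to files instead of screenshots. The following is a sketch under stated assumptions: capture and EVIDENCE_DIR are hypothetical names, and openclaw doctor is assumed not to modify anything when run without a terminal, which you should confirm against the current docs.

```shell
#!/usr/bin/env bash
# Save each command's combined output and exit status under $EVIDENCE_DIR.
# capture and EVIDENCE_DIR are hypothetical names, not part of OpenClaw.
EVIDENCE_DIR="${EVIDENCE_DIR:-./openclaw-evidence-$(date +%Y%m%dT%H%M%S)}"

# Usage: capture <label> <command> [args...]
capture() {
  local label="$1" status=0
  shift
  mkdir -p "$EVIDENCE_DIR"
  # stdin comes from /dev/null so an unexpected prompt cannot hang the run.
  "$@" </dev/null >"$EVIDENCE_DIR/$label.txt" 2>&1 || status=$?
  printf '%s\n' "$status" >"$EVIDENCE_DIR/$label.exit"
  return 0
}

if command -v openclaw >/dev/null 2>&1; then
  capture version openclaw --version
  capture which   command -v openclaw
  capture status  openclaw gateway status
  # ASSUMPTION: doctor only reports when it has no terminal to prompt on.
  capture doctor  openclaw doctor
fi
```

Running the same script before and after the reinstall gives two directories that can be diffed, which is easier to audit than a pair of screenshots.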

Note: When logs mention port clashes, memory spikes, or compose startup sequencing, pair this with our playbook on Gateway not ready and RPC closed(1000) errors so resource-class failures are not mistaken for split brain.

Destructive-environment gates—when letting an old binary write would ever be acceptable

Official troubleshooting treats “newer config + older binary” as dangerous: a stale process that is allowed to mutate gateway packaging can leave disk state unrecoverably mixed. Newer builds may therefore hard-gate destructive gateway operations. The specific OPENCLAW_* override variables (exact names depend on current docs) belong only where you knowingly need a legacy binary for a single emergency repair.

Caution: these knobs are not a universal bypass. They cover narrow scenarios where you fully understand the risks and accept possible damage to service metadata. Leave them unset by default unless a change ticket cites a rollback procedure documented by upstream.

Maintainable engineering usually means fix PATH → reinstall service → finish the upgrade under the fresh binary; only rare cases, such as blocked package downloads, justify a pinned downgrade backed by a ticket and audited artifacts.
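One way to enforce the "unset by default" rule is a pre-flight check in upgrade scripts that stops when an override is still exported from an earlier emergency. This is a minimal sketch: the OPENCLAW_ALLOW_ prefix is an assumption extrapolated from the single variable name quoted in this article, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the names of any OPENCLAW_ALLOW_* variables in the environment.
# ASSUMPTION: the prefix is inferred from OPENCLAW_ALLOW_OLDER_BINARY;
# check current upstream docs for the real variable names.
override_vars() {
  env | sed -n 's/^\(OPENCLAW_ALLOW_[A-Za-z0-9_]*\)=.*/\1/p'
}

# Return non-zero, with an explanation on stderr, when any override is set.
require_no_overrides() {
  local found
  found="$(override_vars)"
  if [ -n "$found" ]; then
    printf 'refusing to continue, override(s) set:\n%s\n' "$found" >&2
    return 1
  fi
}

if require_no_overrides; then
  echo "no destructive overrides set"
fi
```

The check looks only at the environment of the shell that runs it; a variable baked into a unit file or plist needs a separate look at that file.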

Three evidence artefacts you can place on a change record (and convergence)

Operational anchors teams can quantify internally:

Version dual-signed evidence: keep screenshots of openclaw --version from both interactive shells and systemd/launchd units; they must match after recovery.

Destructive-action window: if official emergency env vars are unavoidable, the ticket records duration, operator, rollback commands, and post-run doctor cleanup.

Business regression: watch at least one full cron cycle plus one human round-trip after upgrade before lifting maintenance; align log retention with our production observability guide.

Gateways on laptops or shared dev machines fight sleep, OS updates, and multi-user friction; parking OpenClaw on a dedicated remote Mac with 24/7 uptime, SSH access, and contractual disk and network terms often beats repeated split-brain upgrades. NodeMini cloud Mac Mini rental offers fixed SSH and dedicated compute suited to AI gateways and internal automation; specs and onboarding live in Mac Mini rental rates and the help center. More OpenClaw walkthroughs: filter the blog by OpenClaw and read observability → cron → remote mode → this upgrade split-brain guide.
