Onboard works, but the CLI or dashboard stays on Gateway not ready for a long time. That usually happens before the process is really listening, which is a different layer from the one covered in this site's gateway closed (1000) troubleshooting article (sessions, scopes, tokens). This post gives platform engineers the shortest path: clear ports, memory, timeouts, images, and volume permissions with a seven-item checklist; pick the right log surface with a bare-metal systemd vs Docker comparison table; then run a six-step runbook (including openclaw doctor and sample log commands). Read it alongside the install overview, Docker production, and observability guides.
not ready usually means the control plane has not yet seen a successful listen, ready dependent processes, or passing health probes. It is not the same as being kicked offline by policy after a WebSocket is up; use the closed (1000) article first for that. The seven items below are a first-week platform self-check after install.
Port held by an old process or another service: after upgrades or repeated docker compose up, an old Gateway on the host can still bind the same port; the new process half-starts and the CLI only shows not ready.
Insufficient memory and swap leading to OOM: on a small VPS pulling a model and the Gateway together, the Node process can be killed before readiness—logs often show exit code 137 or a silent disappearance.
Startup timeout too short: cold pulls or native builds exceed the default startupTimeout, health checks fail early, and you see “forever not ready.”
Reading the wrong log: when mixing systemd and Docker debugging, you tail the container but forget a host unit still launches an old binary—or the opposite.
Named volume permissions and UID mapping: the container user cannot write state; the Gateway crash-loops and the outside world only sees not ready.
Image digest vs config drift: compose points at :latest but cached layers disagree with new openclaw.json fields and the entry script exits early.
Mistaking network issues for startup failure: model or plugin registry timeouts leave the process stuck in init—separate DNS/egress from the Gateway itself.
What these share: the process never reaches a servable state, so status/RPC can look “half green.” Put them in your ticket template, then use the table below to pick a log surface and stop bouncing between journal and docker logs.
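Several of the items above surface only as an exit code, so a small lookup can help when triaging. The following is a minimal sketch, not part of OpenClaw: `classify_exit_code` is a hypothetical helper name, and the hints rely only on the usual shell convention that a code above 128 means "killed by signal (code - 128)".

```shell
# Hypothetical helper: map a process or container exit code to a first hypothesis.
# Convention assumed: 128 + N means the process was killed by signal N.
classify_exit_code() {
    code=$1
    case "$code" in
        ''|*[!0-9]*)
            echo "invalid exit code: $code"
            return 64
            ;;
        0)   echo "clean exit: the entry script may have returned early" ;;
        126) echo "command found but not executable: check file modes and mounts" ;;
        127) echo "command not found: check the image, PATH, and entry script" ;;
        137) echo "SIGKILL (128+9): possible OOM kill, check dmesg and memory limits" ;;
        143) echo "SIGTERM (128+15): stopped by a supervisor or a timeout" ;;
        *)
            if [ "$code" -gt 128 ] && [ "$code" -le 255 ]; then
                echo "killed by signal $((code - 128))"
            else
                echo "application error: read the startup log"
            fi
            ;;
    esac
}
```

For containers, the code can come from `docker compose ps -a` or from `docker inspect --format '{{.State.ExitCode}}' <container>`. A 137 is only a hint: the kernel log for the same time window is what confirms an OOM kill.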
Aligned with cross-platform install: that guide covers “it installs”; this one covers “it will not come up.” Aligned with observability: that guide covers long-term metrics and upgrade/rollback; this one covers hard failures before first readiness.
If you run the Gateway on a dedicated remote Mac or a small Linux VPS, put minimum memory and disk headroom in change review, not only version pins; otherwise not ready incidents tend to spike right before demos. The table below turns “which log do I read?” into a decision.
Finally: do not temporarily expose the Gateway to the public internet to “test connectivity” before you know what is listening; follow least exposure in security hardening.
When debugging not ready, the first decision is entry shape: is the process started by systemd or by compose in a container? Mixing them wastes time.
| Dimension | Linux / macOS bare metal + systemd (or launchd) | Docker / Compose |
|---|---|---|
| First log screen | journalctl -u <unit> -n 200 --no-pager or the platform equivalent | docker compose logs --tail=200 <service>, aligned to restart timestamps |
| Port conflict checks | On the host, lsof -nP -iTCP:<port> -sTCP:LISTEN (illustrative) | Check listeners on both host and inside the container; published ports vs existing host services |
| Permissions / volumes | State directory ownership vs runtime user | Named volume UID, read-only root vs writable mounts; see Docker production |
| Timeouts and retries | Unit TimeoutStartSec, Restart= policy | Healthcheck start_period, retries, and image cold-start time |
| Upgrade / rollback hooks | Package version + last-known-good config backup path | Pinned image digest + versioned compose files; same spirit as the observability rollback section |
“Confirm listen, then talk sessions”: in the not ready phase the best ROI is ports + resources + the right log window, not changing tokens first.
If you deploy with Ubuntu 24.04 + systemd + Tunnel, also verify the tunnel upstream still points at the current port; a mismatch keeps the outside world not ready while curl 127.0.0.1 looks fine locally.
In Docker, a common misread is “container Exited but compose still looks stuck in creating”: use docker compose ps -a for exit codes, then return to volume permissions and the entry script.
Once the process is listening and probes pass, if clients still misbehave, switch to the session path in closed (1000).
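The port-conflict row in the table can be sketched as a small host-side check. This is an illustrative helper under stated assumptions: `port_listeners` is a hypothetical name, and it assumes either `lsof` or `ss` is installed. Without root, both tools may hide processes owned by other users.

```shell
# Hypothetical helper: show what is listening on a TCP port on this host.
# Returns 0 if a listener was found, 1 if none was found (or the tool failed),
# 2 if no supported tool exists, 64 on bad input.
port_listeners() {
    port=$1
    case "$port" in
        ''|*[!0-9]*)
            echo "usage: port_listeners <port>" >&2
            return 64
            ;;
    esac
    if command -v lsof >/dev/null 2>&1; then
        # lsof exits non-zero when nothing matches (or on error).
        lsof -nP -iTCP:"$port" -sTCP:LISTEN && return 0
        return 1
    elif command -v ss >/dev/null 2>&1; then
        # ss always exits 0, so decide on whether any row was printed.
        out=$(ss -H -ltn "sport = :$port")
        [ -n "$out" ] || return 1
        printf '%s\n' "$out"
        return 0
    fi
    echo "neither lsof nor ss found" >&2
    return 2
}
```

With Docker, run the same check on the host for the published port and again inside the container for the internal port, since the two can disagree.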
The order below keeps cheap checks first: confirm resources and ports before you touch config and images. Exact subcommands depend on your OpenClaw build.
Freeze concurrent retries: pause teammates’ auto-reconnect scripts so a reconnect storm does not bury your investigation.
Capture version and resource snapshot: put openclaw --version, uname -a, and free memory/disk in the ticket.
Verify ports and stale processes: run a listen check on the configured port; if you find an old PID, stop and start in order per systemd/Docker docs.
Run doctor / validate: openclaw doctor or equivalent, fix obvious gaps, then retry startup.
Widen the startup observation window: where supported, temporarily raise gateway.startupTimeout or compose healthcheck start_period to separate “slow” from “dead.”
Minimal acceptance: after local curl health or the official status subcommand reports ready, restore other users; if it still fails, escalate with log samples from section 4.
```shell
openclaw --version
openclaw doctor
# systemd example:
# journalctl -u openclaw-gateway -n 120 --no-pager
# Docker example:
# docker compose logs --tail=120 gateway
# then restart the unit / compose service per distro docs
```
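To separate "slow" from "dead" as in step 5, it can help to poll a readiness command for a fixed window instead of watching the CLI by hand. The following is a generic sketch: `wait_until` is a hypothetical helper, and whichever readiness command you pass to it (a local curl against the health endpoint, or the official status subcommand) depends on your OpenClaw build.

```shell
# Hypothetical helper: poll a command until it succeeds or a time budget runs out.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# Elapsed time counts only the sleeps, so it is approximate when the command is slow.
wait_until() {
    timeout_s=$1
    interval_s=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout_s" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "ready after ${elapsed}s"
            return 0
        fi
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
    echo "not ready after ${timeout_s}s" >&2
    return 1
}
```

For example, `wait_until 180 3 curl -fsS "http://127.0.0.1:$PORT$HEALTH_PATH"`, where `PORT` and `HEALTH_PATH` are placeholders for your configured port and health route. If the check passes late in the window, the problem is a timeout that is too short; if it never passes, go back to the logs.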
Note: if CLI logs show model or plugin download timeouts, separate egress from the Gateway itself; during a maintenance window you can temporarily widen egress allowlists, then tighten again—see security hardening.
Aligned with install overview: if you just changed Anthropic billing or API key shape, keep env vars and config files in one source of truth, or the process can block while parsing keys and look like not ready.
On a dedicated remote Mac running the Gateway long term, put launchd/unit restart policy and disk cleanup in the same runbook so full disks do not cause a loop of not ready.
Pin this “plain-language description → action” map in on-call chat to cut noise. If symptoms already involve WebSocket close codes, switch to closed (1000).
Port already in use: stop the old process before starting the new one; do not keep scaling without freeing the port.
Out of memory: lower concurrency or add swap/upgrade before tuning knobs.
Image pull failed: run docker pull or fix registry credentials before touching Gateway config.
Warning: do not run docker system prune-style cleanup in production without a paper trail; you can delete named volume backups still in use. Put cleanup paths in the change record.
With observability: after not ready is fixed, tag the root cause on your dashboard (port/memory/image/volume) so the same class does not reopen next week.
If the Gateway runs with MCP child processes on the same host, startup must also cover child binary paths and sandbox mounts visible inside the container—or the parent blocks in tool discovery.
Use these fields to align across teams; redact before sharing externally.
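One way to capture a consistent snapshot for the ticket is a short script. This sketch collects only the items named in step 2 of the runbook (version, `uname -a`, free memory and disk). The function names are hypothetical, and `free` is assumed to exist only on Linux, so missing tools are reported rather than treated as errors.

```shell
# Hypothetical helper: run one command under a labelled header, tolerating absence.
snapshot_section() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exited with status $?)"
    else
        echo "(skipped: $1 not found)"
    fi
}

# Hypothetical helper: print a ticket-ready snapshot to stdout.
ticket_snapshot() {
    date -u +"captured_at=%Y-%m-%dT%H:%M:%SZ"
    snapshot_section openclaw --version
    snapshot_section uname -a
    snapshot_section free -m   # Linux only; reported as skipped elsewhere
    snapshot_section df -h
}
```

Redirect the output to a file, for example `ticket_snapshot > snapshot.txt`, and review it before attaching: `uname -a` includes the hostname, and `df -h` can reveal mount paths.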
Running the Gateway only on a personal laptop leaves it fragile to sleep and OS updates, and a tiny-memory VPS often OOMs during cold start. When you need a long-lived, contract-friendly macOS execution tier for OpenClaw and its toolchain, a dedicated remote Mac is usually steadier than borrowing a laptop. Compared to ad-hoc hardware, NodeMini Mac Mini cloud rental gives fixed SSH access and clear disk tiers, so a “Gateway + toolchain” host can be handed off like any other managed node. Specs and pricing: Mac Mini rental rates; onboarding questions: help center; start the OpenClaw series from the blog category.
Use this article when the process is not listening or you suspect ports, memory, or timeouts; use closed (1000) when a WebSocket was up and policy or session closed it. For capacity and onboarding, see Mac Mini rental rates and the help center.
Attach the output of docker compose logs --tail=120 gateway plus host dmesg and a resource snapshot for the same time window; if you suspect volume permissions, add the exit codes from docker compose ps -a.
Open the blog OpenClaw category for install, Docker, systemd, security, observability, and MCP posts; this article fits the “will not start” phase.