Building a Split-Brain OpenClaw Deployment: Gateway + Inference Server Over Tailscale
In my previous post, I compared the Minisforum UM890 Pro against a gaming PC (7800X3D + RTX 5080) for running OpenClaw, and concluded that the best approach is to split the roles: the UM890 Pro as the hardened always-on gateway, and the gaming PC as the GPU-accelerated inference server. This post is the full implementation guide — step-by-step, command-by-command — for anyone who wants to replicate this architecture across two Linux machines connected over Tailscale.
Architecture Overview
The design has two machines with distinct roles, connected over a Tailscale WireGuard mesh:
Machine A — UM890 Pro (Gateway Appliance)
- Runs Ubuntu Server 24.04 LTS
- Hosts the OpenClaw gateway process inside a hardened Docker container
- Handles all messaging channels (WhatsApp, Telegram, Discord, etc.)
- Runs browser automation, cron jobs, and skill execution
- Points to Machine B for local LLM inference
- Falls back to cloud APIs (Claude Sonnet) when Machine B is unavailable
- Always-on, low power (~15W idle), near-silent
Machine B — Gaming PC (Inference Server)
- Runs Ubuntu Server 24.04 LTS
- Hosts Ollama with Nvidia CUDA acceleration (RTX 5080, 16GB VRAM)
- Serves Qwen 3.5 27B (Q4_K_M quantization) over the native Ollama API
- Listens only on the Tailscale interface — not exposed to LAN or internet
- Can be powered down when not needed; the gateway degrades gracefully to cloud APIs
Network Glue — Tailscale
- Both machines join the same Tailscale tailnet
- All traffic between them is end-to-end encrypted via WireGuard
- No port forwarding, no public IP exposure
- Tailscale ACLs restrict which devices can reach the Ollama port
┌──────────────────────┐ Tailscale (WireGuard) ┌──────────────────────┐
│ UM890 Pro │◄─────────────────────────────────► │ Gaming PC │
│ (Gateway) │ 100.x.x.1 ◄──► 100.x.x.2 │ (Inference) │
│ │ │ │
│ OpenClaw Gateway │ ollama/qwen3.5:27b @ :11434 │ Ollama + CUDA │
│ Docker (rootless) │──────────────────────────────────► │ RTX 5080 (16GB) │
│ Browser automation │ │ Qwen 3.5 27B │
│ Messaging channels │ fallback: Claude Sonnet (cloud) │ │
│ Netdata monitoring │ │ Netdata monitoring │
└──────────────────────┘ └──────────────────────┘
Phase 1: Base OS Installation (Both Machines)
Both machines get a clean Ubuntu Server 24.04 LTS minimal install. No desktop environment — this is headless server territory.
1.1 Install Ubuntu Server 24.04 LTS
Flash the Ubuntu Server 24.04 LTS ISO to a USB drive and install on both machines. During installation:
- Choose minimal server install (no snaps, no desktop)
- Create a non-root user (e.g., prathamesh)
- Enable OpenSSH server during install
- Use the full 4TB NVMe as a single ext4 partition (or LVM if you prefer flexibility)
1.2 Post-Install Baseline (Both Machines)
After first boot, SSH in and run:
# Update system packages
sudo apt update && sudo apt upgrade -y
# Install essential tools
sudo apt install -y curl wget git htop tmux ufw net-tools
# Set timezone
sudo timedatectl set-timezone America/Los_Angeles
# Enable automatic security updates
sudo apt install -y unattended-upgrades
sudo dpkg-reconfigure -plow unattended-upgrades
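The dpkg-reconfigure step writes a small APT config file. If you are provisioning non-interactively, the equivalent is to write that file yourself; this is a sketch of the standard contents, worth diffing against what the interactive step produces on your system:

```shell
# Non-interactive equivalent: enable daily package-list refresh and unattended upgrades
sudo tee /etc/apt/apt.conf.d/20auto-upgrades > /dev/null <<'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF
```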
1.3 Configure UFW Firewall (Both Machines)
# Default deny inbound, allow outbound
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH
sudo ufw allow ssh
# Enable the firewall
sudo ufw enable
sudo ufw status verbose
We will add Tailscale-specific rules later.
1.4 Install Tailscale (Both Machines)
# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
# Bring Tailscale up and authenticate
sudo tailscale up
# Note the Tailscale IP for each machine
tailscale ip -4
After running tailscale up on both machines and authenticating with the same Tailscale account, note the Tailscale IPs. For the rest of this guide, I’ll use:
- UM890 Pro (Gateway): 100.x.x.1
- Gaming PC (Inference): 100.x.x.2
Replace these with your actual Tailscale IPs.
1.5 Verify Connectivity
From the UM890 Pro:
ping 100.x.x.2 # Should succeed
From the Gaming PC:
ping 100.x.x.1 # Should succeed
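ICMP ping only proves basic reachability. Tailscale's own ping additionally reports whether the two machines negotiated a direct WireGuard path or are bouncing through a DERP relay, which matters for inference latency:

```shell
# From the UM890 Pro: reports direct vs. relayed connectivity to the peer
tailscale ping 100.x.x.2

# Full picture: peer list, endpoints, and relay status
tailscale status
```

If the connection stays relayed, check that UDP is not being blocked between the two machines.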
Phase 2: Gaming PC — Inference Server Setup
This is where the RTX 5080 earns its keep.
2.1 Install Nvidia Drivers
# Check that the GPU is visible on the PCI bus
lspci | grep -i nvidia
# Install the recommended driver automatically
sudo ubuntu-drivers autoinstall
# Reboot
sudo reboot
After reboot, verify:
nvidia-smi
You should see the RTX 5080 listed with the driver version and CUDA version. If nvidia-smi fails, troubleshoot before proceeding — Ollama will silently fall back to CPU inference without working drivers, and you’ll get 3 tok/s instead of 40.
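For provisioning scripts, nvidia-smi's query mode gives machine-readable output instead of the full dashboard; the fields below are standard query properties:

```shell
# One CSV line per GPU: name, driver version, total VRAM
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
```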
2.2 Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Verify it’s running:
systemctl status ollama
ollama --version
2.3 Pull the Qwen 3.5 27B Model
ollama pull qwen3.5:27b
This downloads the Q4_K_M quantized version (~16GB). It will fit entirely in the RTX 5080’s 16GB VRAM. The download may take a while depending on your internet connection.
Verify it’s available:
ollama list
2.4 Test Local Inference
ollama run qwen3.5:27b "Hello, what model are you?"
While this runs, open another terminal and check GPU utilization:
nvidia-smi
You should see VRAM usage spike as the model loads. If VRAM stays at 0 and CPU is pegged, the Nvidia drivers aren’t being detected — go back to step 2.1.
2.5 Bind Ollama to the Tailscale Interface
By default, Ollama only listens on 127.0.0.1:11434. We need it to listen on the Tailscale interface so the UM890 Pro can reach it. The most secure approach is to bind specifically to the Tailscale IP rather than 0.0.0.0.
Create a systemd override:
sudo systemctl edit ollama
This opens an editor. Add the following in the override block:
[Service]
Environment="OLLAMA_HOST=100.x.x.2:11434"
Replace 100.x.x.2 with the gaming PC’s actual Tailscale IP. Save and exit, then reload:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Important: Binding to the Tailscale IP means Ollama only accepts connections from the Tailscale interface. It won’t be reachable from your LAN or the public internet. This is exactly what we want.
Caveat: If the Tailscale IP changes (which is rare but possible), you’ll need to update this. An alternative is to bind to 0.0.0.0 and use UFW to restrict access:
# Alternative: bind to all interfaces but firewall to Tailscale only
# In the systemd override, use:
# Environment="OLLAMA_HOST=0.0.0.0:11434"
# Then restrict with UFW:
sudo ufw allow in on tailscale0 to any port 11434
sudo ufw deny 11434
2.6 Verify Remote Access
From the UM890 Pro, test that Ollama is reachable over Tailscale:
curl http://100.x.x.2:11434/api/tags
You should get a JSON response listing the qwen3.5:27b model. If you get “connection refused,” check that:
- Ollama is running (systemctl status ollama)
- OLLAMA_HOST is set correctly (grep -i host /etc/systemd/system/ollama.service.d/override.conf)
- Tailscale is connected on both machines (tailscale status)
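Those checks can be rolled into a small health-check helper on the gateway side; this is a sketch (the function name and messages are mine), handy as a cron probe or a pre-flight check before a work session:

```shell
# Returns 0 and prints OK if the remote Ollama API answers, non-zero otherwise
check_ollama() {
    local base_url="$1"
    if curl -fsS --max-time 5 "$base_url/api/tags" > /dev/null 2>&1; then
        echo "OK: Ollama reachable at $base_url"
    else
        echo "FAIL: cannot reach $base_url"
        return 1
    fi
}

# Usage on the UM890 Pro:
#   check_ollama http://100.x.x.2:11434
```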
Phase 3: UM890 Pro — Gateway Appliance Setup
3.1 Create Dedicated OpenClaw User
Following the defense-in-depth model from my earlier post:
# Create a dedicated user with no sudo privileges
sudo adduser --disabled-password --gecos "OpenClaw Agent" openclaw
# Set a strong password (needed for su access if debugging)
sudo passwd openclaw
3.2 Install Docker Engine (Rootless Mode)
We install Docker Engine — not Docker Desktop — and configure rootless mode for enhanced isolation.
# Install prerequisites
sudo apt install -y ca-certificates gnupg uidmap
# Add Docker's official GPG key and repository
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Install rootless Docker for the openclaw user
sudo loginctl enable-linger openclaw
sudo -u openclaw -i dockerd-rootless-setuptool.sh install
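Before moving on, confirm that the openclaw user's Docker daemon really is rootless; docker info reports this in its security options:

```shell
# Run against the openclaw user's daemon; rootless mode includes "name=rootless"
sudo -u openclaw -i docker info --format '{{.SecurityOptions}}'
```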
3.3 Install Node.js 24
OpenClaw requires Node 24 (recommended) or Node 22.14+:
# Switch to the openclaw user
sudo -u openclaw -i
# Install Node.js via nvm (recommended for non-root installs)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
source ~/.bashrc
nvm install 24
node -v # Should show v24.x.x
3.4 Install OpenClaw
As the openclaw user:
# Install OpenClaw globally
npm install -g openclaw@latest
# Verify
openclaw --version
3.5 Run OpenClaw Onboarding with Ollama
openclaw onboard
During onboarding:
- Select Ollama as the provider
- When prompted for the base URL, enter http://100.x.x.2:11434 (the gaming PC’s Tailscale IP)
- Select Local mode (local models only from this provider — we’ll add Claude separately)
- The onboarding wizard should discover the qwen3.5:27b model from the remote Ollama instance
3.6 Configure the Hybrid Model Setup
After onboarding completes, edit the OpenClaw configuration to set up the hybrid primary + thinking model:
openclaw config edit
Set the following configuration (in JSONC format):
{
agents: {
defaults: {
model: {
// Local model for routine tasks (file reads, simple edits, boilerplate)
primary: "ollama/qwen3.5:27b",
// Cloud model for complex reasoning (architecture, debugging, multi-file)
thinking: "anthropic/claude-sonnet-4-6-20260514",
// Fallback chain if primary is unavailable
fallbacks: ["anthropic/claude-sonnet-4-6-20260514"]
}
}
},
models: {
providers: {
ollama: {
baseUrl: "http://100.x.x.2:11434", // Gaming PC Tailscale IP
apiKey: "ollama-local",
api: "ollama" // Use native Ollama API, NOT /v1
}
}
}
}
Critical detail from the OpenClaw docs: Do not use the /v1 OpenAI-compatible URL. The /v1 path breaks tool calling — models output raw tool JSON as plain text instead of executing tools. Always use the base Ollama URL without a path suffix.
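You can sanity-check the native API directly from the gateway with a request to Ollama's /api/chat endpoint (model name as configured above; the prompt is arbitrary):

```shell
# Non-streaming chat completion against the native Ollama API
curl http://100.x.x.2:11434/api/chat -d '{
  "model": "qwen3.5:27b",
  "messages": [{ "role": "user", "content": "Reply with one word: ready" }],
  "stream": false
}'
```

A JSON response with a message object confirms the endpoint OpenClaw will use is healthy.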
3.7 Add Anthropic API Key
For the Claude Sonnet fallback:
openclaw config set models.providers.anthropic.apiKey "sk-ant-xxxxx"
Use a dedicated, low-spend API key with a hard daily cap (e.g., $10/day). Never reuse your primary work API key for an autonomous agent.
3.8 Configure Gateway Security
# Bind gateway to localhost only — remote access via Tailscale
openclaw config set gateway.bind loopback
# Enable agent sandboxing for tool execution
openclaw config set agents.defaults.sandbox.mode "non-main"
openclaw config set agents.defaults.sandbox.scope "agent"
3.9 Install the Gateway as a Systemd Daemon
openclaw onboard --install-daemon
This creates a systemd user service that starts the OpenClaw gateway automatically on boot.
Verify it’s running:
openclaw gateway status
openclaw doctor
3.10 Set Up Tailscale Remote Access to the Control UI
From any device on your Tailscale network, you can access the OpenClaw Control UI:
# On the UM890 Pro, get the dashboard URL
openclaw dashboard --no-open
Then open http://100.x.x.1:18789/ from any Tailscale-connected device.
Phase 4: Monitoring and Validation
4.1 Install Netdata (Both Machines)
curl https://get.netdata.cloud/kickstart.sh > /tmp/netdata-kickstart.sh && sh /tmp/netdata-kickstart.sh
Netdata provides real-time visibility into CPU, RAM, GPU utilization, network traffic, and disk I/O — invaluable for monitoring both the gateway and inference server.
4.2 Validate the Full Pipeline
From the OpenClaw Control UI or a connected messaging channel, send a test message:
What model are you, and what is your context window?
The response should come from the local Qwen 3.5 27B model. You can verify by checking the Ollama logs on the gaming PC:
journalctl -u ollama -f
You should see the inference request arrive.
4.3 Test Failover
Power off the gaming PC (or stop the Ollama service) and send another message through OpenClaw. The gateway should detect that the Ollama endpoint is unreachable and fall back to Claude Sonnet via the Anthropic API. Check the OpenClaw logs:
journalctl --user -u openclaw -f
You should see the failover from ollama/qwen3.5:27b to anthropic/claude-sonnet-4-6-20260514.
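The same drill can be run without powering anything down by stopping the service; a sketch of the procedure (run each command on the machine named in its comment):

```shell
# On the gaming PC: stop serving the local model
sudo systemctl stop ollama

# On the UM890 Pro: tail the gateway logs while sending a test message
journalctl --user -u openclaw -f

# On the gaming PC: restore local inference when done
sudo systemctl start ollama
```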
Phase 5: Hardening Checklist
After the basic setup is working, apply these hardening measures:
UM890 Pro (Gateway):
- Rootless Docker is enabled for the openclaw user
- OpenClaw gateway is bound to loopback (localhost only)
- UFW is active with default deny inbound, SSH and Tailscale allowed
- Agent sandboxing is set to non-main mode
- Anthropic API key has a hard daily spending cap
- auditd is installed for forensic logging
- openclaw doctor and openclaw security audit --deep pass cleanly
- Netdata agent is running and alerting on resource thresholds
Gaming PC (Inference):
- Ollama is bound to the Tailscale IP only (not 0.0.0.0)
- UFW blocks port 11434 on all interfaces except tailscale0
- Nvidia drivers are installed and nvidia-smi reports the RTX 5080
- qwen3.5:27b is loaded and generating tokens on-GPU (check VRAM usage)
- Netdata agent is running with GPU monitoring
Tailscale:
- Both machines are on the same tailnet
- Tailscale ACLs restrict which devices can reach port 11434 on the gaming PC
- MagicDNS is enabled for hostname-based access (optional but convenient)
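A minimal sketch of the tailnet policy file for the ACL item above (the tag names are mine; adapt them to your own tagging scheme). It allows only the tagged gateway to reach port 11434 on the inference box, and nothing else:

```jsonc
// Tailnet policy (HuJSON) — a sketch, not a drop-in file
{
  "tagOwners": {
    "tag:gateway":   ["autogroup:admin"],
    "tag:inference": ["autogroup:admin"]
  },
  "acls": [
    // Gateway may reach Ollama on the inference server
    { "action": "accept", "src": ["tag:gateway"], "dst": ["tag:inference:11434"] }
  ]
}
```

Remember to apply the tags to the two devices from the Tailscale admin console, or ACL rules referencing them will match nothing.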
Troubleshooting
Ollama returns slow responses (< 5 tok/s on the 27B model):
The model is likely running on CPU instead of GPU. Check nvidia-smi — if VRAM usage is near 0 while Ollama is serving a request, the GPU isn’t being used. Reinstall Nvidia drivers and restart Ollama.
OpenClaw can’t reach the Ollama endpoint:
Run curl http://100.x.x.2:11434/api/tags from the UM890 Pro. If it fails, check: (1) Tailscale is connected on both machines (tailscale status), (2) Ollama is bound to the correct host (OLLAMA_HOST in the systemd override), (3) UFW isn’t blocking the connection.
Tool calling doesn’t work with the local model:
Make sure the OpenClaw Ollama provider uses api: "ollama" (native API), not api: "openai-completions". The /v1 OpenAI-compatible endpoint does not reliably support tool calling.
Failover to Claude doesn’t trigger:
Check that the Anthropic API key is set and valid. Run openclaw models list to verify the fallback model is available. Check the OpenClaw failover docs — failover only advances on auth failures, rate limits, and timeouts, not on other error types.
Gaming PC draws too much power at idle: Configure Nvidia power management to reduce idle draw. You can also set up a cron job or Tailscale webhook to wake the machine on demand and suspend it during off-hours.
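On the power-management point, two hedged starting points: capping the GPU's board power, and scheduling suspend/wake with rtcwake. The 150 W figure below is only an example; check the card's supported range first:

```shell
# Check the valid power-limit range for the card
nvidia-smi -q -d POWER

# Enable persistence mode and cap board power (example value — tune for your card)
sudo nvidia-smi -pm 1
sudo nvidia-smi -pl 150

# Suspend to RAM now and wake in 8 hours (28800 s) via the RTC alarm
sudo rtcwake -m mem -s 28800
```

Note that nvidia-smi power limits reset on reboot, so persist them in a systemd unit or startup script if you keep this.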
Cost Analysis
With this architecture running 24/7:
| Cost Component | Monthly Estimate |
|---|---|
| UM890 Pro electricity (24/7, ~15W idle) | ~$3.24 |
| Gaming PC electricity (12hr/day, ~120W avg) | ~$15.55 |
| Anthropic API (Claude Sonnet, fallback only) | ~$5–15 |
| Total | ~$24–34/month |
Compare this to running OpenClaw purely on cloud APIs at $20–50/day, and the hardware setup pays for itself within weeks.
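The electricity rows follow from watts × hours × rate. A quick way to recompute them for your own tariff; the $0.30/kWh rate is an assumption that reproduces the UM890 figure (the gaming-PC row in the table implies a somewhat higher rate):

```shell
# Monthly cost = watts * hours/day * 30 days / 1000 * $/kWh
monthly_cost() {
    awk -v w="$1" -v h="$2" -v rate="$3" \
        'BEGIN { printf "$%.2f/month\n", w * h * 30 / 1000 * rate }'
}

monthly_cost 15 24 0.30    # UM890 Pro at idle
monthly_cost 120 12 0.30   # Gaming PC, 12 h/day
```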
Final Thoughts
This split-role deployment isn’t just about optimizing for one specific setup. The pattern generalizes: separate your always-on orchestration from your compute-heavy inference, and connect them over a secure overlay network. The orchestration machine can be any low-power Linux box. The inference machine can be any GPU-equipped server — or even a cloud GPU instance that you spin up on demand.
The key insight from building this: OpenClaw’s model failover system means the gateway doesn’t care if the inference server is a local GPU, a cloud API, or some combination. It tries the primary model, and if that fails, it moves down the fallback chain. The gaming PC can be powered on for focused work sessions and off overnight, and the agent keeps working seamlessly through Claude during the gaps.
Both machines are expendable in isolation. Together, they’re more than the sum of their parts.