In my previous post, I compared the Minisforum UM890 Pro against a gaming PC (7800X3D + RTX 5080) for running OpenClaw, and concluded that the best approach is to split the roles: the UM890 Pro as the hardened always-on gateway, and the gaming PC as the GPU-accelerated inference server. This post is the full implementation guide — step-by-step, command-by-command — for anyone who wants to replicate this architecture across two Linux machines connected over Tailscale.

Architecture Overview

The design has two machines with distinct roles, connected over a Tailscale WireGuard mesh:

Machine A — UM890 Pro (Gateway Appliance)

  • Runs Ubuntu Server 24.04 LTS
  • Hosts the OpenClaw gateway process inside a hardened Docker container
  • Handles all messaging channels (WhatsApp, Telegram, Discord, etc.)
  • Runs browser automation, cron jobs, and skill execution
  • Points to Machine B for local LLM inference
  • Falls back to cloud APIs (Claude Sonnet) when Machine B is unavailable
  • Always-on, low power (~15W idle), near-silent

Machine B — Gaming PC (Inference Server)

  • Runs Ubuntu Server 24.04 LTS
  • Hosts Ollama with Nvidia CUDA acceleration (RTX 5080, 16GB VRAM)
  • Serves Qwen 3.5 27B (Q4_K_M quantization) over the native Ollama API
  • Listens only on the Tailscale interface — not exposed to LAN or internet
  • Can be powered down when not needed; the gateway degrades gracefully to cloud APIs

Network Glue — Tailscale

  • Both machines join the same Tailscale tailnet
  • All traffic between them is end-to-end encrypted via WireGuard
  • No port forwarding, no public IP exposure
  • Tailscale ACLs restrict which devices can reach the Ollama port

┌──────────────────────┐       Tailscale (WireGuard)       ┌──────────────────────┐
│   UM890 Pro          │◄─────────────────────────────────►│   Gaming PC          │
│   (Gateway)          │     100.x.x.1 ◄──► 100.x.x.2      │   (Inference)        │
│                      │                                   │                      │
│  OpenClaw Gateway    │    ollama/qwen3.5:27b @ :11434    │  Ollama + CUDA       │
│  Docker (rootless)   │──────────────────────────────────►│  RTX 5080 (16GB)     │
│  Browser automation  │                                   │  Qwen 3.5 27B        │
│  Messaging channels  │  fallback: Claude Sonnet (cloud)  │                      │
│  Netdata monitoring  │                                   │  Netdata monitoring  │
└──────────────────────┘                                   └──────────────────────┘

Phase 1: Base OS Installation (Both Machines)

Both machines get a clean Ubuntu Server 24.04 LTS minimal install. No desktop environment — this is headless server territory.

1.1 Install Ubuntu Server 24.04 LTS

Flash the Ubuntu Server 24.04 LTS ISO to a USB drive and install on both machines. During installation:

  • Choose minimal server install (no snaps, no desktop)
  • Create a non-root user (e.g., prathamesh)
  • Enable OpenSSH server during install
  • Use the full 4TB NVMe drive as a single ext4 partition (or LVM if you prefer flexibility)

1.2 Post-Install Baseline (Both Machines)

After first boot, SSH in and run:

# Update system packages
sudo apt update && sudo apt upgrade -y

# Install essential tools
sudo apt install -y curl wget git htop tmux ufw net-tools

# Set timezone
sudo timedatectl set-timezone America/Los_Angeles

# Enable automatic security updates
sudo apt install -y unattended-upgrades
sudo dpkg-reconfigure -plow unattended-upgrades

1.3 Configure UFW Firewall (Both Machines)

# Default deny inbound, allow outbound
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Allow SSH
sudo ufw allow ssh

# Enable the firewall
sudo ufw enable
sudo ufw status verbose

We will add Tailscale-specific rules later.

1.4 Install Tailscale (Both Machines)

# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh

# Bring Tailscale up and authenticate
sudo tailscale up

# Note the Tailscale IP for each machine
tailscale ip -4

After running tailscale up on both machines and authenticating with the same Tailscale account, note the Tailscale IPs. For the rest of this guide, I’ll use:

  • UM890 Pro (Gateway): 100.x.x.1
  • Gaming PC (Inference): 100.x.x.2

Replace these with your actual Tailscale IPs.

1.5 Verify Connectivity

From the UM890 Pro:

ping 100.x.x.2  # Should succeed

From the Gaming PC:

ping 100.x.x.1  # Should succeed

Phase 2: Gaming PC — Inference Server Setup

This is where the RTX 5080 earns its keep.

2.1 Install Nvidia Drivers

# Check that the GPU is visible on the PCI bus
lspci | grep -i nvidia

# Install the recommended driver automatically
sudo ubuntu-drivers autoinstall

# Reboot
sudo reboot

After reboot, verify:

nvidia-smi

You should see the RTX 5080 listed with the driver version and CUDA version. If nvidia-smi fails, troubleshoot before proceeding — Ollama will silently fall back to CPU inference without working drivers, and you’ll get 3 tok/s instead of 40.

2.2 Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Verify it’s running:

systemctl status ollama
ollama --version

2.3 Pull the Qwen 3.5 27B Model

ollama pull qwen3.5:27b

This downloads the Q4_K_M quantized weights (~16GB). That is right at the edge of the RTX 5080’s 16GB VRAM, so depending on context length, Ollama may offload a few layers (plus KV cache) to system RAM while keeping the bulk of the model on-GPU. The download may take a while depending on your internet connection.

Verify it’s available:

ollama list

2.4 Test Local Inference

ollama run qwen3.5:27b "Hello, what model are you?"

While this runs, open another terminal and check GPU utilization:

nvidia-smi

You should see VRAM usage spike as the model loads. If VRAM stays at 0 and CPU is pegged, the Nvidia drivers aren’t being detected — go back to step 2.1.
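If you want a scriptable version of that check, here is a small sketch. The `check_vram` helper is hypothetical (not part of Ollama or the Nvidia tooling), and the 1000 MiB threshold is a rough heuristic:

```shell
# Flag probable CPU fallback from nvidia-smi's machine-readable output.
check_vram() {
  # $1 = MiB of VRAM currently in use on the first GPU
  if [ "$1" -lt 1000 ]; then
    echo "WARNING: only $1 MiB of VRAM in use; inference is likely on CPU"
  else
    echo "OK: $1 MiB of VRAM in use"
  fi
}
# On the gaming PC, while a prompt is running:
#   check_vram "$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | head -n1)"
```

The `--query-gpu`/`--format=csv` flags give you clean numbers without scraping the human-readable table.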

2.5 Bind Ollama to the Tailscale Interface

By default, Ollama only listens on 127.0.0.1:11434. We need it to listen on the Tailscale interface so the UM890 Pro can reach it. The most secure approach is to bind specifically to the Tailscale IP rather than 0.0.0.0.

Create a systemd override:

sudo systemctl edit ollama

This opens an editor. Add the following in the override block:

[Service]
Environment="OLLAMA_HOST=100.x.x.2:11434"

Replace 100.x.x.2 with the gaming PC’s actual Tailscale IP. Save and exit, then reload:

sudo systemctl daemon-reload
sudo systemctl restart ollama

Important: Binding to the Tailscale IP means Ollama only accepts connections from the Tailscale interface. It won’t be reachable from your LAN or the public internet. This is exactly what we want.

Caveat: If the Tailscale IP changes (which is rare but possible), you’ll need to update this. An alternative is to bind to 0.0.0.0 and use UFW to restrict access:

# Alternative: bind to all interfaces but firewall to Tailscale only
# In the systemd override, use:
# Environment="OLLAMA_HOST=0.0.0.0:11434"

# Then restrict with UFW:
sudo ufw allow in on tailscale0 to any port 11434
sudo ufw deny 11434
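Either way, confirm what address Ollama actually bound to. A quick audit sketch; `check_ollama_bind` is a hypothetical helper of mine, fed with `ss -tln` output:

```shell
# Hypothetical helper: scan `ss -tln` output and fail if port 11434 is
# bound to a wildcard address (0.0.0.0 or [::]).
check_ollama_bind() {
  ! grep -qE '(0\.0\.0\.0|\[::\]):11434'
}
# On the gaming PC:
#   ss -tln | check_ollama_bind && echo "bind looks safe" || echo "WARNING: wildcard bind"
```

If you chose the 0.0.0.0-plus-UFW alternative above, a warning here is expected; the firewall rule is doing the restricting instead.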

2.6 Verify Remote Access

From the UM890 Pro, test that Ollama is reachable over Tailscale:

curl http://100.x.x.2:11434/api/tags

You should get a JSON response listing the qwen3.5:27b model. If you get “connection refused,” check that:

  1. Ollama is running (systemctl status ollama)
  2. The OLLAMA_HOST is set correctly (grep -i host /etc/systemd/system/ollama.service.d/override.conf)
  3. Tailscale is connected on both machines (tailscale status)

Phase 3: UM890 Pro — Gateway Appliance Setup

3.1 Create Dedicated OpenClaw User

Following the defense-in-depth model from my earlier post:

# Create a dedicated user with no sudo privileges
sudo adduser --disabled-password --gecos "OpenClaw Agent" openclaw

# Set a strong password (needed for su access if debugging)
sudo passwd openclaw

3.2 Install Docker Engine (Rootless Mode)

We install Docker Engine — not Docker Desktop — and configure rootless mode for enhanced isolation.

# Install prerequisites
sudo apt install -y ca-certificates gnupg uidmap

# Add Docker's official GPG key and repository
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Install rootless Docker for the openclaw user
sudo loginctl enable-linger openclaw
sudo -u openclaw -i dockerd-rootless-setuptool.sh install

3.3 Install Node.js 24

OpenClaw requires Node 24 (recommended) or Node 22.14+:

# Switch to the openclaw user
sudo -u openclaw -i

# Install Node.js via nvm (recommended for non-root installs)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
source ~/.bashrc
nvm install 24
node -v  # Should show v24.x.x

3.4 Install OpenClaw

As the openclaw user:

# Install OpenClaw globally
npm install -g openclaw@latest

# Verify
openclaw --version

3.5 Run OpenClaw Onboarding with Ollama

openclaw onboard

During onboarding:

  1. Select Ollama as the provider
  2. When prompted for the base URL, enter: http://100.x.x.2:11434 (the gaming PC’s Tailscale IP)
  3. Select Local mode (local models only from this provider — we’ll add Claude separately)
  4. The onboarding wizard should discover the qwen3.5:27b model from the remote Ollama instance

3.6 Configure the Hybrid Model Setup

After onboarding completes, edit the OpenClaw configuration to set up the hybrid primary + thinking model:

openclaw config edit

Set the following configuration (in JSONC format):

{
  agents: {
    defaults: {
      model: {
        // Local model for routine tasks (file reads, simple edits, boilerplate)
        primary: "ollama/qwen3.5:27b",
        // Cloud model for complex reasoning (architecture, debugging, multi-file)
        thinking: "anthropic/claude-sonnet-4-6-20260514",
        // Fallback chain if primary is unavailable
        fallbacks: ["anthropic/claude-sonnet-4-6-20260514"]
      }
    }
  },
  models: {
    providers: {
      ollama: {
        baseUrl: "http://100.x.x.2:11434",  // Gaming PC Tailscale IP
        apiKey: "ollama-local",
        api: "ollama"  // Use native Ollama API, NOT /v1
      }
    }
  }
}

Critical detail from the OpenClaw docs: Do not use the /v1 OpenAI-compatible URL. The /v1 path breaks tool calling — models output raw tool JSON as plain text instead of executing tools. Always use the base Ollama URL without a path suffix.
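To make the distinction concrete, here is roughly the shape of a request against the native endpoint. This is a sketch; OpenClaw constructs its own payloads, and the prompt here is illustrative:

```shell
# Native Ollama chat endpoint: the path is /api/chat, with no /v1 prefix.
body='{
  "model": "qwen3.5:27b",
  "messages": [{"role": "user", "content": "What model are you?"}],
  "stream": false
}'
# From the UM890 Pro, over Tailscale:
#   curl http://100.x.x.2:11434/api/chat -d "$body"
```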

3.7 Add Anthropic API Key

For the Claude Sonnet fallback:

openclaw config set models.providers.anthropic.apiKey "sk-ant-xxxxx"

Use a dedicated, low-spend API key with a hard daily cap (e.g., $10/day). Never reuse your primary work API key for an autonomous agent.

3.8 Configure Gateway Security

# Bind gateway to localhost only — remote access via Tailscale
openclaw config set gateway.bind loopback

# Enable agent sandboxing for tool execution
openclaw config set agents.defaults.sandbox.mode "non-main"
openclaw config set agents.defaults.sandbox.scope "agent"

3.9 Install the Gateway as a Systemd Daemon

openclaw onboard --install-daemon

This creates a systemd user service that starts the OpenClaw gateway automatically on boot.

Verify it’s running:

openclaw gateway status
openclaw doctor

3.10 Set Up Tailscale Remote Access to the Control UI

From any device on your Tailscale network, you can access the OpenClaw Control UI:

# On the UM890 Pro, get the dashboard URL
openclaw dashboard --no-open

Then open http://100.x.x.1:18789/ from any Tailscale-connected device.

Phase 4: Monitoring and Validation

4.1 Install Netdata (Both Machines)

curl https://get.netdata.cloud/kickstart.sh > /tmp/netdata-kickstart.sh && sh /tmp/netdata-kickstart.sh

Netdata provides real-time visibility into CPU, RAM, GPU utilization, network traffic, and disk I/O — invaluable for monitoring both the gateway and inference server.

4.2 Validate the Full Pipeline

From the OpenClaw Control UI or a connected messaging channel, send a test message:

What model are you, and what is your context window?

The response should come from the local Qwen 3.5 27B model. You can verify by checking the Ollama logs on the gaming PC:

journalctl -u ollama -f

You should see the inference request arrive.

4.3 Test Failover

Power off the gaming PC (or stop the Ollama service) and send another message through OpenClaw. The gateway should detect that the Ollama endpoint is unreachable and fall back to Claude Sonnet via the Anthropic API. Check the OpenClaw logs:

journalctl --user -u openclaw -f

You should see the failover from ollama/qwen3.5:27b to anthropic/claude-sonnet-4-6-20260514.

Phase 5: Hardening Checklist

After the basic setup is working, apply these hardening measures:

UM890 Pro (Gateway):

  • Rootless Docker is enabled for the openclaw user
  • OpenClaw gateway is bound to loopback (localhost only)
  • UFW is active with default deny inbound, SSH and Tailscale allowed
  • Agent sandboxing is set to non-main mode
  • Anthropic API key has a hard daily spending cap
  • auditd is installed for forensic logging
  • openclaw doctor and openclaw security audit --deep pass cleanly
  • Netdata agent is running and alerting on resource thresholds

Gaming PC (Inference):

  • Ollama is bound to the Tailscale IP only (not 0.0.0.0)
  • UFW blocks port 11434 on all interfaces except tailscale0
  • Nvidia drivers are installed and nvidia-smi reports the RTX 5080
  • qwen3.5:27b is loaded and generating tokens on-GPU (check VRAM usage)
  • Netdata agent is running with GPU monitoring

Tailscale:

  • Both machines are on the same tailnet
  • Tailscale ACLs restrict which devices can reach port 11434 on the gaming PC
  • MagicDNS is enabled for hostname-based access (optional but convenient)
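The ACL item deserves an example. In the Tailscale admin console's access controls, a policy along these lines restricts the Ollama port to the gateway. This is a sketch: the tag names are my own choices, and you must assign the tags to the two machines yourself. Also note that defining any ACL replaces Tailscale's default allow-all policy, so include rules for everything you still need:

```jsonc
{
  // Illustrative tags; assign them to the two machines in the admin console.
  "tagOwners": {
    "tag:gateway":   ["autogroup:admin"],
    "tag:inference": ["autogroup:admin"]
  },
  "acls": [
    // Only the gateway may reach Ollama on the inference box.
    {"action": "accept", "src": ["tag:gateway"], "dst": ["tag:inference:11434"]},
    // Admin devices may reach SSH and the Control UI on the gateway.
    {"action": "accept", "src": ["autogroup:admin"], "dst": ["tag:gateway:22,18789"]}
  ]
}
```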

Troubleshooting

Ollama returns slow responses (< 5 tok/s on the 27B model): The model is likely running on CPU instead of GPU. Check nvidia-smi — if VRAM usage is near 0 while Ollama is serving a request, the GPU isn’t being used. Reinstall Nvidia drivers and restart Ollama.

OpenClaw can’t reach the Ollama endpoint: Run curl http://100.x.x.2:11434/api/tags from the UM890 Pro. If it fails, check: (1) Tailscale is connected on both machines (tailscale status), (2) Ollama is bound to the correct host (OLLAMA_HOST in the systemd override), (3) UFW isn’t blocking the connection.

Tool calling doesn’t work with the local model: Make sure the OpenClaw Ollama provider uses api: "ollama" (native API), not api: "openai-completions". The /v1 OpenAI-compatible endpoint does not reliably support tool calling.

Failover to Claude doesn’t trigger: Check that the Anthropic API key is set and valid. Run openclaw models list to verify the fallback model is available. Check the OpenClaw failover docs — failover only advances on auth failures, rate limits, and timeouts, not on other error types.

Gaming PC draws too much power at idle: Configure Nvidia power management to reduce idle draw. You can also set up a cron job or Tailscale webhook to wake the machine on demand and suspend it during off-hours.
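For the power-management piece, a sketch. The 200 W cap is illustrative; query your card's supported range with `nvidia-smi -q -d POWER` first:

```shell
# Wrap the two nvidia-smi calls in a helper so the cap is easy to tweak.
cap_gpu_power() {
  # $1 = power limit in watts; requires working Nvidia drivers and sudo.
  sudo nvidia-smi -pm 1 &&   # persistence mode keeps the driver loaded
  sudo nvidia-smi -pl "$1"   # set the board power limit
}
# cap_gpu_power 200
```

The limit resets on reboot, so persist it (for example with a `@reboot` cron entry) if the savings matter to you.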

Cost Analysis

With this architecture running 24/7:

Cost Component                                  Monthly Estimate
UM890 Pro electricity (24/7, ~15W idle)         ~$3.24
Gaming PC electricity (12hr/day, ~120W avg)     ~$15.55
Anthropic API (Claude Sonnet, fallback only)    ~$5–15
Total                                           ~$24–34/month
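The electricity rows are plain watt-hour arithmetic. A quick sketch, assuming roughly $0.30/kWh (plug in your local rate):

```shell
# Monthly electricity cost: watts x hours/day x 30 days / 1000 x $/kWh
monthly_cost() {
  # $1 = average watts, $2 = hours/day, $3 = rate in $/kWh
  awk -v w="$1" -v h="$2" -v r="$3" 'BEGIN { printf "%.2f\n", w*h*30/1000*r }'
}
monthly_cost 15 24 0.30    # UM890 Pro, always on   -> 3.24
monthly_cost 120 12 0.30   # Gaming PC, 12 h/day    -> 12.96
```

At a flat $0.30/kWh the gaming PC comes out under $13/month; the ~$15.55 in the table corresponds to a rate closer to $0.36/kWh.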

Compare this to running OpenClaw purely on cloud APIs at $20–50/day, and the hardware setup pays for itself within weeks.

Final Thoughts

This split-role deployment isn’t just about optimizing for one specific setup. The pattern generalizes: separate your always-on orchestration from your compute-heavy inference, and connect them over a secure overlay network. The orchestration machine can be any low-power Linux box. The inference machine can be any GPU-equipped server — or even a cloud GPU instance that you spin up on demand.

The key insight from building this: OpenClaw’s model failover system means the gateway doesn’t care if the inference server is a local GPU, a cloud API, or some combination. It tries the primary model, and if that fails, it moves down the fallback chain. The gaming PC can be powered on for focused work sessions and off overnight, and the agent keeps working seamlessly through Claude during the gaps.

Neither machine is remarkable in isolation. Together, they’re more than the sum of their parts.