OpenClaw + DeepSeek V4 on Ollama: Run DeepSeek V4 Locally with OpenClaw

Run a quantized DeepSeek V4 on your own GPU through Ollama and point your OpenClaw agent at it. No OpenRouter, no per-token bill, full data control. Here's the full setup, including the hardware reality check.

Why Run DeepSeek V4 on Ollama?

The hosted route through OpenRouter and OpenClaw Launch is the fastest way to use DeepSeek V4 — one click and you're live. But there are good reasons to run it locally with Ollama instead:

  • Sensitive data that can't leave your network
  • High-volume bots where API spend would dominate the budget
  • Air-gapped or on-prem deployments
  • You already have a GPU box sitting idle

The trade-off is hardware: DeepSeek V4 Flash quantized to 4-bit needs roughly 24 GB+ of VRAM to run comfortably, and Pro is much heavier. If your hardware can't take it, stick with the hosted OpenRouter route.

Hardware Reality Check

Variant     | Recommended VRAM | Disk    | Realistic on consumer GPU?
V4 Flash Q4 | 24–32 GB         | ~25 GB  | RTX 3090 / 4090 yes; 4080 tight
V4 Flash Q8 | 40–48 GB         | ~45 GB  | A6000 / dual 3090 only
V4 Pro Q4   | 140 GB+          | ~250 GB | Multi-A100 / H100 only

Running V4 Pro at home is unrealistic for most setups. V4 Flash Q4 is the practical local target.

Step 1: Install Ollama and Pull DeepSeek V4 Flash

On macOS or Linux:

# Install Ollama (macOS / Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull DeepSeek V4 Flash (Q4 quantization)
ollama pull deepseek-v4-flash

# Verify it loads
ollama run deepseek-v4-flash "hello"

On the first run, Ollama downloads the model (~25 GB for Q4 Flash). Once the weights are cached, responses should be near-instant on a 3090/4090.

Step 2: Expose Ollama to OpenClaw

By default, Ollama listens on 127.0.0.1:11434. If your OpenClaw container runs on the same host, that's reachable as http://host.docker.internal:11434 from inside the container (on macOS/Windows Docker Desktop) or via the host network on Linux.

To listen on all interfaces (so a remote OpenClaw can reach it), set:

# macOS
launchctl setenv OLLAMA_HOST 0.0.0.0
# Then restart the Ollama app

# Linux (systemd): open a drop-in editor
sudo systemctl edit ollama
# In the editor, add the following lines, then save:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl restart ollama

Step 3: Point OpenClaw at the Ollama Endpoint

In openclaw.json, configure Ollama as the provider and set DeepSeek V4 Flash as the default model:

{
  "models": {
    "providers": {
      "ollama": {
        "baseURL": "http://host.docker.internal:11434"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/deepseek-v4-flash"
      }
    }
  }
}

Replace the baseURL with the actual address of the Ollama box if it's on a different machine (e.g. http://192.168.1.50:11434).

Step 4: Restart and Test

Restart your OpenClaw agent (a SIGUSR1 reload is enough for model swaps), then send a test message via Telegram, Discord, or the gateway web chat. The first reply on a fresh load takes a few seconds while Ollama warms up; subsequent replies should be near-instant on a 3090/4090.

Tip: Keep OLLAMA_KEEP_ALIVE=24h in the Ollama environment so the model stays loaded between bot replies. Otherwise Ollama unloads after a few minutes of idle and the next reply triggers a 5–15 second cold start.
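On a systemd box, the keep-alive setting can live in the same drop-in as OLLAMA_HOST. A sketch of the drop-in contents, assuming the stock ollama.service unit:

```
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=24h"
```

Apply it with sudo systemctl daemon-reload && sudo systemctl restart ollama, then check ollama ps: the UNTIL column should show the model staying resident rather than expiring a few minutes out.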

Hosted vs Local: Which Should You Pick?

  • Pick hosted (OpenRouter via OpenClaw Launch) if you want zero ops, instant deploy, and access to V4 Pro at $1.74/$3.48 per 1M tokens. See the DeepSeek V4 on OpenClaw Launch guide.
  • Pick Ollama local if you have the hardware (24 GB+ VRAM), want zero per-token cost, or need to keep data on-prem. V4 Flash is realistic locally; V4 Pro is not for most setups.
  • Run both — default to hosted Pro for hard tasks, route everyday chat to local Flash via the /model command.

What's Next?

Skip the Hardware Headache

Run DeepSeek V4 Pro or Flash via OpenClaw Launch with no GPU, no install, AI credits included. Plans from $3/mo.
