OpenClaw + DeepSeek V4 on Ollama: Run DeepSeek V4 Locally with OpenClaw
Run a quantized DeepSeek V4 on your own GPU through Ollama and point your OpenClaw agent at it. No OpenRouter, no per-token bill, full data control. Here's the full setup, including the hardware reality check.
Why Run DeepSeek V4 on Ollama?
The hosted route through OpenRouter and OpenClaw Launch is the fastest way to use DeepSeek V4 — one click and you're live. But there are good reasons to run it locally with Ollama instead:
- Sensitive data that can't leave your network
- High-volume bots where API spend would dominate the budget
- Air-gapped or on-prem deployments
- A GPU box you already own sitting idle
The trade-off is hardware: DeepSeek V4 Flash quantized to 4-bit needs roughly 24 GB+ of VRAM to run comfortably, and Pro is much heavier. If your hardware can't take it, stick with the hosted OpenRouter route.
Hardware Reality Check
| Variant | Recommended VRAM | Disk | Realistic on consumer GPU? |
|---|---|---|---|
| V4 Flash Q4 | 24–32 GB | ~25 GB | RTX 3090 / 4090 yes; 4080 tight |
| V4 Flash Q8 | 40–48 GB | ~45 GB | A6000 / dual 3090 only |
| V4 Pro Q4 | 140 GB+ | ~250 GB | Multi-A100 / H100 only |
Running V4 Pro at home is unrealistic for most setups. V4 Flash Q4 is the practical local target.
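Not sure where your card lands? On an NVIDIA box, nvidia-smi (installed with the driver) reports total and free VRAM:

```
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
# A 3090/4090 reports 24576 MiB total; Q4 Flash wants most of that free
```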
Step 1: Install Ollama and Pull DeepSeek V4 Flash
On macOS or Linux:

```
# Install Ollama (macOS / Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull DeepSeek V4 Flash (Q4 quantization)
ollama pull deepseek-v4-flash
# Verify it loads
ollama run deepseek-v4-flash "hello"
```

On the first run, Ollama downloads the model (~25 GB for Q4 Flash). The second response should be near-instant on a 3090/4090.
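You can also confirm the pull landed by asking Ollama's HTTP API, which is the same API OpenClaw will talk to in Step 3:

```
curl -s http://localhost:11434/api/tags
# The JSON should list deepseek-v4-flash among the installed models
```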
Step 2: Expose Ollama to OpenClaw
By default, Ollama listens on 127.0.0.1:11434. If your OpenClaw container runs on the same host, that's reachable as http://host.docker.internal:11434 from inside the container (on macOS/Windows Docker Desktop) or via the host network on Linux.
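A quick way to confirm that path before touching any config (assuming curl is available inside your OpenClaw image; the container name is a placeholder):

```
docker exec <openclaw-container> curl -s http://host.docker.internal:11434/api/version
# A small JSON blob with Ollama's version means the container can reach the host
```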
To listen on all interfaces (so a remote OpenClaw can reach it), set:

```
# macOS
launchctl setenv OLLAMA_HOST 0.0.0.0
# Then restart Ollama
# Linux (systemd)
sudo systemctl edit ollama
# Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl restart ollama
```
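After the restart, confirm the daemon is bound to all interfaces rather than just loopback; on Linux, ss (from iproute2) shows the listener:

```
ss -ltn | grep 11434
# Expect 0.0.0.0:11434 or *:11434 here, not 127.0.0.1:11434
```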
Step 3: Point OpenClaw at the Ollama Endpoint

In your `openclaw.json`, configure Ollama as the provider and set DeepSeek V4 Flash as the default model:

```
{
  "models": {
    "providers": {
      "ollama": {
        "baseURL": "http://host.docker.internal:11434"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/deepseek-v4-flash"
      }
    }
  }
}
```

Replace the `baseURL` with the actual address of the Ollama box if it's on a different machine (e.g. http://192.168.1.50:11434).
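Before restarting anything, consider hitting the exact baseURL OpenClaw will use with a one-off generation request; the address below matches the LAN example above, so adjust it to yours:

```
curl -s http://192.168.1.50:11434/api/generate \
  -d '{"model": "deepseek-v4-flash", "prompt": "Say hi in one word.", "stream": false}'
# A JSON reply with a "response" field means OpenClaw will be able to reach it too
```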
Step 4: Restart and Test
Restart your OpenClaw agent (a SIGUSR1 reload is enough for model swaps), then send a test message via Telegram, Discord, or the gateway web chat. The first reply on a fresh load takes a few seconds while Ollama warms up; subsequent replies should be near-instant on a 3090/4090.
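To avoid that warm-up between messages, set `OLLAMA_KEEP_ALIVE=24h` in the Ollama environment so the model stays loaded between bot replies. On the Linux systemd setup from Step 2, that's one extra line in the same override (24h is a sensible starting value, not a requirement):

```
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=24h"
```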
Without a keep-alive, Ollama unloads the model after a few minutes of idle and the next reply triggers a 5–15 second cold start.

Hosted vs Local: Which Should You Pick?
- Pick hosted (OpenRouter via OpenClaw Launch) if you want zero ops, instant deploy, and access to V4 Pro at $1.74/$3.48 per 1M tokens. See the DeepSeek V4 on OpenClaw Launch guide.
- Pick Ollama local if you have the hardware (24 GB+ VRAM), want zero per-token cost, or need to keep data on-prem. V4 Flash is realistic locally; V4 Pro is not for most setups.
- Run both — default to hosted Pro for hard tasks, route everyday chat to local Flash via the `/model` command, as shown below.
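A sketch of that split, assuming the `/model` chat command accepts the same provider/model ids as openclaw.json (the hosted id is a placeholder for whatever your hosted setup registers):

```
/model ollama/deepseek-v4-flash   # everyday chat on local Flash
/model <hosted-pro-id>            # switch back when a task needs Pro
```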
What's Next?
- Hosted DeepSeek V4 setup — The one-click route via OpenRouter
- OpenClaw + Ollama — General Ollama setup guide for any model
- Ollama Cloud — Run hosted Ollama if you don't have a GPU
- Compare all models — See where DeepSeek V4 sits vs Claude, GPT, Gemini