Guide
OpenClaw + Ollama: Run Local AI Models
Use Ollama to run AI models locally and connect them to OpenClaw — complete privacy, zero API costs, and offline capability.
What Is Ollama?
Ollama is an open-source tool that lets you run large language models (LLMs) locally on your own computer. Instead of sending your prompts to OpenAI, Anthropic, or Google, everything stays on your machine. Ollama supports popular open-source models like Llama, Mistral, Qwen, DeepSeek, and many more.
Why Use Ollama with OpenClaw?
Connecting Ollama to OpenClaw gives you a fully local AI agent with real capabilities:
- Complete privacy — Your conversations never leave your machine. No data is sent to any cloud API.
- Zero API costs — Local models are free to run. No per-token billing, no usage limits, no surprise charges.
- Offline capability — Once a model is downloaded, it works without an internet connection.
- Full agent features — You still get all of OpenClaw's features: Telegram/Discord integration, 5,700+ ClawHub skills, web UI, and session management.
Compatible Local Models
Ollama supports hundreds of models. Here are the most popular ones for use with OpenClaw:
| Model | Parameters | VRAM Needed | Best For |
|---|---|---|---|
| Llama 3.3 70B | 70B | 40 GB | Best open-source all-rounder |
| Llama 3.2 8B | 8B | 5 GB | Fast and lightweight |
| Mistral Small 3.1 | 24B | 14 GB | Strong reasoning at low cost |
| Qwen 3 32B | 32B | 20 GB | Excellent multilingual support |
| DeepSeek R1 14B | 14B | 9 GB | Strong coding and math |
| Phi-4 14B | 14B | 9 GB | Compact Microsoft model |
How to Set Up Ollama
- Install Ollama — Download from ollama.com/download. Available for macOS, Linux, and Windows.
- Pull a model — Open your terminal and run:

  ```
  ollama pull llama3.3
  ```

  This downloads the Llama 3.3 model (~40 GB for 70B). For a lighter option, try `ollama pull llama3.2` (8B, ~5 GB).
- Start the Ollama server — Run `ollama serve` (it may already be running as a background service). The server listens on `http://localhost:11434` by default.
- Verify it works — Run `ollama list` to see your downloaded models, or `ollama run llama3.3` to chat directly in the terminal.
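The verification step can also be done programmatically. The sketch below queries Ollama's `/api/tags` endpoint (the same one `ollama list` uses) and prints the models the server has downloaded; it assumes the server is on the default port and returns `None` if it can't be reached:

```python
import json
import urllib.request
import urllib.error

def list_local_models(base_url: str = "http://localhost:11434", timeout: float = 3.0):
    """Return the names of models the Ollama server has downloaded,
    or None if the server is not reachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError):
        return None
    # /api/tags responds with {"models": [{"name": "llama3.3:latest", ...}, ...]}
    return [m["name"] for m in data.get("models", [])]

if __name__ == "__main__":
    models = list_local_models()
    if models is None:
        print("Ollama server not reachable. Is `ollama serve` running?")
    else:
        print("Available models:", ", ".join(models) or "(none pulled yet)")
```

If this prints an empty list even though the server is up, you haven't pulled any models yet.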
How to Connect Ollama to OpenClaw
To use Ollama as the AI backend for a self-hosted OpenClaw instance, configure the model provider in your openclaw.json config file:
1. Set Ollama as a model provider
In your OpenClaw config, add an ollama entry under models.providers with the base URL of your Ollama server:
```json
"models": {
  "providers": {
    "ollama": {
      "apiBase": "http://localhost:11434/v1"
    }
  }
}
```

2. Set the default model
Point the agent's primary model to your Ollama model. The model ID must be prefixed with the provider name:
```json
"agents": {
  "defaults": {
    "model": {
      "primary": "ollama/llama3.3"
    }
  }
}
```

3. Restart OpenClaw
After updating the config, restart your OpenClaw container. The agent will now route all requests through your local Ollama server.
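Putting the two snippets together, a minimal `openclaw.json` for this setup might look like the following (a sketch — the exact surrounding keys depend on your existing config):

```json
{
  "models": {
    "providers": {
      "ollama": {
        "apiBase": "http://localhost:11434/v1"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/llama3.3"
      }
    }
  }
}
```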
Note: if OpenClaw runs inside a Docker container, `localhost` refers to the container itself, not your host machine. Set the apiBase to `http://host.docker.internal:11434/v1` instead of localhost.

Local (Ollama) vs. Cloud Models
Both approaches have trade-offs. Here's how they compare:
| | Ollama (Local) | Cloud (OpenClaw Launch) |
|---|---|---|
| Privacy | Full — data stays on device | Encrypted at rest, routed via API |
| Cost | Free (electricity + hardware) | From $3/mo + per-token API costs |
| Model quality | Good (open-source models) | Best (Claude Opus, GPT-5.2, Gemini) |
| Speed | Depends on GPU hardware | Fast (cloud inference) |
| Setup | Install Ollama + self-host OpenClaw | Visual editor, one-click deploy |
| Offline | Yes | No — requires internet |
| Hardware needed | GPU with 5–40 GB VRAM | None — fully managed |
Hardware Requirements
Local model performance depends on your GPU. Here are rough guidelines:
- 8B models (Llama 3.2, Phi-4) — Need ~5 GB VRAM. Run well on most modern GPUs or Apple M-series chips with 16 GB+ unified memory.
- 14–32B models (Mistral Small, Qwen 3) — Need 9–20 GB VRAM. Require a dedicated GPU (RTX 3090/4090) or M-series Mac with 32 GB+.
- 70B models (Llama 3.3) — Need ~40 GB VRAM. Require high-end hardware (dual GPUs, A100, or M-series with 64 GB+).
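The VRAM figures above follow roughly from parameter count times bytes per weight: Ollama's default model tags are typically 4-bit quantized (~0.5 bytes per parameter), plus some headroom for the KV cache and activations. A back-of-envelope sketch (the 20% overhead factor is an illustrative assumption, not a measured value):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate: quantized weight size plus a flat
    overhead factor for KV cache and activations (an assumption)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9  # decimal GB

for name, size in [("Llama 3.2 8B", 8), ("DeepSeek R1 14B", 14),
                   ("Qwen 3 32B", 32), ("Llama 3.3 70B", 70)]:
    print(f"{name}: ~{estimate_vram_gb(size):.0f} GB VRAM")
```

The results land close to the table above (~5 GB for 8B, ~40 GB for 70B); real usage varies with quantization level, context length, and runtime.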
If you don't have a capable GPU, cloud models via OpenClaw Launch are the faster path — no hardware investment needed.
When to Use Each Approach
Choose Ollama + self-hosted OpenClaw if you prioritize data privacy, want zero ongoing costs, have a capable GPU, and are comfortable with Docker and command-line setup.
Choose OpenClaw Launch (cloud) if you want the best model quality (Claude Opus, GPT-5.2), don't want to manage servers, or need your bot running 24/7 without dedicated hardware. Deploy in 30 seconds with zero infrastructure setup.