
OpenClaw + Ollama: Run Local AI Models

Use Ollama to run AI models locally and connect them to OpenClaw — complete privacy, zero API costs, and offline capability.

What Is Ollama?

Ollama is an open-source tool that lets you run large language models (LLMs) locally on your own computer. Instead of sending your prompts to OpenAI, Anthropic, or Google, everything stays on your machine. Ollama supports popular open-source models like Llama, Mistral, Qwen, DeepSeek, and many more.

Why Use Ollama with OpenClaw?

Connecting Ollama to OpenClaw gives you a fully local AI agent with real capabilities:

  • Complete privacy — Your conversations never leave your machine. No data is sent to any cloud API.
  • Zero API costs — Local models are free to run. No per-token billing, no usage limits, no surprise charges.
  • Offline capability — Once a model is downloaded, it works without an internet connection.
  • Full agent features — You still get all of OpenClaw's features: Telegram/Discord integration, 5,700+ ClawHub skills, web UI, and session management.

Compatible Local Models

Ollama supports hundreds of models. Here are the most popular ones for use with OpenClaw:

| Model | Parameters | VRAM Needed | Best For |
| --- | --- | --- | --- |
| Llama 3.3 70B | 70B | 40 GB | Best open-source all-rounder |
| Llama 3.2 8B | 8B | 5 GB | Fast and lightweight |
| Mistral Small 3.1 | 24B | 14 GB | Strong reasoning at low cost |
| Qwen 3 32B | 32B | 20 GB | Excellent multilingual support |
| DeepSeek R1 14B | 14B | 9 GB | Strong coding and math |
| Phi-4 14B | 14B | 9 GB | Compact Microsoft model |

How to Set Up Ollama

  1. Install Ollama — Download from ollama.com/download. Available for macOS, Linux, and Windows.
  2. Pull a model — Open your terminal and run:
    ollama pull llama3.3
    This downloads the Llama 3.3 model (~40 GB for 70B). For a lighter option, try ollama pull llama3.2 (8B, ~5 GB).
  3. Start the Ollama server — Run ollama serve (it may already be running as a background service). The server listens on http://localhost:11434 by default.
  4. Verify it works — Run ollama list to see your downloaded models, or ollama run llama3.3 to chat directly in the terminal.
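Beyond `ollama list`, you can also verify the server over its REST API. The sketch below queries Ollama's `/api/tags` endpoint, which returns the installed models as JSON; the helper names and default URL are illustrative assumptions, not part of Ollama itself:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address

def installed_models(tags_response: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """GET /api/tags from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return json.load(resp)

# With the server running:
#   installed_models(fetch_tags())
# should list the same models that `ollama list` prints.
```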

How to Connect Ollama to OpenClaw

To use Ollama as the AI backend for a self-hosted OpenClaw instance, configure the model provider in your openclaw.json config file:

1. Set Ollama as a model provider

In your OpenClaw config, add an ollama entry under models.providers with the base URL of your Ollama server:

"models": {
  "providers": {
    "ollama": {
      "apiBase": "http://localhost:11434/v1"
    }
  }
}
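The `/v1` suffix matters: it is Ollama's OpenAI-compatible API, which is the protocol OpenClaw speaks to the provider. A minimal sketch of the kind of request that flows through this `apiBase` (the function names are illustrative, not part of either project):

```python
import json
import urllib.request

API_BASE = "http://localhost:11434/v1"  # matches the apiBase above

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,  # Ollama model name, e.g. "llama3.3"
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model: str, prompt: str, base: str = API_BASE) -> str:
    """POST to the OpenAI-compatible /chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```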

2. Set the default model

Point the agent's primary model to your Ollama model. The model ID must be prefixed with the provider name:

"agents": {
  "defaults": {
    "model": {
      "primary": "ollama/llama3.3"
    }
  }
}

3. Restart OpenClaw

After updating the config, restart your OpenClaw container. The agent will now route all requests through your local Ollama server.

Network note: If OpenClaw runs in Docker but Ollama runs on the host, use http://host.docker.internal:11434/v1 instead of localhost.
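Putting the two fragments together, the relevant slice of openclaw.json is the dict below; the small check mirrors the provider-prefix rule from step 2 (the validation helper is an illustrative sketch, not an OpenClaw API):

```python
# The combined models + agents configuration from steps 1 and 2.
config = {
    "models": {
        "providers": {
            "ollama": {"apiBase": "http://localhost:11434/v1"},
        }
    },
    "agents": {
        "defaults": {"model": {"primary": "ollama/llama3.3"}},
    },
}

def primary_model_is_valid(cfg: dict) -> bool:
    """Check that the primary model's prefix names a configured provider."""
    provider, _, model = cfg["agents"]["defaults"]["model"]["primary"].partition("/")
    return bool(model) and provider in cfg["models"]["providers"]
```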

Local (Ollama) vs. Cloud Models

Both approaches have trade-offs. Here's how they compare:

| | Ollama (Local) | Cloud (OpenClaw Launch) |
| --- | --- | --- |
| Privacy | Full — data stays on device | Encrypted at rest, routed via API |
| Cost | Free (electricity + hardware) | From $3/mo + per-token API costs |
| Model quality | Good (open-source models) | Best (Claude Opus, GPT-5.2, Gemini) |
| Speed | Depends on GPU hardware | Fast (cloud inference) |
| Setup | Install Ollama + self-host OpenClaw | Visual editor, one-click deploy |
| Offline | Yes | No — requires internet |
| Hardware needed | GPU with 5–40 GB VRAM | None — fully managed |

Hardware Requirements

Local model performance depends on your GPU. Here are rough guidelines:

  • 8B models (Llama 3.2, Phi-4) — Need ~5 GB VRAM. Run well on most modern GPUs or Apple M-series chips with 16 GB+ unified memory.
  • 14–32B models (Mistral Small, Qwen 3) — Need 9–20 GB VRAM. Require a dedicated GPU (RTX 3090/4090) or M-series Mac with 32 GB+.
  • 70B models (Llama 3.3) — Need ~40 GB VRAM. Require high-end hardware (dual GPUs, A100, or M-series with 64 GB+).
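These figures roughly track a common back-of-the-envelope rule: at 4-bit quantization, weights take about half a byte per parameter, plus some fixed overhead for the KV cache and runtime. A rough estimator sketch (the constants are assumptions for illustration, not Ollama's actual memory accounting):

```python
def estimated_vram_gb(params_billion: float,
                      bytes_per_param: float = 0.5,  # ~4-bit quantization
                      overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate: quantized weights plus fixed runtime overhead."""
    return params_billion * bytes_per_param + overhead_gb

# e.g. an 8B model at 4-bit: 8 * 0.5 + 1 = 5.0 GB, in line with the table above
```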

If you don't have a capable GPU, cloud models via OpenClaw Launch are the faster path — no hardware investment needed.

When to Use Each Approach

Choose Ollama + self-hosted OpenClaw if you prioritize data privacy, want zero ongoing costs, have a capable GPU, and are comfortable with Docker and command-line setup.

Choose OpenClaw Launch (cloud) if you want the best model quality (Claude Opus, GPT-5.2), don't want to manage servers, or need your bot running 24/7 without dedicated hardware. Deploy in 30 seconds with zero infrastructure setup.

Try Cloud-Powered OpenClaw

No GPU? No problem. Deploy your AI agent in 30 seconds with leading cloud models.

Configure & Deploy