OpenClaw + Ollama Cloud: Hosted Open Models for Your AI Agent
Run frontier open-weight models like Llama 4, Kimi K2, DeepSeek V3.1, and Qwen 3 inside your OpenClaw agent — without owning a GPU. Same Ollama API surface, hosted in the cloud.
What Is Ollama Cloud?
Ollama Cloud is the hosted version of Ollama — the same OpenAI-compatible API you use locally, but running on datacenter GPUs instead of your laptop. You sign in with your Ollama account, point your client at https://ollama.com, and pull cloud-only models like gpt-oss:120b, kimi-k2:1t, or llama4:scout that would never fit on a consumer card.
For an OpenClaw agent that wants open-weight models without managing infrastructure, Ollama Cloud sits between fully local Ollama (free but GPU-bound) and third-party API providers (broad model catalog but closed-source defaults).
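To see what "point your client at https://ollama.com" means in practice before touching OpenClaw, here is a minimal sketch using the official openai Python SDK against the OpenAI-compatible endpoint covered later in this guide. It assumes an OLLAMA_API_KEY environment variable holding a key from your Ollama account.

```python
# Minimal sketch: chat with Ollama Cloud over its OpenAI-compatible endpoint.
# Assumes the openai SDK (pip install openai) and an OLLAMA_API_KEY env var.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://ollama.com/v1",      # Ollama Cloud instead of localhost
    api_key=os.environ["OLLAMA_API_KEY"],  # key from ollama.com Settings -> API Keys
)

resp = client.chat.completions.create(
    model="gpt-oss:120b",  # a cloud-only tag from the catalog below
    messages=[{"role": "user", "content": "In one sentence, what is a MoE model?"}],
)
print(resp.choices[0].message.content)
```

The same script runs against a local Ollama daemon if you swap the base URL for http://localhost:11434/v1, which is what makes the local-to-cloud swap trivial.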
Ollama vs Ollama Cloud vs OpenRouter
Three ways to wire models into OpenClaw — pick based on where you want the inference to run and who you trust with the prompts.
| Option | Where it runs | Best for | Tradeoff |
|---|---|---|---|
| Local Ollama | Your machine / homelab | Privacy, offline use, no API cost | Needs a real GPU; bigger models cap out fast |
| Ollama Cloud | Ollama-managed GPUs | Frontier open models, no GPU at home | Pay per token, prompts leave your machine |
| OpenRouter | Many provider backends | Wide catalog, both closed + open models | Routing layer between you and the lab |
Models Worth Pulling on Ollama Cloud
The cloud catalog focuses on models too large for most home GPUs. As of April 2026 the standouts for an OpenClaw agent are:
| Model tag | Why it's interesting | Good for |
|---|---|---|
| gpt-oss:120b | Frontier-class open reasoning model | Long-context analysis, agent planning |
| llama4:scout | Meta Llama 4 with 10M-token context | Document Q&A, codebase reasoning |
| kimi-k2:1t | Moonshot Kimi K2 trillion-parameter MoE | Bilingual chat, long-horizon coding |
| deepseek-v3.1:671b | DeepSeek V3.1 reasoning model | Math, coding, complex reasoning |
| qwen3-coder:480b | Qwen 3 coding-tuned MoE | Repository-scale coding, refactors |
Ollama tags follow family:size. Cloud-only tags (the ones backed by hosted GPUs) are listed at ollama.com/library.
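To check which tags your account can actually reach, you can query the standard model-listing route that OpenAI-compatible servers expose. A hedged sketch, assuming Ollama Cloud mirrors OpenAI's GET /v1/models:

```python
# List the model tags visible to your key, assuming the standard
# OpenAI-compatible GET /v1/models route is available on Ollama Cloud.
import os

from openai import OpenAI

client = OpenAI(base_url="https://ollama.com/v1", api_key=os.environ["OLLAMA_API_KEY"])

for model in client.models.list():
    print(model.id)  # e.g. gpt-oss:120b, llama4:scout, ...
```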
Wire Ollama Cloud Into OpenClaw
Ollama Cloud exposes the same /api/chat and OpenAI-compatible /v1/chat/completions endpoints as local Ollama. In OpenClaw, that means you treat it as an OpenAI-compatible provider with a custom base URL.
1. Get an Ollama API key
Sign in at ollama.com, go to Settings → API Keys, and create a key. Cloud usage is metered per token; the dashboard shows live consumption.
2. Configure the provider in OpenClaw
In your openclaw.json (or via the OpenClaw Launch dashboard), add an OpenAI-compatible provider pointing at Ollama Cloud:
```json
{
  "models": {
    "providers": {
      "ollama-cloud": {
        "kind": "openai",
        "baseUrl": "https://ollama.com/v1",
        "apiKey": "<your-ollama-key>"
      }
    },
    "default": "ollama-cloud/gpt-oss:120b"
  }
}
```

Prefix the model id with the provider name (ollama-cloud/<tag>) so OpenClaw routes the request correctly. The same pattern works for any OpenAI-compatible endpoint.
3. Test from your agent
Restart the agent (or hit reload from the dashboard) and send a message. Inside your bot the swap is invisible — Telegram, Discord, the web gateway, and any MCP tool calls all work the same way they do with OpenAI or Anthropic.
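If the agent errors out, test the provider directly before digging into OpenClaw config. The sketch below goes through Ollama's native /api/chat route rather than the OpenAI-compatible one, assuming the hosted endpoint accepts the same request shape as local Ollama (as this guide describes) plus a bearer token for auth:

```python
# Smoke-test Ollama Cloud's native chat route, bypassing OpenClaw entirely.
# Assumes the hosted /api/chat mirrors local Ollama's request/response shape
# and accepts an Authorization: Bearer header.
import os

import requests

resp = requests.post(
    "https://ollama.com/api/chat",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
    json={
        "model": "gpt-oss:120b",
        "messages": [{"role": "user", "content": "Reply with the word: pong"}],
        "stream": False,  # one JSON body instead of a stream of chunks
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

If this works but the agent does not, the problem is in the provider entry, not the key.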
When to Pick Ollama Cloud Over a Hyperscaler
- Open-weight is a hard requirement. If your team has policies that prefer non-proprietary weights, Ollama Cloud keeps you on Llama, Qwen, DeepSeek, and Kimi without running your own GPUs.
- You want one provider for local + hosted. Same client, same model names, same prompts. Develop locally on a 14B, deploy against a 480B in production.
- Cost predictability on long-context jobs. Frontier open models on Ollama Cloud often beat closed-model pricing for 100k+ token workloads.
When Local Ollama Still Wins
- The data must never leave your network — local is the only correct answer.
- You already paid for a GPU and want to amortize it.
- The model you need fits on the hardware you have (most 7B–32B models do).
For purely local setups, see the OpenClaw + local Ollama guide.
OpenClaw Launch + Ollama Cloud
OpenClaw Launch deploys a managed OpenClaw agent in under two minutes with the gateway, container, and TLS already wired up. Plug in your Ollama Cloud key under Models → BYOK and your agent runs frontier open-weight models on the next message, with no GPU and no Docker on your end.
FAQ
Is Ollama Cloud free?
No. Local Ollama is free; the cloud variant is metered per token because it's renting datacenter GPU time. Pricing is published on the Ollama dashboard and varies by model size.
Can I mix Ollama Cloud and OpenRouter in the same agent?
Yes. OpenClaw treats every provider entry independently. Define both, then pick which model the default agent uses and let skills override per-task as needed.
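A hedged sketch of that setup in openclaw.json, reusing the provider schema from the example above; the OpenRouter base URL is the standard https://openrouter.ai/api/v1, and the default model is just one illustrative choice:

```json
{
  "models": {
    "providers": {
      "ollama-cloud": {
        "kind": "openai",
        "baseUrl": "https://ollama.com/v1",
        "apiKey": "<your-ollama-key>"
      },
      "openrouter": {
        "kind": "openai",
        "baseUrl": "https://openrouter.ai/api/v1",
        "apiKey": "<your-openrouter-key>"
      }
    },
    "default": "ollama-cloud/gpt-oss:120b"
  }
}
```

Tasks that need a closed model can then reference the other prefix, for example openrouter/anthropic/claude-sonnet-4, while the default stays on open weights.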
Does Ollama Cloud support tool calls?
The OpenAI-compatible endpoint accepts the standard tools field; whether calls actually fire depends on the model's tool-use training. Llama 4, Qwen 3, and DeepSeek V3.1 all support it.
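A hedged sketch of the wire format, using the standard OpenAI tools schema against a cloud model; the get_weather function here is a made-up example for illustration, not part of any Ollama API:

```python
# Sketch: standard OpenAI-style tool calling against an Ollama Cloud model.
# get_weather is a hypothetical tool, defined only for this example.
import os

from openai import OpenAI

client = OpenAI(base_url="https://ollama.com/v1", api_key=os.environ["OLLAMA_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama4:scout",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as JSON text.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```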
What about latency?
Cloud-hosted frontier open models are generally slower per-token than the closed APIs from OpenAI or Anthropic. For latency-sensitive bots, keep a fast closed model as the conversational default and route long-form jobs to Ollama Cloud.