OpenClaw + Ollama Cloud: Hosted Open Models for Your AI Agent
Run frontier open-weight models like Llama 4, Kimi K2, DeepSeek V3.1, and Qwen 3 inside your OpenClaw agent — without owning a GPU. Same Ollama API surface, hosted in the cloud.
What Is Ollama Cloud?
Ollama Cloud is the hosted version of Ollama — the same OpenAI-compatible API you use locally, but running on datacenter GPUs instead of your laptop. You sign in with your Ollama account, point your client at https://ollama.com, and pull cloud-only models like gpt-oss:120b, kimi-k2:1t, or llama4:scout that would never fit on a consumer card.
For an OpenClaw agent that wants open-weight models without managing infrastructure, Ollama Cloud sits between fully local Ollama (free but GPU-bound) and third-party API providers (broad model catalog but closed-source defaults).
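To see what "point your client at https://ollama.com" means in practice before touching OpenClaw, here is a minimal sketch using the official openai Python SDK against the OpenAI-compatible endpoint covered later in this guide. It assumes an OLLAMA_API_KEY environment variable holding a key from your Ollama account.

```python
# Minimal sketch: chat with Ollama Cloud over its OpenAI-compatible endpoint.
# Assumes the openai SDK (pip install openai) and an OLLAMA_API_KEY env var.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://ollama.com/v1",      # Ollama Cloud instead of localhost
    api_key=os.environ["OLLAMA_API_KEY"],  # key from ollama.com Settings -> API Keys
)

resp = client.chat.completions.create(
    model="gpt-oss:120b",  # a cloud-only tag from the catalog below
    messages=[{"role": "user", "content": "In one sentence, what is a MoE model?"}],
)
print(resp.choices[0].message.content)
```

The same script runs against a local Ollama daemon if you swap the base URL for http://localhost:11434/v1, which is what makes the local-to-cloud swap trivial.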
Ollama vs Ollama Cloud vs OpenRouter
Three ways to wire models into OpenClaw — pick based on where you want the inference to run and who you trust with the prompts.
| Option | Where it runs | Best for | Tradeoff |
|---|---|---|---|
| Local Ollama | Your machine / homelab | Privacy, offline use, no API cost | Needs a real GPU; bigger models cap out fast |
| Ollama Cloud | Ollama-managed GPUs | Frontier open models, no GPU at home | Pay per token, prompts leave your machine |
| OpenRouter | Many provider backends | Wide catalog, both closed + open models | Routing layer between you and the lab |
Models Worth Pulling on Ollama Cloud
The cloud catalog focuses on models too large for most home GPUs. As of April 2026 the standouts for an OpenClaw agent are:
| Model tag | Why it's interesting | Good for |
|---|---|---|
| gpt-oss:120b | Frontier-class open reasoning model | Long-context analysis, agent planning |
| llama4:scout | Meta Llama 4 with 10M-token context | Document Q&A, codebase reasoning |
| kimi-k2:1t | Moonshot Kimi K2 trillion-parameter MoE | Bilingual chat, long-horizon coding |
| deepseek-v3.1:671b | DeepSeek V3.1 reasoning model | Math, coding, complex reasoning |
| qwen3-coder:480b | Qwen 3 coding-tuned MoE | Repository-scale coding, refactors |
Ollama tags follow family:size. Cloud-only tags (the ones backed by hosted GPUs) are listed at ollama.com/library.
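To check which tags your account can actually reach, you can query the standard model-listing route that OpenAI-compatible servers expose. A hedged sketch, assuming Ollama Cloud mirrors OpenAI's GET /v1/models:

```python
# List the model tags visible to your key, assuming the standard
# OpenAI-compatible GET /v1/models route is available on Ollama Cloud.
import os

from openai import OpenAI

client = OpenAI(base_url="https://ollama.com/v1", api_key=os.environ["OLLAMA_API_KEY"])

for model in client.models.list():
    print(model.id)  # e.g. gpt-oss:120b, llama4:scout, ...
```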
Wire Ollama Cloud Into OpenClaw
Ollama Cloud exposes the same /api/chat and OpenAI-compatible /v1/chat/completions endpoints as local Ollama. In OpenClaw, that means you treat it as an OpenAI-compatible provider with a custom base URL.
1. Get an Ollama API key
Sign in at ollama.com, go to Settings → API Keys, and create a key. Cloud usage is metered per token; the dashboard shows live consumption.
2. Configure the provider in OpenClaw
In your openclaw.json (or via the OpenClaw Launch dashboard), add an OpenAI-compatible provider pointing at Ollama Cloud:
```json
{
  "models": {
    "providers": {
      "ollama-cloud": {
        "kind": "openai",
        "baseUrl": "https://ollama.com/v1",
        "apiKey": "<your-ollama-key>"
      }
    },
    "default": "ollama-cloud/gpt-oss:120b"
  }
}
```

Prefix the model id with the provider name (ollama-cloud/<tag>) so OpenClaw routes the request correctly. The same pattern works for any OpenAI-compatible endpoint.
3. Test from your agent
Restart the agent (or hit reload from the dashboard) and send a message. Inside your bot the swap is invisible — Telegram, Discord, the web gateway, and any MCP tool calls all work the same way they do with OpenAI or Anthropic.
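If the agent errors out, test the provider directly before digging into OpenClaw config. The sketch below goes through Ollama's native /api/chat route rather than the OpenAI-compatible one, assuming the hosted endpoint accepts the same request shape as local Ollama (as this guide describes) plus a bearer token for auth:

```python
# Smoke-test Ollama Cloud's native chat route, bypassing OpenClaw entirely.
# Assumes the hosted /api/chat mirrors local Ollama's request/response shape
# and accepts an Authorization: Bearer header.
import os

import requests

resp = requests.post(
    "https://ollama.com/api/chat",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
    json={
        "model": "gpt-oss:120b",
        "messages": [{"role": "user", "content": "Reply with the word: pong"}],
        "stream": False,  # one JSON body instead of a stream of chunks
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

If this works but the agent does not, the problem is in the provider entry, not the key.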
When to Pick Ollama Cloud Over a Hyperscaler
- Open-weight is a hard requirement. If your team has policies that prefer non-proprietary weights, Ollama Cloud keeps you on Llama, Qwen, DeepSeek, and Kimi without running your own GPUs.
- You want one provider for local + hosted. Same client, same model names, same prompts. Develop locally on a 14B, deploy against a 480B in production.
- Cost predictability on long-context jobs. Frontier open models on Ollama Cloud often beat closed-model pricing for 100k+ token workloads.
When Local Ollama Still Wins
- The data must never leave your network — local is the only correct answer.
- You already paid for a GPU and want to amortize it.
- The model you need fits on the hardware you have (most 7B–32B models do).
For purely local setups, see the OpenClaw + local Ollama guide.
OpenClaw Launch + Ollama Cloud
OpenClaw Launch deploys a managed OpenClaw agent in under two minutes with the gateway, container, and TLS already wired up. Plug in your Ollama Cloud key under Models → BYOK and your agent runs frontier open-weight models on the next message, with no GPU and no Docker on your end.
FAQ
Is Ollama Cloud free?
No. Local Ollama is free; the cloud variant is metered per token because it's renting datacenter GPU time. Pricing is published on the Ollama dashboard and varies by model size.
Can I mix Ollama Cloud and OpenRouter in the same agent?
Yes. OpenClaw treats every provider entry independently. Define both, then pick which model the default agent uses and let skills override per-task as needed.
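A hedged sketch of that setup in openclaw.json, reusing the provider schema from the example above; the OpenRouter base URL is the standard https://openrouter.ai/api/v1, and the default model is just one illustrative choice:

```json
{
  "models": {
    "providers": {
      "ollama-cloud": {
        "kind": "openai",
        "baseUrl": "https://ollama.com/v1",
        "apiKey": "<your-ollama-key>"
      },
      "openrouter": {
        "kind": "openai",
        "baseUrl": "https://openrouter.ai/api/v1",
        "apiKey": "<your-openrouter-key>"
      }
    },
    "default": "ollama-cloud/gpt-oss:120b"
  }
}
```

Tasks that need a closed model can then reference the other prefix, for example openrouter/anthropic/claude-sonnet-4, while the default stays on open weights.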
Does Ollama Cloud support tool calls?
The OpenAI-compatible endpoint accepts the standard tools field; whether calls actually fire depends on the model's tool-use training. Llama 4, Qwen 3, and DeepSeek V3.1 all support it.
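A hedged sketch of the wire format, using the standard OpenAI tools schema against a cloud model; the get_weather function here is a made-up example for illustration, not part of any Ollama API:

```python
# Sketch: standard OpenAI-style tool calling against an Ollama Cloud model.
# get_weather is a hypothetical tool, defined only for this example.
import os

from openai import OpenAI

client = OpenAI(base_url="https://ollama.com/v1", api_key=os.environ["OLLAMA_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama4:scout",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as JSON text.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```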
What about latency?
Cloud-hosted frontier open models are generally slower per-token than the closed APIs from OpenAI or Anthropic. For latency-sensitive bots, keep a fast closed model as the conversational default and route long-form jobs to Ollama Cloud.