How to Use NVIDIA Nemotron 3 Ultra with OpenClaw
NVIDIA Nemotron 3 Ultra is a massive mixture-of-experts model built for long-horizon agentic tasks — tool calls, multi-step reasoning, and huge context windows. This guide shows how to wire it into OpenClaw via OpenRouter or a local Ollama server.
What Is NVIDIA Nemotron 3 Ultra?
Nemotron 3 Ultra is an open model released by NVIDIA and optimized for agentic, long-running workflows. A few numbers worth knowing:
- Architecture: Mixture-of-Experts (MoE) — roughly 550B total parameters, with ~55B active per token. You get near-dense quality at a fraction of the inference cost.
- Context window: Up to 1 million tokens. Useful for large codebases, lengthy documents, or extended multi-turn agent sessions.
- Tool use & reasoning: NVIDIA trained the model specifically for function calling and multi-step reasoning — the two capabilities AI agents rely on most.
Because OpenClaw is tool-use-first (skills, MCP servers, browser, file system), pairing it with a model built for the same workload is a natural fit.
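To get a feel for what a 1-million-token window holds, the sketch below estimates whether a source tree would fit. It assumes a rough 4-characters-per-token heuristic rather than Nemotron's actual tokenizer, and the function names, file suffixes, and 50,000-token reserve are illustrative choices, not part of OpenClaw.

```python
from pathlib import Path

CONTEXT_LIMIT = 1_000_000   # context window quoted in this guide
CHARS_PER_TOKEN = 4         # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate derived from character count."""
    return len(text) // CHARS_PER_TOKEN


def estimate_tree_tokens(root: str, suffixes=(".py", ".md", ".ts")) -> int:
    """Sum the estimated tokens of matching text files under `root`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(errors="ignore"))
    return total


def fits_in_context(token_count: int, reserve: int = 50_000) -> bool:
    """Keep `reserve` tokens free for instructions, tool output, and replies."""
    return token_count + reserve <= CONTEXT_LIMIT
```

Calling `fits_in_context(estimate_tree_tokens("."))` from a project root gives a first-pass answer; real token counts can differ noticeably from this estimate, so treat it as a sanity check only.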
Option 1 — Managed Hosting on OpenClaw Launch
If you use OpenClaw Launch (the managed service), you pick your model from the dashboard — no config files, no restarts.
- Log in at openclawlaunch.com/dashboard.
- Open your instance and go to the Model settings tab.
- Select OpenRouter as the provider and search for “Nemotron” in the model list.
- Pick the Nemotron 3 Ultra entry and save — the change hot-applies instantly.
You can optionally bring your own OpenRouter API key (BYOK) under Settings → API Keys to use your own quota and billing. Without a key, the platform routes requests through its shared key.
Option 2 — Self-Hosting with OpenRouter
On a self-hosted OpenClaw instance, set the model under your agent's primary model config. OpenClaw uses the id format openrouter/nvidia/<model-slug>.
Important: NVIDIA's exact slug on OpenRouter can change between releases. Before copying the snippet below, search “Nemotron” on openrouter.ai/models and copy the current slug from the model's detail page. The snippet below uses a representative slug as a placeholder.
# openclaw.json (agent primary model)
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openrouter/nvidia/llama-3_1-nemotron-ultra-253b-v1"
      }
    }
  },
  "models": {
    "providers": {
      "openrouter": {
        "apiKey": "sk-or-..."
      }
    }
  }
}
Replace llama-3_1-nemotron-ultra-253b-v1 with the slug you copied from OpenRouter, and fill in your API key. After saving, OpenClaw hot-applies model changes without a full restart.
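If you would rather look up the current slug from a script than from the website, a minimal sketch follows. It assumes OpenRouter's public model list at /api/v1/models returns a JSON body shaped like {"data": [{"id": ..., "name": ...}]}; the helper names are hypothetical, and the openrouter/ prefix simply follows the id format described above.

```python
import json
import urllib.request

MODELS_URL = "https://openrouter.ai/api/v1/models"  # OpenRouter's public model list


def fetch_models(url: str = MODELS_URL) -> list:
    """Download the model catalogue (assumes a body shaped like {"data": [...]})."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]


def find_slugs(models: list, needle: str = "nemotron") -> list:
    """Return ids of models whose id or display name contains `needle`."""
    needle = needle.lower()
    hits = []
    for m in models:
        model_id = m.get("id", "")
        name = m.get("name", "")
        if model_id and (needle in model_id.lower() or needle in name.lower()):
            hits.append(model_id)
    return hits


def to_openclaw_id(openrouter_id: str) -> str:
    """Prefix an OpenRouter id such as "nvidia/<model-slug>" with the provider name."""
    return f"openrouter/{openrouter_id}"
```

Running `[to_openclaw_id(s) for s in find_slugs(fetch_models())]` should list candidate values for the "primary" field; confirm the one you pick against the model's detail page before saving it.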
Option 3 — Running Locally via Ollama
Nemotron 3 Ultra is also available through Ollama, which lets you run it entirely on your own hardware (a multi-GPU machine or a well-resourced workstation).
# Pull the model
ollama pull nemotron-3-ultra
# Confirm the model is in your local library
ollama list
Then point OpenClaw at your local Ollama server. In openclaw.json:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/nemotron-3-ultra"
      }
    }
  },
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434"
      }
    }
  }
}
The Ollama provider in OpenClaw talks directly to your local server, so no API key is required and requests never leave your machine. See the OpenClaw Ollama guide for a full Ollama setup walkthrough.
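Before pointing OpenClaw at the server, it can help to confirm that Ollama is reachable and has the model pulled. The sketch below queries Ollama's /api/tags endpoint at the same base URL used in the config; the helper names are hypothetical, and the model name is the one used in this guide.

```python
import json
import urllib.request


def list_local_models(base_url: str = "http://localhost:11434") -> list:
    """Ask the Ollama server which models it has pulled (GET /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def has_model(names: list, wanted: str) -> bool:
    """True if `wanted` matches a pulled model, with or without an explicit tag."""
    return any(n == wanted or n.split(":", 1)[0] == wanted for n in names)
```

For example, `has_model(list_local_models(), "nemotron-3-ultra")` should return True once the pull has finished; a connection error from `list_local_models` usually means the Ollama server is not running on that port.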
Cost and Free Tier Notes
Nemotron 3 Ultra is available on OpenRouter, which includes a free tier for many models. Whether Nemotron 3 Ultra sits on the free tier at any given time depends on OpenRouter's current offering — check the model's page on openrouter.ai for live pricing.
- Free-tier route: On OpenRouter's free tier, rate limits are tighter. Fine for exploration; not recommended for production agents with high message volume.
- Paid route: Bring your own OpenRouter key (BYOK) for higher limits and predictable billing. Because only ~55B parameters are active per token, an MoE model needs far less compute per token than a dense 550B model would, and hosted per-token prices generally reflect that. Check the model page for the actual rate.
- Ollama / self-hosted: No per-token cost once you have the hardware. Only ~55B parameters are active per token, but all ~550B weights still have to be held in memory, so expect a multi-GPU setup with substantial VRAM for full-speed inference.
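The hardware side of the self-hosted option can be roughed out with simple arithmetic. The sketch below uses the parameter counts quoted in this guide and counts weight storage only; KV cache, activations, and runtime overhead come on top, and the bit widths are illustrative quantization levels rather than formats confirmed for this model.

```python
TOTAL_PARAMS = 550e9    # total parameters, as quoted in this guide
ACTIVE_PARAMS = 55e9    # active parameters per token, as quoted in this guide


def weight_gib(params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return params * bits_per_weight / 8 / 2**30


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_gib(TOTAL_PARAMS, bits):,.0f} GiB")

print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
```

Under these assumptions the weights alone come to roughly 1 TiB at 16 bits and about 256 GiB at 4 bits. The active fraction lowers compute per token, not the amount of memory the full set of experts occupies.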
Which Model Should I Pick?
Nemotron 3 Ultra is not the right tool for every job. A quick-pick guide:
- Complex multi-step agents, heavy tool use, large codebases: Nemotron 3 Ultra — its 1M-token context and function-calling focus shine here.
- Fast back-and-forth chat, low-latency replies: A smaller model (Qwen, Gemma, Mistral) will feel snappier. MoE routing adds some overhead.
- Image or audio tasks: Nemotron 3 Ultra is text-only. Use a multimodal model for those inputs.
- Budget-constrained, low-volume: Try OpenRouter's free tier first to benchmark quality before committing to paid.
- Air-gapped / private deployment: The Ollama path keeps everything local.
You can compare model options on the Models page or switch models any time from your dashboard without redeploying.
Frequently Asked Questions
Is Nemotron 3 Ultra free to use?
It is available on OpenRouter, which offers a free tier for many models. Check OpenRouter's current model listing to confirm whether Nemotron 3 Ultra is on the free tier at the time you read this — free-tier availability changes. Running it locally via Ollama has no per-request cost once you have the hardware.
Can I run Nemotron 3 Ultra with OpenClaw?
Yes. OpenClaw accepts OpenRouter models by id (for NVIDIA models, the format is openrouter/nvidia/<slug>), and it also supports Ollama for local inference. Both paths are covered above. On OpenClaw Launch (managed), you select the model directly from the dashboard, with no config editing required.
Does Nemotron 3 Ultra work well for AI agents?
Yes — NVIDIA designed it specifically for agentic use cases. The 1M-token context window lets the agent retain long histories without truncation, and the model's function-calling capability maps directly to OpenClaw's skill and MCP tool system. Users running multi-step workflows (research, coding, task automation) tend to see the biggest benefit.
Prefer not to manage model configs yourself? OpenClaw Launch lets you pick Nemotron 3 Ultra (or any OpenRouter model) from a dropdown in your dashboard: no config files, no restarts, no server management. Deploy in 30 seconds and switch models any time.