Setup Guide
OpenClaw + LM Studio: Run Local AI Models
LM Studio is a desktop app that lets you download, manage, and run AI models locally through a friendly GUI. Connect it to OpenClaw for free local inference — no API costs, complete privacy, and full offline capability.
What Is LM Studio?
LM Studio is a desktop application for macOS, Windows, and Linux that makes running local AI models as easy as browsing an app store. You can search for models from Hugging Face, download them with one click, and run them entirely on your own hardware — no internet connection required once downloaded.
LM Studio exposes a local server with an OpenAI-compatible API, which means any tool that supports the OpenAI API format — including OpenClaw — can connect to it directly. You get the benefits of a polished GUI for model management plus a standard API for programmatic access.
Why Use LM Studio with OpenClaw?
LM Studio is the easiest way to get local AI inference running without touching the command line. Paired with OpenClaw, you get a fully private, cost-free AI agent:
- No API costs — Local models run on your hardware. Zero per-token billing, no monthly API subscriptions.
- Complete privacy — Your prompts and conversations never leave your machine. Ideal for sensitive workflows.
- GUI-based model management — Browse, download, and switch models from a desktop app. No command-line knowledge needed.
- OpenAI-compatible API — Works out of the box with OpenClaw's model provider configuration.
- All OpenClaw features — Telegram and Discord bots, 3,200+ ClawHub skills, session management, and the web gateway — all with your local model powering it.
LM Studio vs. Ollama vs. OpenRouter
There are several ways to connect AI models to OpenClaw. Here's how LM Studio compares to the most popular alternatives:
| | LM Studio | Ollama | OpenRouter |
|---|---|---|---|
| Cost | Free | Free | Pay-per-token (cloud) |
| Ease of use | Easiest — GUI app | Easy — CLI commands | Easiest — just an API key |
| Model selection | Thousands via Hugging Face | Hundreds via Ollama registry | 100+ top cloud models |
| GPU required | Recommended (CPU fallback) | Recommended (CPU fallback) | No — cloud inference |
| Privacy | Full — stays on device | Full — stays on device | Routed via cloud API |
| Offline support | Yes | Yes | No |
| Model quality | Good (open-source models) | Good (open-source models) | Best (GPT, Claude, Gemini) |
| Setup | Install app, click download | Install CLI, run commands | Get API key, paste in config |
LM Studio is the best choice if you want local inference with the simplest setup experience — especially if you're not comfortable with the command line. If you prefer a CLI workflow, Ollama is a great alternative. For the highest model quality without local hardware, OpenRouter gives you access to Claude, GPT, and Gemini.
How to Set Up LM Studio
Step 1: Download and Install LM Studio
Visit lmstudio.ai and download the installer for your operating system. LM Studio supports:
- macOS — Apple Silicon (M1/M2/M3/M4) and Intel
- Windows — NVIDIA and AMD GPUs supported
- Linux — AppImage or .deb package
Install and launch LM Studio. On first launch it will prompt you to complete a quick setup wizard.
Step 2: Download a Model
Click the Search tab (magnifying glass icon) in the left sidebar. Browse or search for a model. Popular models that work well with OpenClaw:
| Model | Size | VRAM | Best For |
|---|---|---|---|
| Llama 3.2 3B | ~2 GB | 3 GB | Low-spec hardware, quick replies |
| Llama 3.1 8B | ~5 GB | 6 GB | Everyday tasks, good balance |
| Mistral Small 3.1 24B | ~14 GB | 16 GB | Strong reasoning, coding |
| Qwen 3 32B | ~20 GB | 22 GB | Multilingual, complex tasks |
| DeepSeek R1 14B | ~9 GB | 10 GB | Coding, math, analysis |
| Gemma 3 12B | ~8 GB | 9 GB | Google's open model, versatile |
Click the model name, select a quantization variant (see Performance Tips below for guidance), and click Download. LM Studio will download it from Hugging Face.
Step 3: Load the Model
Go to the My Models tab (or the Chat tab) and select your downloaded model from the dropdown. LM Studio will load it into memory. You can verify it's working by sending a test message in the chat interface.
Step 4: Enable the Local Server
Click the Local Server tab (the plug/server icon in the sidebar). Click Start Server. By default the server starts on port 1234 and exposes an OpenAI-compatible API at:
```
http://localhost:1234/v1
```

You can verify the server is running by visiting `http://localhost:1234/v1/models` in your browser — it should return a JSON list of loaded models.
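If you'd rather check from code, the endpoint follows the OpenAI `GET /v1/models` response shape, so extracting the served model IDs is a one-liner. A minimal sketch — the sample payload below is illustrative, not captured from a real LM Studio instance:

```python
import json

def served_model_ids(body: str) -> list[str]:
    """Extract model IDs from an OpenAI-compatible /v1/models response."""
    return [entry["id"] for entry in json.loads(body).get("data", [])]

# Illustrative payload in the OpenAI-compatible shape LM Studio serves.
sample = '{"object": "list", "data": [{"id": "meta-llama/Llama-3.1-8B-Instruct", "object": "model"}]}'
print(served_model_ids(sample))  # ['meta-llama/Llama-3.1-8B-Instruct']
```

With the server running, you would feed it the body returned by `http://localhost:1234/v1/models` instead of the sample string.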
How to Connect LM Studio to OpenClaw
LM Studio's local server is OpenAI-compatible, so OpenClaw can connect to it using the same configuration pattern as any OpenAI-compatible provider.
1. Add LM Studio as a model provider
In your openclaw.json config file, add LM Studio under models.providers:
```json
{
  "models": {
    "providers": {
      "lmstudio": {
        "apiBase": "http://localhost:1234/v1",
        "apiKey": "lm-studio"
      }
    }
  }
}
```

The `apiKey` value can be any non-empty string — LM Studio's local server does not validate API keys. Use `"lm-studio"` as a placeholder.
2. Set LM Studio as the default model
Point your agent's primary model to your loaded LM Studio model. Use the exact model ID shown in LM Studio's server tab (it matches the Hugging Face repo path):
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "lmstudio/meta-llama/Llama-3.1-8B-Instruct"
      }
    }
  }
}
```

Replace `meta-llama/Llama-3.1-8B-Instruct` with the model ID displayed in LM Studio's server tab under Model Identifier.
3. Full example config
Here is a complete minimal openclaw.json showing both the provider and model settings together:
```json
{
  "models": {
    "providers": {
      "lmstudio": {
        "apiBase": "http://localhost:1234/v1",
        "apiKey": "lm-studio"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "lmstudio/meta-llama/Llama-3.1-8B-Instruct"
      }
    }
  }
}
```

4. Restart OpenClaw
After saving your config, restart your OpenClaw instance. The agent will now route all inference through your local LM Studio server.
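Independent of OpenClaw, you can sanity-check the endpoint with a plain OpenAI-style chat completion request. The sketch below only builds the request — actually sending it requires the LM Studio server to be running, and the model ID is an example you should replace with your own:

```python
import json
import urllib.request

API_BASE = "http://localhost:1234/v1"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # LM Studio ignores the key, but OpenAI-style clients send one.
            "Authorization": "Bearer lm-studio",
        },
    )

req = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Say hello.")
# With the server running, uncomment to send and print the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If this request succeeds outside OpenClaw but OpenClaw still fails, the problem is in your `openclaw.json` rather than in LM Studio.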
If OpenClaw runs inside Docker, replace `localhost` with `host.docker.internal` in the provider URL:

```
http://host.docker.internal:1234/v1
```

Performance Tips
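For a Docker-based setup, the provider entry is the same block as in the config above with only the host changed (a sketch, assuming the default port):

```json
{
  "models": {
    "providers": {
      "lmstudio": {
        "apiBase": "http://host.docker.internal:1234/v1",
        "apiKey": "lm-studio"
      }
    }
  }
}
```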
Choose the Right Quantization
When downloading a model in LM Studio, you'll see multiple quantization variants (e.g., Q4_K_M, Q5_K_M, Q8_0, F16). Quantization reduces the model's memory footprint at the cost of some accuracy:
- Q4_K_M — Recommended for most users. Best balance of size, speed, and quality. Runs on 6–16 GB VRAM depending on model size.
- Q5_K_M — Slightly better quality than Q4_K_M, ~15% larger. Good if you have headroom.
- Q8_0 — Near full quality, roughly 2× the size of Q4_K_M. Use if you have ample VRAM.
- F16 — Full precision, highest quality, largest size. Requires very large VRAM (not practical for most).
As a starting point, download the Q4_K_M variant of your chosen model. You can always try a higher quantization later if your hardware allows it.
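The size differences between variants follow roughly from bits per weight. A back-of-the-envelope sketch — the bits-per-weight figures below are common approximations for GGUF quants (including overhead), not exact values:

```python
# Approximate bits per weight for common GGUF quantization variants.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.5, "Q8_0": 8.5, "F16": 16.0}

def approx_file_size_gb(params_billions: float, quant: str) -> float:
    """Rough on-disk size: parameter count x bits per weight, in gigabytes."""
    bits = params_billions * 1e9 * BITS_PER_WEIGHT[quant]
    return round(bits / 8 / 1e9, 1)

# An 8B model at each quantization level:
for quant in ("Q4_K_M", "Q5_K_M", "Q8_0", "F16"):
    print(quant, approx_file_size_gb(8, quant), "GB")
```

For an 8B model this gives roughly 4.8 GB at Q4_K_M versus 16 GB at F16, which matches the ~5 GB download size in the model table above.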
Enable GPU Acceleration
GPU acceleration makes a massive difference in response speed. LM Studio auto-detects your GPU, but you can check and configure it in Settings → Hardware:
- NVIDIA (CUDA) — Supported out of the box on Windows and Linux. Make sure you have up-to-date NVIDIA drivers installed.
- Apple Silicon (Metal) — Fully supported. LM Studio uses Apple's Metal backend for M1/M2/M3/M4 chips, which share CPU/GPU memory.
- AMD (ROCm) — Supported on Linux. Windows support is limited.
In the Local Server tab, check that the GPU Offload slider is set to a high value (or Max). This controls how many model layers are offloaded to the GPU — more layers on GPU means faster inference.
VRAM Requirements
A rough guide for the minimum VRAM needed for common model sizes using Q4_K_M quantization:
- 3B models — ~3 GB VRAM (runs on integrated graphics or older GPUs)
- 7–8B models — ~5–6 GB VRAM (GTX 1080, RTX 3060, M1 16 GB)
- 13–14B models — ~9–10 GB VRAM (RTX 3080, M2 Pro 32 GB)
- 24–32B models — ~16–22 GB VRAM (RTX 4090, M2 Max/Ultra)
- 70B models — ~40+ GB VRAM (requires professional GPU or Mac Studio Ultra)
If your model doesn't fully fit in VRAM, LM Studio will offload the remaining layers to RAM, which is slower but still functional.
Context Length
In LM Studio's Server Settings, you can configure the context length (how many tokens the model can process in one conversation). A longer context uses more VRAM. Start with 4096 and increase if needed:
- 2048 — Minimal VRAM use, good for short Q&A
- 4096 — Recommended default for OpenClaw conversations
- 8192–16384 — Needed for long documents or complex multi-turn chats
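To see why longer context costs VRAM: the KV cache grows linearly with context length. A rough sketch — the default layer and head counts below are those commonly published for Llama-3-class 8B models and are illustrative assumptions, not LM Studio internals:

```python
def kv_cache_gb(context_len: int, layers: int = 32, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """Approximate FP16 KV-cache size: 2 (K and V) x layers x heads x dim x tokens."""
    total = 2 * layers * kv_heads * head_dim * context_len * bytes_per_value
    return round(total / 1e9, 2)

for ctx in (2048, 4096, 8192, 16384):
    print(ctx, kv_cache_gb(ctx), "GB")  # ~0.27, 0.54, 1.07, 2.15 GB
```

So jumping from a 4096 to a 16384 context costs on the order of an extra 1.5 GB of VRAM for an 8B model, on top of the model weights themselves.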
Troubleshooting
Connection Refused
If OpenClaw reports a connection error when trying to reach LM Studio:
- Make sure the local server is running — go to LM Studio's Local Server tab and confirm the status shows Running.
- Check that the port in your OpenClaw config matches the port shown in LM Studio (default is `1234`).
- If OpenClaw runs in Docker, use `host.docker.internal:1234` instead of `localhost:1234`.
- Test the endpoint directly: `curl http://localhost:1234/v1/models`
- Check your firewall — port 1234 may be blocked. Try temporarily disabling your firewall to confirm.
Slow Responses
If inference is slower than expected:
- Verify GPU acceleration is active — check the GPU Offload setting in LM Studio's server tab. If it reads 0, your model is running on CPU only.
- Switch to a smaller or more aggressively quantized model (e.g., from Q5_K_M to Q4_K_M, or from an 8B model to a 3B model).
- Reduce the context length in server settings — shorter context means less VRAM usage and faster processing.
- Close other applications that are using the GPU (games, video editors, other AI tools).
Out of Memory / Model Fails to Load
If LM Studio fails to load the model or crashes:
- The model may be too large for your available VRAM + RAM. Try a smaller quantization variant (e.g., Q3_K_M instead of Q4_K_M) or a smaller model entirely.
- Reduce the GPU offload layers slider — this shifts some layers to CPU RAM, which is slower but allows larger models to run.
- Check LM Studio's Hardware panel to see how much VRAM is currently free before loading.
- On Windows, enable virtual VRAM in NVIDIA settings to let the GPU use system RAM as overflow (slower, but lets you run larger models).
Wrong Model ID
If OpenClaw connects to LM Studio but reports model errors:
- Open LM Studio's Local Server tab and look at the Model Identifier field — copy it exactly as shown.
- The model must be loaded in LM Studio (green indicator) before OpenClaw can use it. LM Studio only serves the currently loaded model.
- Visit `http://localhost:1234/v1/models` to see the exact model ID that LM Studio is currently serving, then update your OpenClaw config to match.
Use OpenClaw Launch Instead
Don't have a capable GPU, or prefer not to manage local model setup? OpenClaw Launch runs your AI agent in the cloud with top-tier models pre-configured — Claude, GPT, and Gemini, ready to go. No hardware investment, no model downloads, no local server to maintain.
OpenClaw Launch deploys your agent in about 10 seconds with a visual configurator. You get all OpenClaw features — Telegram, Discord, ClawHub skills, web UI — powered by the best available cloud models.
Frequently Asked Questions
Is LM Studio free?
Yes. LM Studio is free for personal use. The models available through it (Llama, Mistral, Qwen, DeepSeek, Gemma, and thousands more) are also free to download and run. Your only costs are electricity and the hardware. There is a paid LM Studio Pro plan for commercial use, but the local server feature is available on the free tier.
Do I need a powerful GPU to use LM Studio?
You don't strictly need a GPU — LM Studio can run models on CPU only — but without GPU acceleration, responses will be very slow (multiple seconds per token for larger models). For a usable experience with OpenClaw, a GPU with at least 6 GB VRAM (or an Apple Silicon Mac with 16 GB+ unified memory) is recommended. For GPU-free setups, consider using OpenClaw Launch with cloud models instead.
How is LM Studio different from Ollama?
Both LM Studio and Ollama run local models and expose an OpenAI-compatible API. The key difference is the interface: LM Studio is a graphical desktop application, making it easier to browse and manage models visually. Ollama is command-line based, which makes it better suited for servers and automated workflows. Both work equally well with OpenClaw.
Can I switch models without restarting OpenClaw?
You can load a different model in LM Studio at any time, but you'll need to update the model ID in your OpenClaw config to match the newly loaded model and restart OpenClaw for the change to take effect. LM Studio only serves one model at a time from its local server.
What's Next?
- OpenClaw + Ollama — CLI-based local model runner, great for servers and scripts
- OpenClaw + OpenRouter — Access Claude, GPT, and Gemini with one API key
- OpenClaw + LiteLLM — Proxy 100+ providers through one unified API
- OpenClaw + DeepSeek — Use DeepSeek's coding and reasoning models
- Install OpenClaw — Self-host OpenClaw on your own server