Setup Guide

OpenClaw + LM Studio: Run Local AI Models

LM Studio is a desktop app that lets you download, manage, and run AI models locally through a friendly GUI. Connect it to OpenClaw for free local inference — no API costs, complete privacy, and full offline capability.

What Is LM Studio?

LM Studio is a desktop application for macOS, Windows, and Linux that makes running local AI models as easy as browsing an app store. You can search for models from Hugging Face, download them with one click, and run them entirely on your own hardware — no internet connection required once downloaded.

LM Studio exposes a local server with an OpenAI-compatible API, which means any tool that supports the OpenAI API format — including OpenClaw — can connect to it directly. You get the benefits of a polished GUI for model management plus a standard API for programmatic access.

Why Use LM Studio with OpenClaw?

LM Studio is the easiest way to get local AI inference running without touching the command line. Paired with OpenClaw, you get a fully private, cost-free AI agent:

  • No API costs — Local models run on your hardware. Zero per-token billing, no monthly API subscriptions.
  • Complete privacy — Your prompts and conversations never leave your machine. Ideal for sensitive workflows.
  • GUI-based model management — Browse, download, and switch models from a desktop app. No command-line knowledge needed.
  • OpenAI-compatible API — Works out of the box with OpenClaw's model provider configuration.
  • All OpenClaw features — Telegram and Discord bots, 3,200+ ClawHub skills, session management, and the web gateway, all powered by your local model.

LM Studio vs. Ollama vs. OpenRouter

There are several ways to connect AI models to OpenClaw. Here's how LM Studio compares to the most popular alternatives:

| | LM Studio | Ollama | OpenRouter |
|---|---|---|---|
| Cost | Free | Free | Pay-per-token (cloud) |
| Ease of use | Easiest — GUI app | Easy — CLI commands | Easiest — just an API key |
| Model selection | Thousands via Hugging Face | Hundreds via Ollama registry | 100+ top cloud models |
| GPU required | Recommended (CPU fallback) | Recommended (CPU fallback) | No — cloud inference |
| Privacy | Full — stays on device | Full — stays on device | Routed via cloud API |
| Offline support | Yes | Yes | No |
| Model quality | Good (open-source models) | Good (open-source models) | Best (GPT, Claude, Gemini) |
| Setup | Install app, click download | Install CLI, run commands | Get API key, paste in config |

LM Studio is the best choice if you want local inference with the simplest setup experience — especially if you're not comfortable with the command line. If you prefer a CLI workflow, Ollama is a great alternative. For the highest model quality without local hardware, OpenRouter gives you access to Claude, GPT, and Gemini.

How to Set Up LM Studio

Step 1: Download and Install LM Studio

Visit lmstudio.ai and download the installer for your operating system. LM Studio supports:

  • macOS — Apple Silicon (M1/M2/M3/M4) and Intel
  • Windows — NVIDIA and AMD GPUs supported
  • Linux — AppImage or .deb package

Install and launch LM Studio. On first launch it will prompt you to complete a quick setup wizard.

Step 2: Download a Model

Click the Search tab (magnifying glass icon) in the left sidebar. Browse or search for a model. Popular models that work well with OpenClaw:

| Model | Size | VRAM | Best For |
|---|---|---|---|
| Llama 3.2 3B | ~2 GB | 3 GB | Low-spec hardware, quick replies |
| Llama 3.1 8B | ~5 GB | 6 GB | Everyday tasks, good balance |
| Mistral Small 3.1 24B | ~14 GB | 16 GB | Strong reasoning, coding |
| Qwen 3 32B | ~20 GB | 22 GB | Multilingual, complex tasks |
| DeepSeek R1 14B | ~9 GB | 10 GB | Coding, math, analysis |
| Gemma 3 12B | ~8 GB | 9 GB | Google's open model, versatile |

Click the model name, select a quantization variant (see Performance Tips below for guidance), and click Download. LM Studio will download it from Hugging Face.

Step 3: Load the Model

Go to the My Models tab (or the Chat tab) and select your downloaded model from the dropdown. LM Studio will load it into memory. You can verify it's working by sending a test message in the chat interface.

Step 4: Enable the Local Server

Click the Local Server tab (the plug/server icon in the sidebar). Click Start Server. By default the server starts on port 1234 and exposes an OpenAI-compatible API at:

http://localhost:1234/v1

You can verify the server is running by visiting http://localhost:1234/v1/models in your browser — it should return a JSON list of loaded models.
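
Since the endpoint speaks the standard OpenAI chat API, you can also exercise it programmatically. The sketch below builds a chat-completion request for the local server; `build_chat_request` is a hypothetical helper for illustration, and `your-model-id` is a placeholder for whatever model LM Studio is serving. Sending the request (commented out) requires the server to be running.

```python
import json

# Build an OpenAI-style chat-completion request for LM Studio's local
# server. The URL path and payload shape follow the standard OpenAI
# chat API; "your-model-id" is a placeholder for the loaded model's ID.
def build_chat_request(model, prompt, base_url="http://localhost:1234/v1"):
    url = f"{base_url}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return url, payload

url, payload = build_chat_request("your-model-id", "Hello!")
print(url)  # http://localhost:1234/v1/chat/completions
print(json.dumps(payload, indent=2))

# To actually send it (server must be running):
#   import urllib.request
#   req = urllib.request.Request(
#       url, data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json",
#                "Authorization": "Bearer lm-studio"})
#   print(urllib.request.urlopen(req).read().decode())
```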

How to Connect LM Studio to OpenClaw

LM Studio's local server is OpenAI-compatible, so OpenClaw can connect to it using the same configuration pattern as any OpenAI-compatible provider.

1. Add LM Studio as a model provider

In your openclaw.json config file, add LM Studio under models.providers:

{
  "models": {
    "providers": {
      "lmstudio": {
        "apiBase": "http://localhost:1234/v1",
        "apiKey": "lm-studio"
      }
    }
  }
}

The apiKey value can be any non-empty string — LM Studio's local server does not validate API keys. Use "lm-studio" as a placeholder.

2. Set LM Studio as the default model

Point your agent's primary model to your loaded LM Studio model. Use the exact model ID shown in LM Studio's server tab (it matches the Hugging Face repo path):

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "lmstudio/meta-llama/Llama-3.1-8B-Instruct"
      }
    }
  }
}

Replace meta-llama/Llama-3.1-8B-Instruct with the model ID displayed in LM Studio's server tab under Model Identifier.

3. Full example config

Here is a complete minimal openclaw.json showing both the provider and model settings together:

{
  "models": {
    "providers": {
      "lmstudio": {
        "apiBase": "http://localhost:1234/v1",
        "apiKey": "lm-studio"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "lmstudio/meta-llama/Llama-3.1-8B-Instruct"
      }
    }
  }
}
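
A common misconfiguration is a primary model whose provider prefix doesn't match any entry under models.providers. This quick sanity-check sketch parses a config with the layout shown above (with a placeholder model ID) and confirms the prefix resolves:

```python
import json

# Sanity-check sketch for an openclaw.json with the layout above:
# confirm the primary model's provider prefix (text before the first
# "/") is defined under models.providers. "your-model-id" is a
# placeholder for the real model identifier.
config = json.loads("""
{
  "models": {
    "providers": {
      "lmstudio": {
        "apiBase": "http://localhost:1234/v1",
        "apiKey": "lm-studio"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "lmstudio/your-model-id"
      }
    }
  }
}
""")

providers = config["models"]["providers"]
primary = config["agents"]["defaults"]["model"]["primary"]
provider_name = primary.split("/", 1)[0]

if provider_name in providers:
    print(f"OK: {primary} routes to {providers[provider_name]['apiBase']}")
else:
    raise SystemExit(f"Unknown provider prefix: {provider_name}")
```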

4. Restart OpenClaw

After saving your config, restart your OpenClaw instance. The agent will now route all inference through your local LM Studio server.

Docker note: If OpenClaw runs inside Docker but LM Studio runs on your host machine, replace localhost with host.docker.internal: http://host.docker.internal:1234/v1

Performance Tips

Choose the Right Quantization

When downloading a model in LM Studio, you'll see multiple quantization variants (e.g., Q4_K_M, Q5_K_M, Q8_0, F16). Quantization reduces the model's memory footprint at the cost of some accuracy:

  • Q4_K_M — Recommended for most users. Best balance of size, speed, and quality. Runs on 6–16 GB VRAM depending on model size.
  • Q5_K_M — Slightly better quality than Q4_K_M, ~15% larger. Good if you have headroom.
  • Q8_0 — Near full quality, roughly 2× the size of Q4_K_M. Use if you have ample VRAM.
  • F16 — Full precision, highest quality, largest size. Requires very large VRAM (not practical for most).

As a starting point, download the Q4_K_M variant of your chosen model. You can always move up to a higher-precision variant later if your hardware allows it.
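
To see why the variants differ in size, you can estimate a quantized file's footprint from parameter count times bits per weight. The bits-per-weight figures below are approximate averages for llama.cpp-style formats (real files vary by a few percent), so treat this as a back-of-envelope sketch:

```python
# Rough file-size estimator for quantized models.
# Bits-per-weight values are approximate averages for llama.cpp-style
# quantization formats; actual GGUF files vary by a few percent.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q8_0": 8.5, "F16": 16.0}

def model_size_gb(n_params_billion, quant):
    bits = BITS_PER_WEIGHT[quant]
    # params * bits/weight -> bits, / 8 -> bytes, expressed in GB
    return n_params_billion * bits / 8

for quant in BITS_PER_WEIGHT:
    print(f"8B model at {quant}: ~{model_size_gb(8, quant):.1f} GB")
```

An 8B model at Q4_K_M comes out near 5 GB and at Q8_0 near twice that, matching the size guidance above.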

Enable GPU Acceleration

GPU acceleration makes a massive difference in response speed. LM Studio auto-detects your GPU, but you can check and configure it in Settings → Hardware:

  • NVIDIA (CUDA) — Supported out of the box on Windows and Linux. Make sure you have up-to-date NVIDIA drivers installed.
  • Apple Silicon (Metal) — Fully supported. LM Studio uses Apple's Metal backend for M1/M2/M3/M4 chips, which share CPU/GPU memory.
  • AMD (ROCm) — Supported on Linux. Windows support is limited.

In the Local Server tab, check that the GPU Offload slider is set to a high value (or Max). This controls how many model layers are offloaded to the GPU — more layers on GPU means faster inference.

VRAM Requirements

A rough guide for the minimum VRAM needed for common model sizes using Q4_K_M quantization:

  • 3B models — ~3 GB VRAM (runs on integrated graphics or older GPUs)
  • 7–8B models — ~5–6 GB VRAM (GTX 1080, RTX 3060, M1 16 GB)
  • 13–14B models — ~9–10 GB VRAM (RTX 3080, M2 Pro 32 GB)
  • 24–32B models — ~16–22 GB VRAM (RTX 4090, M2 Max/Ultra)
  • 70B models — ~40+ GB VRAM (requires professional GPU or Mac Studio Ultra)

If your model doesn't fully fit in VRAM, LM Studio will offload the remaining layers to RAM, which is slower but still functional.

Context Length

In LM Studio's Server Settings, you can configure the context length (how many tokens the model can process in one conversation). A longer context uses more VRAM. Start with 4096 and increase if needed:

  • 2048 — Minimal VRAM use, good for short Q&A
  • 4096 — Recommended default for OpenClaw conversations
  • 8192–16384 — Needed for long documents or complex multi-turn chats
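
The extra VRAM goes to the KV cache, which grows linearly with context length. A rough estimate, using layer and head counts typical of a Llama-style 8B model with grouped-query attention (check your model's card for its exact values):

```python
# Rough KV-cache size estimate: why longer context costs more VRAM.
# Formula: 2 (K and V) * layers * context * kv_heads * head_dim * bytes.
# Defaults approximate a Llama-style 8B model with grouped-query
# attention and fp16 cache; real models differ, so treat as a sketch.
def kv_cache_gib(context, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per=2):
    return 2 * n_layers * context * n_kv_heads * head_dim * bytes_per / 2**30

for ctx in (2048, 4096, 8192, 16384):
    print(f"context {ctx:>5}: ~{kv_cache_gib(ctx):.2f} GiB KV cache")
```

At the recommended 4096-token default this works out to about 0.5 GiB on top of the model weights, doubling each time the context doubles.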

Troubleshooting

Connection Refused

If OpenClaw reports a connection error when trying to reach LM Studio:

  • Make sure the local server is running — go to LM Studio's Local Server tab and confirm the status shows Running.
  • Check that the port in your OpenClaw config matches the port shown in LM Studio (default is 1234).
  • If OpenClaw runs in Docker, use host.docker.internal:1234 instead of localhost:1234.
  • Test the endpoint directly: curl http://localhost:1234/v1/models
  • Check your firewall — port 1234 may be blocked. Try temporarily disabling your firewall to confirm.

Slow Responses

If inference is slower than expected:

  • Verify GPU acceleration is active — check the GPU Offload setting in LM Studio's server tab. If it reads 0, your model is running on CPU only.
  • Switch to a smaller or more aggressively quantized model (e.g., from Q5_K_M to Q4_K_M, or from an 8B model to a 3B model).
  • Reduce the context length in server settings — shorter context means less VRAM usage and faster processing.
  • Close other applications that are using the GPU (games, video editors, other AI tools).

Out of Memory / Model Fails to Load

If LM Studio fails to load the model or crashes:

  • The model may be too large for your available VRAM + RAM. Try a smaller quantization variant (e.g., Q3_K_M instead of Q4_K_M) or a smaller model entirely.
  • Reduce the GPU offload layers slider — this shifts some layers to CPU RAM, which is slower but allows larger models to run.
  • Check LM Studio's Hardware panel to see how much VRAM is currently free before loading.
  • On Windows, recent NVIDIA drivers can spill over into system RAM when VRAM fills (the CUDA Sysmem Fallback Policy in the NVIDIA Control Panel). This is slower, but lets you run larger models.

Wrong Model ID

If OpenClaw connects to LM Studio but reports model errors:

  • Open LM Studio's Local Server tab and look at the Model Identifier field — copy it exactly as shown.
  • The model must be loaded in LM Studio (green indicator) before OpenClaw can use it. LM Studio only serves the currently loaded model.
  • Visit http://localhost:1234/v1/models to see the exact model ID that LM Studio is currently serving, then update your OpenClaw config to match.
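
The /v1/models response follows the standard OpenAI list shape, so extracting the exact ID is straightforward. The sketch below parses a sample payload that mimics that shape; fetch the real response from http://localhost:1234/v1/models while the server is running, and note that "your-model-id" is a placeholder:

```python
import json

# Extract model IDs from an OpenAI-style /v1/models response so the
# exact ID can be pasted into the OpenClaw config. The sample below
# mimics the response shape; "your-model-id" is a placeholder for the
# real identifier returned by the running server.
sample_response = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "your-model-id", "object": "model"}
  ]
}
""")

model_ids = [m["id"] for m in sample_response["data"]]
print(model_ids)  # ['your-model-id']
```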

Use OpenClaw Launch Instead

Don't have a capable GPU, or prefer not to manage local model setup? OpenClaw Launch runs your AI agent in the cloud with top-tier models pre-configured — Claude, GPT, and Gemini, ready to go. No hardware investment, no model downloads, no local server to maintain.

OpenClaw Launch deploys your agent in about 10 seconds with a visual configurator. You get all OpenClaw features — Telegram, Discord, ClawHub skills, web UI — powered by the best available cloud models.

Frequently Asked Questions

Is LM Studio free?

Yes. LM Studio is free for personal use. The models available through it (Llama, Mistral, Qwen, DeepSeek, Gemma, and thousands more) are also free to download and run. Your only costs are electricity and the hardware. There is a paid LM Studio Pro plan for commercial use, but the local server feature is available on the free tier.

Do I need a powerful GPU to use LM Studio?

You don't strictly need a GPU — LM Studio can run models on CPU only — but without GPU acceleration, responses will be very slow (multiple seconds per token for larger models). For a usable experience with OpenClaw, a GPU with at least 6 GB VRAM (or an Apple Silicon Mac with 16 GB+ unified memory) is recommended. For GPU-free setups, consider using OpenClaw Launch with cloud models instead.

How is LM Studio different from Ollama?

Both LM Studio and Ollama run local models and expose an OpenAI-compatible API. The key difference is the interface: LM Studio is a graphical desktop application, making it easier to browse and manage models visually. Ollama is command-line based, which makes it better suited for servers and automated workflows. Both work equally well with OpenClaw.

Can I switch models without restarting OpenClaw?

You can load a different model in LM Studio at any time, but you'll need to update the model ID in your OpenClaw config to match the newly loaded model and restart OpenClaw for the change to take effect. LM Studio only serves one model at a time from its local server.

What's Next?

No GPU? Deploy in the Cloud Instead

Skip the local setup. Deploy your AI agent with top-tier cloud models in 10 seconds — no hardware, no downloads, no configuration.

Deploy with OpenClaw Launch