Guide
MiniMax M3 + Hermes Agent: 1M-Token Context Setup Guide
MiniMax M3 launched June 1, 2026 and immediately trended worldwide — a sparse Mixture-of-Experts model with a 1-million-token context window, native multimodal inputs, and company-reported benchmark scores that place it among the top frontier models. This guide shows you how to run MiniMax M3 inside a Hermes Agent via OpenRouter in under five minutes.
What Is MiniMax M3?
MiniMax M3 is a frontier language model released by MiniMax on June 1, 2026, with API access live from May 31. It is the generational successor to the M2.x line (M2.5 and M2.7 were point releases) — not an incremental update but a full architectural rebuild.
The headline changes over M2:
- Sparse Mixture-of-Experts with MiniMax Sparse Attention (MSA) — a proprietary attention mechanism designed to handle very long contexts without the quadratic cost of dense attention. MiniMax reports approximately 9× faster prefill at 1M tokens and approximately 15× faster decoding versus the prior generation.
- 1 million token context window — 512K tokens are guaranteed at standard pricing; inputs beyond 512K are supported at higher cost.
- Native multimodal input — text, image, and video inputs are supported natively. Output is text only.
- Open-weight commitment — MiniMax announced that model weights will be published on Hugging Face and GitHub. As of launch, the weights are rolling out (not yet fully available for all users to download).
Parameter count has not been publicly disclosed.
Why MiniMax M3 Is Trending
Search interest in “minimax m3” spiked over 3,700% in Google Trends in the week of its launch. A few factors explain the surge:
- A 1M-token context window is rare among hosted API models. Most competitors top out at 128K–200K tokens at standard pricing.
- The benchmark scores (discussed below) were eye-catching relative to the price point — especially the coding and agentic benchmarks.
- The open-weight announcement puts M3 in a small class of frontier-scale models that will eventually be self-hostable, which drew attention from the open-source AI community.
- The OpenRouter listing made it immediately accessible to any developer with an existing OpenRouter key, lowering the barrier to experiment.
MiniMax M3 Benchmarks (Company-Reported)
The following benchmark figures are published by MiniMax. They have not been independently verified by third parties at the time of writing (June 2026).
| Benchmark | MiniMax M3 | Context |
|---|---|---|
| SWE-Bench Pro | 59.0% | Above GPT-5.5 and Gemini 3.1 Pro per MiniMax; approaching Claude Opus 4.7 |
| BrowseComp | 83.5 | Above Claude Opus 4.7's reported 79.3 per MiniMax |
| Terminal-Bench 2.1 | 66.0% | Strong agentic CLI task performance |
SWE-Bench and BrowseComp are agentic benchmarks — they measure the model's ability to resolve real software issues and complete web research tasks autonomously. These are directly relevant to agent use cases, which is why M3 attracted immediate attention from teams running Hermes, OpenClaw, and other agent frameworks.
Why 1M Context Matters for Agents
Most agent workloads are context-hungry. A Hermes Agent handling a long-running research task, a large codebase, or weeks of conversation history quickly exhausts a 128K context window. With 1M tokens, M3 can hold:
- Several full novels worth of text in a single context
- An entire medium-sized codebase in one prompt
- Months of dense conversation history without summarization or pruning
- Multiple long documents for cross-document reasoning without chunking
For Hermes Agent specifically, this means fewer context-management workarounds. Hermes already handles long sessions well, but pairing it with M3 removes the context ceiling for most real-world workloads.
MiniMax M3 Pricing via OpenRouter
| Token type | Promotional price | Standard price |
|---|---|---|
| Input (per million tokens) | $0.30 | $0.60 |
| Output (per million tokens) | $1.20 | $2.40 |
Pricing shown is for the OpenRouter-hosted API as of June 2026. MiniMax also offers a direct API and subscription plans at minimaxi.com. Promotional rates may change — verify current pricing at openrouter.ai/minimax/minimax-m3.
For OpenClaw Launch pricing, see /pricing.
How to Use MiniMax M3 with Hermes Agent via OpenRouter
Option 1: OpenClaw Launch (Easiest)
OpenClaw Launch hosts managed Hermes Agent instances and routes inference through OpenRouter automatically. No API keys to wire up manually.
- Go to openclawlaunch.com/#configurator and start a Hermes deploy.
- In the model selector, choose MiniMax M3 from the OpenRouter model list (ID:
minimax/minimax-m3). - Connect your channel (Telegram, Discord, WhatsApp, etc.) and click Deploy. Your agent is running M3 with 1M-token context in roughly 30 seconds.
To bring your own OpenRouter key for BYOK billing, see the BYOK guide — paste your OPENROUTER_API_KEY in the configurator and all inference routes through your account.
/model minimax/minimax-m3 — no redeploy needed.Option 2: Self-Hosted Hermes Agent
Set your OpenRouter key and point Hermes at MiniMax M3. Since OpenRouter is Hermes's default fallback aggregator, this is a two-command setup:
export OPENROUTER_API_KEY=sk-or-...
# Set provider and model
hermes inference set openrouter
hermes model set minimax/minimax-m3Or edit your Hermes config file directly:
# /opt/data/config.yaml
inference:
provider: openrouter
model:
default: minimax/minimax-m3Get your OpenRouter key at openrouter.ai/keys. No minimum spend — you pay per token used. Full OpenRouter setup is covered in the Hermes + OpenRouter guide.
For full self-hosted Hermes setup from scratch, see the Hermes deploy guide.
MiniMax M3 vs Other Agent Models
| Model | Context | Input price (per M) | Multimodal input | Open weight |
|---|---|---|---|---|
| MiniMax M3 | 1M tokens | $0.30 (promo) | Text, image, video | Announced / rolling out |
anthropic/claude-sonnet-4.6 | 200K tokens | $3.00 | Text, image | No |
openai/gpt-5.5 | 128K tokens | $2.50 | Text, image | No |
google/gemini-3.1-pro-preview | 1M tokens | $1.25 | Text, image, video, audio | No |
deepseek/deepseek-v4-pro | 128K tokens | $0.14 | Text | Yes |
M3 sits in an unusual position: 1M context and multimodal inputs at a price point much lower than Claude or GPT-5.5 via OpenRouter. The main tradeoff is that benchmark claims are company-reported and not yet independently reproduced. For agentic workloads where context length is the bottleneck, M3 is worth testing.
What's Next?
- What is Hermes Agent? — Overview of the Hermes Agent framework
- Hermes Agent + OpenRouter — Full OpenRouter setup guide for Hermes
- Hermes Agent BYOK — Bring your own API key to OpenClaw Launch
- Deploy Hermes Agent — Full self-hosted deploy guide
- Hermes Hosting — Managed Hermes on OpenClaw Launch
- Compare Models — Side-by-side model comparison for agent use cases
Frequently Asked Questions
What is MiniMax M3?
MiniMax M3 is a frontier language model released June 1, 2026 by MiniMax. It uses a sparse Mixture-of-Experts architecture with MiniMax Sparse Attention (MSA), supports a 1-million-token context window, and accepts text, image, and video as inputs. It is the full architectural successor to the M2.x series.
Is MiniMax M3 open source?
MiniMax announced that M3 weights will be published on Hugging Face and GitHub. As of the June 1, 2026 launch, the weights are rolling out and not yet fully available for all users to download. The model is available immediately via the MiniMax API and via OpenRouter.
How much does MiniMax M3 cost via OpenRouter?
At the promotional rate on OpenRouter: $0.30 per million input tokens and $1.20 per million output tokens. Standard rates are $0.60 input / $2.40 output. Verify current pricing at openrouter.ai/minimax/minimax-m3 before budgeting.
Can Hermes Agent use MiniMax M3?
Yes. Set your OpenRouter inference provider and model to minimax/minimax-m3. On OpenClaw Launch, select M3 from the model dropdown in the configurator — no manual key setup required unless you want BYOK billing.
How does MiniMax M3 compare to Claude Opus 4.7?
According to MiniMax's own benchmark reports (not independently verified), M3 scores 59.0% on SWE-Bench Pro (approaching Claude Opus 4.7) and 83.5 on BrowseComp (above Claude Opus 4.7's reported 79.3). M3 also offers a lower input price via OpenRouter and a larger 1M-token context window. Claude Opus 4.7 has a longer track record of independent third-party evaluation and is generally regarded as more reliable for production workloads where benchmark reproducibility matters.
What is the MiniMax M3 OpenRouter model ID?
The OpenRouter model ID for MiniMax M3 is minimax/minimax-m3. Use this in the Hermes config or pass it to the /model slash command at runtime.