March 20, 2026 · Tutorial · 6 min read

How to Use OpenClaw Voice Mode — Speech Setup Guide

By Zack

What if you could talk to your AI assistant instead of typing? With OpenClaw's voice mode, you can. Send voice messages, get spoken replies, and have hands-free AI conversations on Telegram, Discord, and web chat.

This guide walks you through setting up speech-to-text (STT) and text-to-speech (TTS) on OpenClaw, choosing the right providers, and getting the best voice experience.

What Is OpenClaw Voice Mode?

OpenClaw voice mode adds two capabilities to your AI agent:

Speech-to-Text (STT): Converts your voice messages into text so the AI can process them
Text-to-Speech (TTS): Converts the AI's text responses into spoken audio so you can listen instead of reading

Voice mode works on Telegram (voice messages), Discord (voice channels), and the web chat widget (browser microphone). You can use STT alone, TTS alone, or both together for a fully hands-free experience.

How Voice Mode Works

The voice pipeline is straightforward:

1. You speak: send a voice message on Telegram, join a Discord voice channel, or click the mic button in web chat.
2. STT converts your speech to text: your chosen STT provider transcribes the audio.
3. The AI processes the text: your AI agent generates a response, just as it would for a normal text message.
4. TTS converts the response to audio: your chosen TTS provider generates spoken audio.
5. You hear the reply: the audio is sent back through the same channel.

The entire round-trip typically takes 3-8 seconds, depending on your STT/TTS providers and the length of the response.
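The five steps above can be sketched in a few lines of Python. This is only an illustration, not OpenClaw's actual implementation: `handle_voice_message`, `fake_stt`, `fake_agent`, and `fake_tts` are hypothetical names, and the stand-in providers simply pass text through so the data flow is visible. A real setup would call your configured STT and TTS services instead.

```python
import time
from typing import Callable

# Hypothetical stand-ins for real providers. Each one only transforms its
# input trivially so the path through the pipeline is easy to follow.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")      # pretend the audio "contains" its transcript

def fake_agent(prompt: str) -> str:
    return f"You said: {prompt}"

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")       # pretend these bytes are synthesized speech

def handle_voice_message(
    audio: bytes,
    stt: Callable[[bytes], str],
    agent: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> tuple[bytes, dict[str, float]]:
    """Run STT -> agent -> TTS and record how long each stage took."""
    timings: dict[str, float] = {}

    start = time.perf_counter()
    transcript = stt(audio)
    timings["stt"] = time.perf_counter() - start

    start = time.perf_counter()
    reply_text = agent(transcript)
    timings["agent"] = time.perf_counter() - start

    start = time.perf_counter()
    reply_audio = tts(reply_text)
    timings["tts"] = time.perf_counter() - start

    # Sum the three stage timings before adding the total itself.
    timings["total"] = sum(timings.values())
    return reply_audio, timings

reply, timings = handle_voice_message(
    b"what is the weather", fake_stt, fake_agent, fake_tts
)
```

Timing each stage separately makes it easier to see which provider dominates the round trip when replies feel slow.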

Setting Up Voice on Telegram

Telegram has built-in voice message support, making it the easiest platform for OpenClaw voice mode.

Speech-to-Text (Receiving Voice Messages)

Telegram automatically sends voice messages as audio files to your bot. OpenClaw can transcribe these using your configured STT provider.

In your OpenClaw config, enable the STT plugin
Configure your STT provider (see Choosing an STT Provider below)
Send a voice message to your bot — it will be automatically transcribed and processed

Text-to-Speech (Sending Voice Replies)

To have your bot reply with voice messages instead of text:

Enable the TTS plugin in your OpenClaw config
Configure your TTS provider (see Choosing a TTS Provider below)
Set the response mode to "voice" (audio only) or "both" (text and audio)
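One way to picture the response mode is as a small dispatch over what gets sent back. The sketch below is an assumption about the behavior, not OpenClaw code: `build_reply` and `Reply` are hypothetical names, and the "text" mode is assumed here as the default text-only case, since this guide only names "voice" and "both".

```python
from typing import Callable, Optional, TypedDict

class Reply(TypedDict):
    text: Optional[str]
    audio: Optional[bytes]

def build_reply(text: str, mode: str, synthesize: Callable[[str], bytes]) -> Reply:
    """Decide what to send back for a given response mode."""
    if mode == "text":      # assumed text-only default, not named in this guide
        return {"text": text, "audio": None}
    if mode == "voice":     # audio reply instead of text
        return {"text": None, "audio": synthesize(text)}
    if mode == "both":      # text and audio together
        return {"text": text, "audio": synthesize(text)}
    raise ValueError(f"unknown response mode: {mode!r}")
```

Rejecting unknown modes outright, rather than silently falling back to text, makes a typo in the config show up immediately.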

Setting Up Voice on Discord

Discord supports both voice channels and voice messages. OpenClaw can join voice channels and participate in real-time conversations.

Enable the Discord voice plugin in your OpenClaw config
Grant your Discord bot the Connect and Speak permissions in your server settings
Configure STT and TTS providers
Invite the bot to a voice channel, where it listens and responds to speech

Note: Discord voice requires a persistent connection, which uses slightly more resources than text-only mode.
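If you generate the bot's invite link yourself, the Connect and Speak permissions correspond to two bit flags in Discord's permissions integer. The sketch below uses the flag values from Discord's developer documentation. The helper name `has_voice_permissions` is hypothetical and is not part of OpenClaw.

```python
# Permission bit flags as listed in Discord's developer documentation.
CONNECT = 1 << 20   # join a voice channel
SPEAK = 1 << 21     # transmit audio in a voice channel

def has_voice_permissions(permissions: int) -> bool:
    """Return True when both the Connect and Speak bits are set."""
    required = CONNECT | SPEAK
    return (permissions & required) == required

# Combined value for a bot that only needs these two permissions.
voice_permissions = CONNECT | SPEAK
```

Here `voice_permissions` works out to 3145728. That number can be supplied as the permissions value when building the bot's invite URL, alongside whatever text permissions your bot already needs.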

Setting Up Voice on Web Chat

The OpenClaw web chat widget can use your browser's microphone for voice input.

Enable voice mode in the web chat configuration
When a user clicks the microphone button, the browser captures audio
The audio is sent to your STT provider for transcription
The AI response is optionally converted to audio via TTS and played back in the browser

Note: Users must grant microphone permission in their browser. Browsers expose the microphone only on secure origins, so serve the page over HTTPS (localhost is the usual exception during development).
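Before embedding the widget, it can help to check that the page URL is one where browsers will allow microphone access at all. The following is a simplified sketch: `mic_allowed_origin` is a hypothetical helper, and it only covers the common cases (HTTPS, plus plain HTTP on a loopback host for local development), not every rule browsers apply.

```python
from urllib.parse import urlsplit

# Loopback hosts that browsers generally treat as secure for local development.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}

def mic_allowed_origin(url: str) -> bool:
    """Rough check: would a browser consider this page secure enough for the mic?"""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return True
    return parts.scheme == "http" and parts.hostname in LOOPBACK_HOSTS
```

For example, `mic_allowed_origin("http://chat.example.com/widget")` is false, which signals that the widget's microphone button would not work on that page until it is served over HTTPS.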

Choosing an STT Provider

OpenClaw supports several speech-to-text providers. Here's how they compare:

Provider	Accuracy	Speed	Free Tier	Best For
OpenAI Whisper	Excellent	Good	No (pay-per-use)	Best overall accuracy
Deepgram	Very Good	Fastest	Yes ($200 credit)	Real-time transcription
Google Speech	Very Good	Good	Yes (60 min/mo)	Budget-friendly option

Configuring STT in OpenClaw

Add your STT provider to the OpenClaw config:

{
  "stt": {
    "provider": "whisper",
    "apiKey": "your-openai-api-key",
    "model": "whisper-1",
    "language": "en"
  }
}

Set the language parameter to improve accuracy if you primarily speak one language. Leave it unset for automatic language detection.
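The "set it or leave it unset" rule for language can be illustrated with a small helper that builds the options passed to a transcription call. This is a sketch under assumptions: `transcription_params` is a hypothetical function, and it simply mirrors the `model` and `language` keys from the config above rather than any documented OpenClaw API.

```python
from typing import Optional

def transcription_params(
    model: str = "whisper-1",
    language: Optional[str] = None,
) -> dict[str, str]:
    """Build transcription options, omitting language to allow auto-detection."""
    params = {"model": model}
    if language:
        # Normalize codes such as "EN" or " en " to "en".
        params["language"] = language.strip().lower()
    return params
```

Leaving the key out entirely, instead of sending an empty string, keeps the request unambiguous: the provider either gets a language hint or detects the language on its own.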

Choosing a TTS Provider

For text-to-speech, you have several options ranging from free to premium:

Provider	Voice Quality	Speed	Free Tier	Best For
OpenAI TTS	Very Natural	Good	No (pay-per-use)	Most natural-sounding
ElevenLabs	Premium	Good	Yes (10K chars/mo)	Custom voice cloning
Google TTS	Good	Fast	Yes (4M chars/mo)	Free, high-volume usage

Configuring TTS in OpenClaw

{
  "tts": {
    "provider": "openai",
    "apiKey": "your-openai-api-key",
    "model": "tts-1",
    "voice": "alloy"
  }
}

OpenAI TTS offers several voice options: alloy, echo, fable, onyx, nova, and shimmer. Try different voices to find the one that fits your assistant's personality.
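A typo in the voice name is an easy mistake to make, so a quick sanity check on the config before deploying can save a debugging session. The sketch below is illustrative only: `load_tts_config` is a hypothetical helper, treating all four keys as required is an assumption, and the voice set contains just the six names listed in this guide (the provider may offer others).

```python
import json

# Voice names taken from this guide; the provider may support more.
KNOWN_OPENAI_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
REQUIRED_KEYS = ("provider", "apiKey", "model", "voice")  # assumed to be required

def load_tts_config(raw: str) -> dict:
    """Parse a JSON config string and check the "tts" section for obvious mistakes."""
    config = json.loads(raw)["tts"]
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing tts keys: {', '.join(missing)}")
    if config["provider"] == "openai" and config["voice"] not in KNOWN_OPENAI_VOICES:
        raise ValueError(f"unrecognized voice: {config['voice']!r}")
    return config
```

The voice check only runs for the "openai" provider, because other providers use their own voice identifiers.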

Tips for a Better Voice Experience

Speak clearly and at a normal pace — STT accuracy improves significantly with clear speech
Use a good microphone — a headset or dedicated mic reduces background noise and improves transcription
Configure the language setting — if you always speak one language, set it explicitly for better accuracy
Set response length limits for TTS — long AI responses create long audio clips. Consider setting a max response length or using "both" mode (text + audio) so users can read long responses
Choose the right TTS voice — test different voices to find one that sounds natural for your use case
Use "both" response mode — sending both text and audio lets users choose how to consume the response
Consider latency — if speed matters, use Deepgram for STT (fastest) and Google TTS (fast and free)
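The response length tip can be approximated by trimming the text before it reaches the TTS provider. This is a minimal sketch, not a built-in OpenClaw feature: `trim_for_speech` is a hypothetical helper, and the 500-character default is an arbitrary illustration rather than a recommended value.

```python
def trim_for_speech(text: str, max_chars: int = 500) -> str:
    """Cut text to at most max_chars, preferring to end on a sentence boundary."""
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Find the last sentence-ending punctuation followed by a space.
    cut = max(clipped.rfind(". "), clipped.rfind("! "), clipped.rfind("? "))
    if cut != -1:
        return clipped[: cut + 1]
    # No sentence boundary found: fall back to the last whole word.
    return clipped.rsplit(" ", 1)[0]
```

Combined with "both" mode, the full text can still be sent as a message while only the trimmed version is spoken.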

Voice Mode on OpenClaw Launch

Voice features are available on all OpenClaw Launch plans. You can configure STT and TTS providers directly from the dashboard — no config files to edit manually.

If you're using the included AI credits, they also cover voice transcription. For TTS, you can use Google TTS (free tier) or bring your own API key for OpenAI TTS or ElevenLabs.

Get Started with Voice

Ready to add voice to your AI agent? Check out our platform-specific guides:

Set up OpenClaw on Telegram — voice messages work out of the box
Set up OpenClaw on Discord — join voice channels and chat hands-free

Or deploy on OpenClaw Launch and configure voice mode from the dashboard in minutes.