What if you could talk to your AI assistant instead of typing? With OpenClaw's voice mode, you can. Send voice messages, get spoken replies, and have hands-free AI conversations on Telegram, Discord, and web chat.
This guide walks you through setting up speech-to-text (STT) and text-to-speech (TTS) on OpenClaw, choosing the right providers, and getting the best voice experience.
What Is OpenClaw Voice Mode?
OpenClaw voice mode adds two capabilities to your AI agent:
- Speech-to-Text (STT): Converts your voice messages into text so the AI can process them
- Text-to-Speech (TTS): Converts the AI's text responses into spoken audio so you can listen instead of reading
Voice mode works on Telegram (voice messages), Discord (voice channels), and the web chat widget (browser microphone). You can use STT alone, TTS alone, or both together for a fully hands-free experience.
How Voice Mode Works
The voice pipeline is straightforward:
- You speak — send a voice message on Telegram, join a Discord voice channel, or click the mic button in web chat
- STT converts your speech to text — your chosen STT provider transcribes the audio
- The AI processes the text — your AI agent generates a response, just like a normal text message
- TTS converts the response to audio — your chosen TTS provider generates spoken audio
- You hear the reply — the audio is sent back through the same channel
The entire round-trip typically takes 3-8 seconds, depending on your STT/TTS providers and the length of the response.
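The five steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not OpenClaw's actual code: `handle_voice_message`, `transcribe`, `generate_reply`, and `synthesize` are hypothetical names standing in for the STT provider, the AI agent, and the TTS provider. They are passed in as plain callables so the round-trip time can be measured around them.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    transcript: str      # text produced by the STT step
    reply_text: str      # text produced by the AI agent
    reply_audio: bytes   # audio produced by the TTS step
    seconds: float       # wall-clock time for the whole round trip


def handle_voice_message(
    audio: bytes,
    transcribe: Callable[[bytes], str],     # hypothetical STT call
    generate_reply: Callable[[str], str],   # hypothetical agent call
    synthesize: Callable[[str], bytes],     # hypothetical TTS call
) -> VoiceTurn:
    """Run one voice turn: speech -> text -> reply -> speech."""
    started = time.perf_counter()
    transcript = transcribe(audio)
    reply_text = generate_reply(transcript)
    reply_audio = synthesize(reply_text)
    elapsed = time.perf_counter() - started
    return VoiceTurn(transcript, reply_text, reply_audio, elapsed)


# Stub providers so the sketch runs without any API keys.
turn = handle_voice_message(
    b"fake-audio",
    transcribe=lambda audio: "what time is it",
    generate_reply=lambda text: f"You asked: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
```

With real providers plugged in, `turn.seconds` is the figure to compare against the 3-8 second range, since it covers all three calls.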
Setting Up Voice on Telegram
Telegram has built-in voice message support, making it the easiest platform for OpenClaw voice mode.
Speech-to-Text (Receiving Voice Messages)
Telegram automatically sends voice messages as audio files to your bot. OpenClaw can transcribe these using your configured STT provider.
- In your OpenClaw config, enable the STT plugin
- Configure your STT provider (see Choosing an STT Provider below)
- Send a voice message to your bot — it will be automatically transcribed and processed
Text-to-Speech (Sending Voice Replies)
To have your bot reply with voice messages instead of text:
- Enable the TTS plugin in your OpenClaw config
- Configure your TTS provider (see Choosing a TTS Provider below)
- Set the response mode to "voice" or "both" (sends both text and audio)
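To make the response modes concrete, here is a small sketch of how a reply could be assembled for each mode. The function name `build_reply` and the `"text"` mode name for text-only replies are assumptions made for illustration; the guide itself only names `"voice"` and `"both"`. The `synthesize` argument is a hypothetical stand-in for the TTS provider call.

```python
from typing import Callable


def build_reply(mode: str, text: str, synthesize: Callable[[str], bytes]) -> dict:
    """Return the parts of a reply to send for a given response mode."""
    # "text" is an assumed name for the text-only default.
    if mode not in ("text", "voice", "both"):
        raise ValueError(f"unknown response mode: {mode!r}")
    send_text = mode in ("text", "both")
    send_audio = mode in ("voice", "both")
    return {
        "text": text if send_text else None,
        # Only call the TTS provider when audio is actually needed.
        "audio": synthesize(text) if send_audio else None,
    }
```

In this sketch, TTS is only invoked for `"voice"` and `"both"`, so a text-only reply incurs no synthesis cost.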
Setting Up Voice on Discord
Discord supports both voice channels and voice messages. OpenClaw can join voice channels and participate in real-time conversations.
- Enable the Discord voice plugin in your OpenClaw config
- Grant your Discord bot the Connect and Speak permissions in your server settings
- Configure STT and TTS providers
- Invite the bot to a voice channel; it then listens to speech and responds in the same channel
Note: Discord voice requires a persistent connection, which uses slightly more resources than text-only mode.
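If you generate the bot's invite link yourself, the Connect and Speak permissions can be requested up front. The sketch below assumes Discord's documented permission bitfield, where Connect is bit 20 and Speak is bit 21, and the standard OAuth2 authorize URL. The helper name `voice_invite_url` and the client ID are placeholders, and your bot may need further permissions (for example, to read or send text messages) that are not shown here.

```python
from urllib.parse import urlencode

# Discord permission flags for voice (bit positions from Discord's permission bitfield).
CONNECT = 1 << 20
SPEAK = 1 << 21


def voice_invite_url(client_id: str) -> str:
    """Build a bot invite URL that requests Connect and Speak."""
    params = {
        "client_id": client_id,            # placeholder: your application's ID
        "scope": "bot",
        "permissions": CONNECT | SPEAK,    # combined bitfield value
    }
    return "https://discord.com/oauth2/authorize?" + urlencode(params)
```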
Setting Up Voice on Web Chat
The OpenClaw web chat widget can use your browser's microphone for voice input.
- Enable voice mode in the web chat configuration
- When a user clicks the microphone button, the browser captures audio
- The audio is sent to your STT provider for transcription
- The AI response is optionally converted to audio via TTS and played back in the browser
Note: Users must grant microphone permission in their browser. HTTPS is required for microphone access.
Choosing an STT Provider
OpenClaw supports several speech-to-text providers. Here's how they compare:
| Provider | Accuracy | Speed | Free Tier | Best For |
|---|---|---|---|---|
| OpenAI Whisper | Excellent | Good | No (pay-per-use) | Best overall accuracy |
| Deepgram | Very Good | Fastest | Yes ($200 credit) | Real-time transcription |
| Google Speech | Very Good | Good | Yes (60 min/mo) | Budget-friendly option |
Configuring STT in OpenClaw
Add your STT provider to the OpenClaw config:
{
  "stt": {
    "provider": "whisper",
    "apiKey": "your-openai-api-key",
    "model": "whisper-1",
    "language": "en"
  }
}
Set the language parameter to improve accuracy if you primarily speak one language. Leave it unset for automatic language detection.
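If you generate the config from a script, the "set it or leave it unset" rule for the language can be expressed directly. This is a sketch only: `build_stt_config` is a hypothetical helper, and it simply reproduces the keys shown in the config above, omitting `language` entirely when no language is given so that automatic detection applies.

```python
import json
from typing import Optional


def build_stt_config(api_key: str, language: Optional[str] = None) -> dict:
    """Build the STT section of the config, omitting language for auto-detection."""
    stt = {
        "provider": "whisper",
        "apiKey": api_key,
        "model": "whisper-1",
    }
    if language is not None:
        stt["language"] = language   # e.g. "en" when you mostly speak English
    return {"stt": stt}


# Render both variants as JSON text.
fixed_language = json.dumps(build_stt_config("your-openai-api-key", "en"), indent=2)
auto_detect = json.dumps(build_stt_config("your-openai-api-key"), indent=2)
```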
Choosing a TTS Provider
For text-to-speech, you have several options ranging from free to premium:
| Provider | Voice Quality | Speed | Free Tier | Best For |
|---|---|---|---|---|
| OpenAI TTS | Very Natural | Good | No (pay-per-use) | Most natural-sounding |
| ElevenLabs | Premium | Good | Yes (10K chars/mo) | Custom voice cloning |
| Google TTS | Good | Fast | Yes (4M chars/mo) | Free, high-volume usage |
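To get a feel for what the free tiers in the table mean in practice, you can divide the monthly character allowance by a typical reply length. The allowances below are the figures from the table; the 500-character average reply is purely an assumption for illustration, and `replies_per_month` is a hypothetical helper.

```python
# Monthly free-tier character allowances, taken from the table above.
FREE_TIER_CHARS = {"ElevenLabs": 10_000, "Google TTS": 4_000_000}

AVG_REPLY_CHARS = 500  # assumption: average length of one spoken reply


def replies_per_month(free_chars: int, avg_reply_chars: int) -> int:
    """Number of whole replies that fit in a monthly character allowance."""
    return free_chars // avg_reply_chars


estimates = {
    name: replies_per_month(chars, AVG_REPLY_CHARS)
    for name, chars in FREE_TIER_CHARS.items()
}
```

Under that assumption, `estimates` works out to 20 replies per month for ElevenLabs and 8,000 for Google TTS, which is why the latter suits high-volume use.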
Configuring TTS in OpenClaw
{
  "tts": {
    "provider": "openai",
    "apiKey": "your-openai-api-key",
    "model": "tts-1",
    "voice": "alloy"
  }
}
OpenAI TTS offers several voice options: alloy, echo, fable, onyx, nova, and shimmer. Try different voices to find the one that fits your assistant's personality.
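When trying out voices, it helps to generate one config per voice and reject typos early. The sketch below is illustrative: `build_tts_config` is a hypothetical helper that mirrors the keys of the config above and only accepts the six voice names listed in this guide.

```python
OPENAI_TTS_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def build_tts_config(api_key: str, voice: str = "alloy") -> dict:
    """Build the TTS section of the config for one of the listed voices."""
    if voice not in OPENAI_TTS_VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {OPENAI_TTS_VOICES}")
    return {
        "tts": {
            "provider": "openai",
            "apiKey": api_key,
            "model": "tts-1",
            "voice": voice,
        }
    }


# One candidate config per voice, ready to be tried in turn.
candidates = [build_tts_config("your-openai-api-key", v) for v in OPENAI_TTS_VOICES]
```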
Tips for a Better Voice Experience
- Speak clearly and at a normal pace — STT accuracy improves significantly with clear speech
- Use a good microphone — a headset or dedicated mic reduces background noise and improves transcription
- Configure the language setting — if you always speak one language, set it explicitly for better accuracy
- Set response length limits for TTS — long AI responses create long audio clips. Consider setting a max response length or using "both" mode (text + audio) so users can read long responses
- Choose the right TTS voice — test different voices to find one that sounds natural for your use case
- Use "both" response mode — sending both text and audio lets users choose how to consume the response
- Consider latency — if speed matters, use Deepgram for STT (fastest) and Google TTS (fast and free)
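The response length tip above can be sketched as a small pre-processing step before text is handed to TTS. This is one possible approach, not a built-in OpenClaw feature: `limit_for_tts` and its 600-character default are assumptions. It trims at the last sentence end that fits and falls back to the last whole word.

```python
import re


def limit_for_tts(text: str, max_chars: int = 600) -> str:
    """Trim text to at most max_chars, preferring to cut at a sentence end."""
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Positions just after each sentence-ending mark inside the clipped text.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", clipped)]
    if ends:
        return clipped[: ends[-1]]
    # No sentence boundary found: fall back to the last whole word.
    return clipped.rsplit(" ", 1)[0]
```

Combined with "both" mode, the full text can still be sent as a message while only the trimmed version is spoken.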
Voice Mode on OpenClaw Launch
Voice features are available on all OpenClaw Launch plans. You can configure STT and TTS providers directly from the dashboard — no config files to edit manually.
If you're using the included AI credits, they also cover voice transcription. For TTS, you can use Google TTS (free tier) or bring your own API key for OpenAI TTS or ElevenLabs.
Get Started with Voice
Ready to add voice to your AI agent? Check out our platform-specific guides:
- Set up OpenClaw on Telegram — voice messages work out of the box
- Set up OpenClaw on Discord — join voice channels and chat hands-free
Or deploy on OpenClaw Launch and configure voice mode from the dashboard in minutes.