Voice Input
CopilotKit provides built-in voice transcription support, allowing users to speak their messages instead of typing. Audio is recorded in the browser, sent to your backend, and transcribed using services like OpenAI Whisper.
Overview
Voice input in CopilotKit includes:
- Browser Recording - Captures audio using the Web Audio API
- Automatic Transcription - Converts speech to text via transcription services
- Seamless Integration - Transcribed text appears in the chat input automatically
- Multiple Providers - Use OpenAI Whisper or implement custom transcription services
Quick Start
1. Install Dependencies
Install the dependencies with npm, yarn, or pnpm.
2. Configure Backend
Add the transcription service to your runtime.
3. Use in Frontend
The chat component automatically shows a microphone button when transcription is configured.
OpenAI Whisper Configuration
The TranscriptionServiceOpenAI class lets you configure how Whisper transcribes audio.
Configuration Options
| Option | Type | Description |
|---|---|---|
| openai | OpenAI | Required. An OpenAI client instance configured with your API key |
| model | string | Whisper model to use. Default: "whisper-1" |
| language | string | Language of the audio in ISO-639-1 format (e.g., "en", "es", "fr"). Improves accuracy and latency |
| prompt | string | Optional text that guides transcription style or provides context. Should be in the same language as the audio |
| temperature | number | Sampling temperature between 0 and 1. Lower values are more deterministic, higher values more random. Default: 0 |
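To make the defaults and ranges in this table concrete, here is a minimal sketch of validating options before handing them to a transcription service. WhisperOptions and resolveWhisperOptions are hypothetical names for illustration, not CopilotKit exports, and the openai client option is left out because it needs a real API key.

```typescript
// Hypothetical options shape mirroring the configuration table.
// Not a CopilotKit type; the real constructor may differ.
interface WhisperOptions {
  model?: string; // defaults to "whisper-1"
  language?: string; // ISO-639-1 code such as "en"
  prompt?: string; // optional context, in the audio's language
  temperature?: number; // between 0 and 1, defaults to 0
}

interface ResolvedWhisperOptions {
  model: string;
  temperature: number;
  language?: string;
  prompt?: string;
}

// Applies the documented defaults and rejects out-of-range values early.
function resolveWhisperOptions(opts: WhisperOptions = {}): ResolvedWhisperOptions {
  const temperature = opts.temperature ?? 0;
  // Written as a negated range check so NaN is rejected too.
  if (!(temperature >= 0 && temperature <= 1)) {
    throw new RangeError(`temperature must be between 0 and 1, got ${temperature}`);
  }
  if (opts.language !== undefined && !/^[a-z]{2}$/.test(opts.language)) {
    throw new RangeError(`language must be a two-letter ISO-639-1 code, got "${opts.language}"`);
  }
  return {
    model: opts.model ?? "whisper-1",
    temperature,
    language: opts.language,
    prompt: opts.prompt,
  };
}

// Example: Spanish audio with the default model and temperature.
const spanish = resolveWhisperOptions({ language: "es" });
console.log(spanish.model, spanish.temperature); // whisper-1 0
```

Failing fast on a bad temperature or language code surfaces configuration mistakes at startup rather than on the first user recording.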
Language Support
Specify the language to improve accuracy:
- en - English
- es - Spanish
- fr - French
- de - German
- ja - Japanese
- zh - Chinese
- And 50+ more languages
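Browsers report the user's locale as a BCP 47 tag such as "en-US", while the language option expects a bare ISO-639-1 code. A minimal sketch of the conversion, using a hypothetical toIso6391 helper that is not part of CopilotKit:

```typescript
// Hypothetical helper: derive an ISO-639-1 code from a BCP 47 locale tag
// such as navigator.language ("en-US", "zh-Hans-CN", "es").
// Returns undefined when the primary subtag is not two letters (for example
// "fil"), in which case it is safer to omit the language option.
function toIso6391(locale: string | undefined): string | undefined {
  if (!locale) return undefined;
  const primary = locale.trim().split(/[-_]/)[0].toLowerCase();
  return /^[a-z]{2}$/.test(primary) ? primary : undefined;
}

console.log(toIso6391("en-US")); // en
console.log(toIso6391("zh-Hans-CN")); // zh
```

In a browser you might call toIso6391(navigator.language) and pass the result as language only when it is defined. The helper does not check whether Whisper actually supports the resulting language.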
Context Prompts
Use the prompt option to improve accuracy for domain-specific terminology.
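A common approach is to list the terms you expect to hear. The sketch below uses a hypothetical buildGlossaryPrompt helper that deduplicates terms and caps the prompt length, since OpenAI documents that Whisper only considers the final 224 tokens of a prompt. The budget of roughly 4 characters per token is an assumption for English text, not a tokenizer.

```typescript
// Hypothetical helper: turn a list of domain terms into a Whisper prompt.
// Terms earlier in the list take priority; duplicates (case-insensitive)
// and blank entries are skipped. maxChars approximates the 224-token limit
// at about 4 characters per token, leaving some margin.
function buildGlossaryPrompt(terms: string[], maxChars = 800): string {
  const seen = new Set<string>();
  const kept: string[] = [];
  let length = 0;
  for (const raw of terms) {
    const term = raw.trim();
    if (!term || seen.has(term.toLowerCase())) continue;
    const added = (kept.length > 0 ? 2 : 0) + term.length; // ", " separator
    if (length + added > maxChars) break;
    seen.add(term.toLowerCase());
    kept.push(term);
    length += added;
  }
  return kept.join(", ");
}

console.log(buildGlossaryPrompt(["CopilotKit", "Whisper", "copilotkit", "ISO-639-1"]));
// CopilotKit, Whisper, ISO-639-1
```

Keep the glossary in the same language as the audio, as the configuration table recommends.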
Audio Recording
CopilotKit uses the browser’s Web Audio API to capture audio.
Supported Audio Formats
The following MIME types are supported:
- audio/webm (default in most browsers)
- audio/mp3 / audio/mpeg
- audio/mp4
- audio/wav
- audio/ogg
- audio/flac
- audio/aac
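When recording, a client has to pick a format that the browser can produce and that appears in the list above. A minimal sketch, assuming the browser check is MediaRecorder.isTypeSupported; the helper names are hypothetical, and the browser check is passed in as a function so the logic also runs outside a browser.

```typescript
// Formats from the list above, in a hypothetical order of preference.
const SUPPORTED_AUDIO_TYPES = [
  "audio/webm",
  "audio/mp4",
  "audio/mpeg",
  "audio/mp3",
  "audio/wav",
  "audio/ogg",
  "audio/flac",
  "audio/aac",
] as const;

type AudioType = (typeof SUPPORTED_AUDIO_TYPES)[number];

// Strip parameters: "audio/webm;codecs=opus" -> "audio/webm".
function baseMimeType(mime: string): string {
  return mime.split(";")[0].trim().toLowerCase();
}

function isSupportedAudioType(mime: string): boolean {
  return (SUPPORTED_AUDIO_TYPES as readonly string[]).includes(baseMimeType(mime));
}

// Return the first listed format the recorder can produce. In a browser,
// pass (t) => MediaRecorder.isTypeSupported(t).
function pickRecordingType(canRecord: (mime: string) => boolean): AudioType | undefined {
  return SUPPORTED_AUDIO_TYPES.find((t) => canRecord(t));
}

// File extension for an upload name such as "recording.webm".
function extensionFor(mime: string): string | undefined {
  if (!isSupportedAudioType(mime)) return undefined;
  const base = baseMimeType(mime);
  return base === "audio/mpeg" ? "mp3" : base.slice("audio/".length);
}

// Stand-in for a browser that can only record audio/mp4.
console.log(pickRecordingType((t) => t === "audio/mp4")); // audio/mp4
```

Normalizing away codec parameters matters because browsers often report types like "audio/webm;codecs=opus" rather than the bare MIME type.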
Recording Visualization
CopilotKit includes a built-in audio visualizer that displays a waveform during recording.
Custom Styling
Style the audio recorder component.
Custom Transcription Service
Implement your own transcription service to support alternative providers.
TranscribeFileOptions
The transcribeFile method receives a TranscribeFileOptions object.
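The exact shape of CopilotKit's transcription service interface and of TranscribeFileOptions is not reproduced here, so the sketch below uses simplified stand-ins. TranscriptionServiceSketch, TranscribeFileOptionsSketch, and the audioFile, mimeType, and language fields are assumptions, as is the ProviderCall function that represents your provider's HTTP client. Treat this as an outline of the pattern rather than the library's API.

```typescript
// ASSUMPTION: simplified stand-in for CopilotKit's TranscribeFileOptions.
// The real field names may differ; check the library's type definitions.
interface TranscribeFileOptionsSketch {
  audioFile: Blob;
  mimeType?: string;
  language?: string;
}

// ASSUMPTION: the service contract reduced to one async method that
// resolves to the transcribed text.
interface TranscriptionServiceSketch {
  transcribeFile(options: TranscribeFileOptionsSketch): Promise<string>;
}

// Hypothetical provider call: sends audio bytes to some speech-to-text API
// and resolves to its text result.
type ProviderCall = (
  bytes: Uint8Array,
  mimeType: string,
  language?: string,
) => Promise<{ text: string }>;

class CustomTranscriptionService implements TranscriptionServiceSketch {
  private readonly callProvider: ProviderCall;

  constructor(callProvider: ProviderCall) {
    this.callProvider = callProvider;
  }

  async transcribeFile(options: TranscribeFileOptionsSketch): Promise<string> {
    const { audioFile, mimeType, language } = options;
    // Reject empty uploads before spending a provider call on them.
    if (audioFile.size === 0) {
      throw new Error("Audio file is empty");
    }
    const bytes = new Uint8Array(await audioFile.arrayBuffer());
    // Fall back to the Blob's own type when no MIME type is given.
    const result = await this.callProvider(bytes, mimeType ?? audioFile.type, language);
    return result.text.trim();
  }
}
```

Injecting the provider call keeps the service easy to test with a fake, and the real HTTP client is where provider-specific details such as authentication would go.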
Request Handling
CopilotKit handles transcription requests in two modes:
REST Mode (Multipart Form Data)
Single Endpoint Mode (JSON with Base64)
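The two modes differ mainly in how the audio travels: as a binary file part in a multipart form, or as a base64 string inside a JSON body. A minimal client-side sketch of both request bodies follows; the field names audio and mimeType are hypothetical, so check what your runtime actually expects.

```typescript
// Hypothetical field names: "audio" for the multipart file part, and
// { audio, mimeType } for the JSON body. Check what your runtime expects.

// Base64-encode raw bytes with btoa (available in browsers and Node.js 16+).
// Byte-by-byte conversion keeps this simple; chunking would be faster for
// large files.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// REST mode: the audio travels as a binary file part.
function buildMultipartBody(audio: Blob, filename: string): FormData {
  const form = new FormData();
  form.append("audio", audio, filename);
  return form;
}

// Single endpoint mode: the audio travels as a base64 string inside JSON.
async function buildJsonBody(audio: Blob): Promise<string> {
  const bytes = new Uint8Array(await audio.arrayBuffer());
  return JSON.stringify({ audio: bytesToBase64(bytes), mimeType: audio.type });
}

console.log(bytesToBase64(new Uint8Array([104, 105]))); // aGk=
```

Base64 inflates the payload by roughly a third, so multipart is usually the leaner choice for longer recordings, while JSON keeps every request on a single endpoint.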
Error Handling
CopilotKit provides detailed error responses:
| Error Code | HTTP Status | Description |
|---|---|---|
| SERVICE_NOT_CONFIGURED | 503 | No transcription service configured |
| INVALID_AUDIO_FORMAT | 400 | Unsupported audio format |
| AUDIO_TOO_LONG | 400 | Audio file exceeds duration limit |
| AUDIO_TOO_SHORT | 400 | Audio file too short to transcribe |
| RATE_LIMITED | 429 | Too many transcription requests |
| AUTH_FAILED | 401 | Invalid API credentials |
| PROVIDER_ERROR | 500 | Transcription service error |
| NETWORK_ERROR | 502 | Network connectivity issue |
| INVALID_REQUEST | 400 | Malformed request |
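On the client, these codes can drive what the user sees and whether a retry makes sense. A minimal sketch with a hypothetical adviceFor helper; the messages and retry policy are illustrative choices, and how the code is read from the response body is left to your integration.

```typescript
// Error codes from the table above.
type TranscriptionErrorCode =
  | "SERVICE_NOT_CONFIGURED"
  | "INVALID_AUDIO_FORMAT"
  | "AUDIO_TOO_LONG"
  | "AUDIO_TOO_SHORT"
  | "RATE_LIMITED"
  | "AUTH_FAILED"
  | "PROVIDER_ERROR"
  | "NETWORK_ERROR"
  | "INVALID_REQUEST";

interface ErrorAdvice {
  message: string; // text that is safe to show the user
  retryable: boolean; // whether an automatic retry is reasonable
}

// Hypothetical mapping; wording and retry policy are illustrative.
const ERROR_ADVICE: Record<TranscriptionErrorCode, ErrorAdvice> = {
  SERVICE_NOT_CONFIGURED: { message: "Voice input is not available right now.", retryable: false },
  INVALID_AUDIO_FORMAT: { message: "This audio format is not supported.", retryable: false },
  AUDIO_TOO_LONG: { message: "That recording is too long. Try a shorter message.", retryable: false },
  AUDIO_TOO_SHORT: { message: "That recording was too short. Please try again.", retryable: false },
  RATE_LIMITED: { message: "Too many requests. Please wait a moment.", retryable: true },
  // Credential problems are not something the user can fix; log them server-side.
  AUTH_FAILED: { message: "Voice input is not available right now.", retryable: false },
  PROVIDER_ERROR: { message: "Transcription failed. Please try again.", retryable: true },
  NETWORK_ERROR: { message: "Network problem. Check your connection and try again.", retryable: true },
  INVALID_REQUEST: { message: "Something went wrong sending your recording.", retryable: false },
};

const FALLBACK_ADVICE: ErrorAdvice = { message: "Transcription failed. Please try again.", retryable: false };

// Own-property check so inherited keys such as "toString" fall back safely.
function adviceFor(code: string): ErrorAdvice {
  if (Object.prototype.hasOwnProperty.call(ERROR_ADVICE, code)) {
    return ERROR_ADVICE[code as TranscriptionErrorCode];
  }
  return FALLBACK_ADVICE;
}

console.log(adviceFor("RATE_LIMITED").retryable); // true
```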
Custom Error Handling
Implement error handling in your custom service.
Advanced Configuration
Rate Limiting
Implement rate limiting for transcription requests.
Caching Transcriptions
Cache transcriptions to reduce API costs.
Logging and Analytics
Track transcription usage.
Best Practices
Specify Language
Always specify the language parameter when you know the audio language; it improves both accuracy and latency.
Provide Context
Use the prompt parameter to supply domain-specific vocabulary and improve transcription quality.
Handle Errors Gracefully
Implement proper error handling and provide user-friendly error messages.
Rate Limit Requests
Implement rate limiting to prevent abuse and control API costs.
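One simple way to rate limit is a token bucket per client. The sketch below is a hypothetical in-memory limiter, not part of CopilotKit; it assumes you call it from your own server code, keyed by something like a user ID or IP address, before forwarding audio to the transcription service.

```typescript
// Minimal in-memory token bucket. Each key may make `capacity` requests in a
// burst and regains `refillPerSecond` tokens per second. The clock is
// injectable so the limiter can be tested deterministically.
class TokenBucketLimiter {
  private readonly buckets = new Map<string, { tokens: number; updatedAt: number }>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
  }

  // Returns true and consumes a token if the request may proceed.
  tryConsume(key: string): boolean {
    const now = this.now();
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, updatedAt: now };
    // Refill based on elapsed time, never above capacity.
    const elapsedSeconds = Math.max(0, now - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.updatedAt = now;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}
```

When tryConsume returns false, responding with HTTP 429 keeps the behavior consistent with the RATE_LIMITED error code. Because the state lives in a single process's memory, deployments with several server instances would need a shared store instead.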
Testing Voice Input
Test transcription in development.
Related Resources
OpenAI Whisper API
Official OpenAI Whisper documentation
Runtime Middleware
Implement authentication and rate limiting for transcription
