Voice Input

CopilotKit provides built-in voice transcription support, allowing users to speak their messages instead of typing. Audio is recorded in the browser, sent to your backend, and transcribed using services like OpenAI Whisper.

Overview

Voice input in CopilotKit provides:

Browser Recording - Captures audio using the Web Audio API
Automatic Transcription - Converts speech to text via transcription services
Seamless Integration - Transcribed text appears in the chat input automatically
Multiple Providers - Use OpenAI Whisper or implement custom transcription services

Quick Start

1. Install Dependencies

npm

npm install @copilotkit/voice openai

yarn

yarn add @copilotkit/voice openai

pnpm

pnpm add @copilotkit/voice openai

2. Configure Backend

Add the transcription service to your runtime:

import { CopilotRuntime } from "@copilotkit/runtime";
import { TranscriptionServiceOpenAI } from "@copilotkit/voice";
import OpenAI from "openai";

const runtime = new CopilotRuntime({
  agents: {
    default: myAgent // your agent instance
  },
  transcriptionService: new TranscriptionServiceOpenAI({
    openai: new OpenAI({ 
      apiKey: process.env.OPENAI_API_KEY 
    })
  })
});

3. Use in Frontend

The chat component automatically shows a microphone button when transcription is configured:

import { CopilotChat, CopilotKitProvider } from "@copilotkit/react";

function MyApp() {
  return (
    <CopilotKitProvider runtimeUrl="/api/copilotkit">
      <CopilotChat />
    </CopilotKitProvider>
  );
}

That’s it! Users can now click the microphone icon to record voice messages.
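If the microphone button is missing or fails, the usual cause is the page environment. Browsers only expose `navigator.mediaDevices.getUserMedia` in secure contexts (HTTPS or `localhost`). Below is a minimal sketch for checking this while troubleshooting. `canRecordAudio` is a hypothetical helper, not a CopilotKit API. It takes the environment as a parameter so it can also run outside a browser.

```typescript
// Hypothetical helper (not part of CopilotKit): reports whether the current
// environment can capture microphone audio at all.
interface RecordingEnv {
  isSecureContext?: boolean;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

function canRecordAudio(env: RecordingEnv): { ok: boolean; reason?: string } {
  // getUserMedia is only exposed on HTTPS pages and localhost
  if (env.isSecureContext === false) {
    return { ok: false, reason: "Page is not served from a secure context (HTTPS or localhost)" };
  }
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
    return { ok: false, reason: "navigator.mediaDevices.getUserMedia is unavailable" };
  }
  return { ok: true };
}

// In the browser, pass the global object:
// console.log(canRecordAudio(globalThis as RecordingEnv));
```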

OpenAI Whisper Configuration

The TranscriptionServiceOpenAI class exposes Whisper's transcription options:

new TranscriptionServiceOpenAI({
  openai: new OpenAI({ apiKey: "..." }),  // Required: OpenAI client
  model: "whisper-1",                     // Optional: Model selection (default: "whisper-1")
  language: "en",                         // Optional: ISO-639-1 language code
  prompt: "Technical discussion context", // Optional: Context for better accuracy
  temperature: 0                          // Optional: Sampling temperature (0-1)
})

Configuration Options

Option	Type	Description
`openai`	`OpenAI`	Required. OpenAI client instance with API key
`model`	`string`	Whisper model to use. Default: `"whisper-1"`
`language`	`string`	Audio language in ISO-639-1 format (e.g., `"en"`, `"es"`, `"fr"`). Improves accuracy and latency
`prompt`	`string`	Optional text to guide transcription style or provide context. Should match audio language
`temperature`	`number`	Sampling temperature between 0 and 1. Lower values are more deterministic; higher values are more random. Default: 0
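Values like `language` and `temperature` are passed through to Whisper. A typo such as `"english"` instead of `"en"` is therefore likely to surface only when a request fails. Below is a minimal sketch that checks the settings from the table above at startup. `checkWhisperOptions` is a hypothetical helper. Its two-letter check only verifies the shape of an ISO-639-1 code, not that Whisper supports the language.

```typescript
// Hypothetical startup check (not a CopilotKit API) for the optional
// TranscriptionServiceOpenAI settings described in the table above.
interface WhisperOptions {
  model?: string;
  language?: string;
  prompt?: string;
  temperature?: number;
}

function checkWhisperOptions(opts: WhisperOptions): string[] {
  const problems: string[] = [];
  // ISO-639-1 codes are two lowercase letters, e.g. "en", "es", "fr"
  if (opts.language !== undefined && !/^[a-z]{2}$/.test(opts.language)) {
    problems.push(`language "${opts.language}" is not a two-letter ISO-639-1 code`);
  }
  // Whisper accepts sampling temperatures between 0 and 1
  if (
    opts.temperature !== undefined &&
    (!Number.isFinite(opts.temperature) || opts.temperature < 0 || opts.temperature > 1)
  ) {
    problems.push(`temperature ${opts.temperature} is outside the range 0 to 1`);
  }
  if (opts.model !== undefined && opts.model.trim() === "") {
    problems.push("model must not be empty");
  }
  return problems;
}

// Fail fast before constructing the service
const problems = checkWhisperOptions({ language: "en", temperature: 0 });
if (problems.length > 0) {
  throw new Error(`Invalid transcription config: ${problems.join("; ")}`);
}
```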

Language Support

Specify the language to improve accuracy:

new TranscriptionServiceOpenAI({
  openai: new OpenAI({ apiKey: process.env.OPENAI_API_KEY }),
  language: "es",  // Spanish
  prompt: "Conversación técnica sobre desarrollo de software"
})

Supported languages include:

en - English
es - Spanish
fr - French
de - German
ja - Japanese
zh - Chinese
And 50+ more languages

See OpenAI’s language support for the complete list.
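If your users speak different languages, you could derive the language code from the browser locale rather than hard-coding it. Browser locales are BCP 47 tags such as `"es-MX"` or `"zh-Hant-TW"`. Their first subtag is usually the ISO-639-1 code Whisper expects. Below is a minimal sketch, assuming you have your own way to get the locale to the server. `toWhisperLanguage` and the `SUPPORTED` set are hypothetical. Returning `undefined` leaves `language` unset, in which case Whisper detects the language itself.

```typescript
// Hypothetical helper: maps a BCP 47 locale (e.g. navigator.language) to an
// ISO-639-1 code for the `language` option, or undefined to let Whisper detect it.
const SUPPORTED = new Set(["en", "es", "fr", "de", "ja", "zh"]); // extend as needed

function toWhisperLanguage(locale: string | undefined): string | undefined {
  if (!locale) return undefined;
  // "es-MX" -> "es", "zh-Hant-TW" -> "zh", "EN_us" -> "en"
  const [primary = ""] = locale.split(/[-_]/);
  const code = primary.toLowerCase();
  return SUPPORTED.has(code) ? code : undefined;
}
```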

Context Prompts

Use the prompt option to improve accuracy for domain-specific terminology:

// Medical transcription
new TranscriptionServiceOpenAI({
  openai: new OpenAI({ apiKey: process.env.OPENAI_API_KEY }),
  language: "en",
  prompt: "Medical consultation about cardiovascular health, medications, and treatment plans."
})

// Technical discussion
new TranscriptionServiceOpenAI({
  openai: new OpenAI({ apiKey: process.env.OPENAI_API_KEY }),
  language: "en",
  prompt: "Software development discussion about React, TypeScript, and API design."
})
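In vocabulary-heavy domains, a common approach is a prompt that lists the terms you expect, such as product names, acronyms, or drug names. OpenAI's speech-to-text guide notes that Whisper only considers the final 224 tokens of a prompt, so long glossaries should be trimmed. Below is a minimal sketch. `buildTranscriptionPrompt` is hypothetical. It caps length by characters as a rough stand-in for tokens, because it does not run a tokenizer.

```typescript
// Hypothetical helper: builds a Whisper prompt from a context phrase and a
// glossary, keeping the result under a character budget. Whisper reads only the
// last ~224 tokens of a prompt; 800 characters is an assumed, conservative proxy.
function buildTranscriptionPrompt(
  context: string,
  terms: string[], // most important terms first
  maxChars = 800
): string {
  const prompt = context.trim();
  const kept: string[] = [];
  for (const term of terms) {
    const candidate = `${prompt} ${[...kept, term].join(", ")}.`;
    if (candidate.length > maxChars) break; // stop before exceeding the budget
    kept.push(term);
  }
  return kept.length > 0 ? `${prompt} ${kept.join(", ")}.` : prompt;
}

// buildTranscriptionPrompt("Software development discussion about", ["React", "TypeScript", "API design"])
```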

Audio Recording

CopilotKit uses the browser’s Web Audio API to capture audio.

Supported Audio Formats

The following MIME types are supported:

audio/webm (default in most browsers)
audio/mp3 / audio/mpeg
audio/mp4
audio/wav
audio/ogg
audio/flac
audio/aac

The browser automatically selects the best available format.
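If you write your own recording code with the standard `MediaRecorder` API, `MediaRecorder.isTypeSupported()` reports which of these formats the browser can produce. You can then pass the first match as the `mimeType` option. Safari, for instance, has historically recorded `audio/mp4` rather than WebM. Below is a minimal sketch. The preference order and `pickRecordingMimeType` are assumptions. The predicate is injected so the function can run outside a browser.

```typescript
// Hypothetical helper for custom recording code: returns the first MIME type
// the browser's MediaRecorder can produce, or undefined to use its default.
const PREFERRED_TYPES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4",
  "audio/ogg;codecs=opus",
];

function pickRecordingMimeType(
  isTypeSupported: (type: string) => boolean,
  preferred: readonly string[] = PREFERRED_TYPES
): string | undefined {
  return preferred.find((type) => isTypeSupported(type));
}

// Browser usage:
// const mimeType = pickRecordingMimeType((t) => MediaRecorder.isTypeSupported(t));
// const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```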

Recording Visualization

CopilotKit includes a built-in audio visualizer that displays a waveform during recording:

import { CopilotChatAudioRecorder } from "@copilotkit/react";

function CustomInput() {
  return (
    <CopilotChatAudioRecorder 
      inputClass="cpk:h-11 cpk:w-full cpk:px-5"
      inputShowControls={false}
    />
  );
}

The visualizer renders a canvas with animated bars that respond to audio levels:

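// Waveform configuration (internal)
// canvasWidth is the rendered width of the visualizer canvas in pixels
const barWidth = 2;
const gap = 2;
const config = {
  barWidth,
  minHeight: 2,
  maxHeight: 20,
  gap,
  // Number of bars that fit across the canvas
  numSamples: Math.ceil(canvasWidth / (barWidth + gap))
};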
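The configuration above sets how many bars fit on the canvas and how tall each one can be. Below is a minimal sketch of how audio levels might map onto those bars. It assumes time-domain samples from a Web Audio `AnalyserNode`: `getByteTimeDomainData` fills a `Uint8Array` in which 128 means silence. `barHeights` is hypothetical and is not CopilotKit's internal implementation.

```typescript
// Hypothetical sketch: turns byte time-domain samples (128 = silence) into one
// height per bar, scaled between minHeight and maxHeight pixels.
interface BarConfig {
  minHeight: number;
  maxHeight: number;
  numSamples: number; // number of bars to draw
}

function barHeights(samples: Uint8Array, config: BarConfig): number[] {
  const { minHeight, maxHeight, numSamples } = config;
  const heights: number[] = [];
  const chunk = Math.max(1, Math.floor(samples.length / numSamples));
  for (let bar = 0; bar < numSamples; bar++) {
    // Peak deviation from silence within this bar's slice, normalized to 0..1
    let peak = 0;
    const end = Math.min((bar + 1) * chunk, samples.length);
    for (let i = bar * chunk; i < end; i++) {
      peak = Math.max(peak, Math.abs(samples[i] - 128) / 128);
    }
    heights.push(minHeight + peak * (maxHeight - minHeight));
  }
  return heights;
}
```

Each height could then be drawn with `ctx.fillRect` at `x = bar * (barWidth + gap)`, on every animation frame.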

Custom Styling

Style the audio recorder component:

/* Override waveform color; the waveform is drawn with the canvas's CSS text color */
[data-copilotkit] .copilot-chat-audio-recorder canvas {
  color: var(--primary);
}

/* Custom container styles */
.copilot-chat-audio-recorder {
  border-radius: 0.5rem;
  background: var(--background);
  padding: 0.5rem;
}

Custom Transcription Service

Implement your own transcription service for alternative providers:

import {
  CopilotRuntime,
  TranscriptionService,
  TranscribeFileOptions
} from "@copilotkit/runtime";

class CustomTranscriptionService extends TranscriptionService {
  private apiKey: string;
  
  constructor(apiKey: string) {
    super();
    this.apiKey = apiKey;
  }
  
  async transcribeFile(options: TranscribeFileOptions): Promise<string> {
    const { audioFile, mimeType, size } = options;
    
    // Convert File to buffer/blob for your API
    const arrayBuffer = await audioFile.arrayBuffer();
    
    // Call your transcription API
    const response = await fetch("https://your-api.com/transcribe", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${this.apiKey}`,
        "Content-Type": mimeType
      },
      body: arrayBuffer
    });
    
    if (!response.ok) {
      throw new Error(`Transcription failed: ${response.statusText}`);
    }
    
    const result = await response.json();
    return result.text;
  }
}

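// Use custom service (fail fast if the key is missing)
const customApiKey = process.env.CUSTOM_API_KEY;
if (!customApiKey) {
  throw new Error("CUSTOM_API_KEY is not set");
}

const runtime = new CopilotRuntime({
  agents: { default: myAgent },
  transcriptionService: new CustomTranscriptionService(customApiKey)
});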
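The example assumes your provider responds with JSON of the form `{ "text": "..." }`. `response.json()` is untyped, so an unexpected response would make `transcribeFile` return `undefined`. A small guard turns that case into a clear error instead. Below is a minimal sketch. `extractTranscript` and the `text` field name are assumptions about your provider.

```typescript
// Hypothetical guard for a provider response shaped like { "text": "..." }.
function extractTranscript(body: unknown): string {
  if (
    typeof body === "object" &&
    body !== null &&
    typeof (body as { text?: unknown }).text === "string"
  ) {
    return (body as { text: string }).text;
  }
  throw new Error(`Unexpected transcription response: ${JSON.stringify(body)}`);
}

// Inside transcribeFile:
// const result: unknown = await response.json();
// return extractTranscript(result);
```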

TranscribeFileOptions

The transcribeFile method receives:

interface TranscribeFileOptions {
  audioFile: File;    // Audio file from the browser
  mimeType: string;   // MIME type (e.g., "audio/webm")
  size: number;       // File size in bytes
}

Request Handling

CopilotKit handles transcription requests in two modes:

REST Mode (Multipart Form Data)

POST /api/copilotkit/transcribe
Content-Type: multipart/form-data

audio: [File]

Single Endpoint Mode (JSON with Base64)

POST /api/copilotkit
Content-Type: application/json

{
  "audio": "base64-encoded-audio-data",
  "mimeType": "audio/webm",
  "filename": "recording.webm"
}

Both modes are handled automatically by the runtime.
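In single endpoint mode, the audio travels as base64 inside JSON. Below is a minimal sketch that builds that body from a recorded `Blob`, using the field names shown above. `bytesToBase64` and `toTranscribeJson` are hypothetical. The sketch assumes the runtime expects plain base64 with no `data:` URL prefix, which this page does not state. Base64 also inflates the payload by about a third, which can matter for request-size limits.

```typescript
// Hypothetical helper: base64-encodes bytes in chunks so String.fromCharCode
// is never called with too many arguments at once.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  const CHUNK = 0x8000;
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...Array.from(bytes.subarray(i, i + CHUNK)));
  }
  return btoa(binary);
}

// Builds the JSON body shown above from a recorded Blob
async function toTranscribeJson(audio: Blob, filename = "recording.webm") {
  const bytes = new Uint8Array(await audio.arrayBuffer());
  return {
    audio: bytesToBase64(bytes),
    mimeType: audio.type || "audio/webm", // assumed fallback when the Blob has no type
    filename,
  };
}
```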

Error Handling

CopilotKit reports transcription failures with the following error codes:

Error Code	HTTP Status	Description
`SERVICE_NOT_CONFIGURED`	503	No transcription service configured
`INVALID_AUDIO_FORMAT`	400	Unsupported audio format
`AUDIO_TOO_LONG`	400	Audio file exceeds duration limit
`AUDIO_TOO_SHORT`	400	Audio file too short to transcribe
`RATE_LIMITED`	429	Too many transcription requests
`AUTH_FAILED`	401	Invalid API credentials
`PROVIDER_ERROR`	500	Transcription service error
`NETWORK_ERROR`	502	Network connectivity issue
`INVALID_REQUEST`	400	Malformed request
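If your custom service calls a provider over HTTP, translating the provider's status codes into the categories in the table above keeps logs and user-facing messages consistent. Below is a minimal sketch. `classifyProviderFailure` and `TranscriptionProviderError` are hypothetical. This page does not document how the runtime converts errors thrown from `transcribeFile` into these codes, so treat the result as a label for your own logging and messages.

```typescript
// Hypothetical mapping from a provider's HTTP status to the error codes above.
type TranscriptionErrorCode =
  | "INVALID_REQUEST"
  | "AUTH_FAILED"
  | "RATE_LIMITED"
  | "PROVIDER_ERROR"
  | "NETWORK_ERROR";

function classifyProviderFailure(status: number | undefined): TranscriptionErrorCode {
  if (status === undefined) return "NETWORK_ERROR"; // no HTTP response at all
  if (status === 401 || status === 403) return "AUTH_FAILED";
  if (status === 429) return "RATE_LIMITED";
  if (status >= 400 && status < 500) return "INVALID_REQUEST";
  return "PROVIDER_ERROR";
}

// Error type that carries the code alongside the message
class TranscriptionProviderError extends Error {
  readonly code: TranscriptionErrorCode;
  constructor(code: TranscriptionErrorCode, message: string) {
    super(message);
    this.name = "TranscriptionProviderError";
    this.code = code;
  }
}
```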

Custom Error Handling

Implement error handling in your custom service:

class CustomTranscriptionService extends TranscriptionService {
  async transcribeFile(options: TranscribeFileOptions): Promise<string> {
    try {
      // Validate file size
      if (options.size > 25 * 1024 * 1024) {
        throw new Error("Audio file too large (max 25MB)");
      }
      
      // Validate format
      if (!options.mimeType.startsWith("audio/")) {
        throw new Error("Invalid audio format");
      }
      
      // Transcribe (callAPI is your own provider call, such as the
      // fetch request in the previous example)
      const text = await this.callAPI(options);
      
      // Validate result
      if (!text || text.trim().length === 0) {
        throw new Error("Transcription returned empty result");
      }
      
      return text;
    } catch (error) {
      console.error("Transcription error:", error);
      throw error;
    }
  }
}
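Transient failures such as rate limits or provider outages are often worth one or two retries before an error reaches the user. Below is a minimal sketch of exponential backoff that could wrap the provider call inside `transcribeFile`. `withRetry`, its defaults, and the `shouldRetry` predicate are assumptions. Validation failures like the size check above should not be retried.

```typescript
// Hypothetical retry helper with exponential backoff.
interface RetryOptions {
  retries?: number; // extra attempts after the first
  baseDelayMs?: number; // delay before the first retry, doubled each time
  shouldRetry?: (error: unknown) => boolean;
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const { retries = 2, baseDelayMs = 500, shouldRetry = () => true } = opts;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up when out of attempts or the error is not transient
      if (attempt >= retries || !shouldRetry(error)) throw error;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Inside transcribeFile:
// const text = await withRetry(() => this.callAPI(options), { retries: 2 });
```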

Advanced Configuration

Rate Limiting

Implement rate limiting for transcription requests:

import { BeforeRequestMiddlewareFn } from "@copilotkit/runtime";

const transcriptionRateLimiter = new Map<string, number[]>();

const rateLimitMiddleware: BeforeRequestMiddlewareFn = async ({ 
  request, 
  path 
}) => {
  if (path !== "/api/copilotkit/transcribe") return request;
  
  const userId = request.headers.get("X-User-Id") || "anonymous";
  const now = Date.now();
  const userRequests = transcriptionRateLimiter.get(userId) || [];
  
  // Remove requests older than 1 minute
  const recentRequests = userRequests.filter(t => now - t < 60000);
  
  // Allow max 10 transcription requests per minute
  if (recentRequests.length >= 10) {
    throw new Response("Rate limit exceeded for transcription", { 
      status: 429 
    });
  }
  
  recentRequests.push(now);
  transcriptionRateLimiter.set(userId, recentRequests);
  
  return request;
};

const runtime = new CopilotRuntime({
  agents: { default: myAgent },
  transcriptionService: transcriptionService,
  beforeRequestMiddleware: rateLimitMiddleware
});
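The `Map` in the middleware above keeps an entry for every user ID it has ever seen. `X-User-Id` is also a client-supplied header, so a caller can vary it to bypass the limit. Below is a minimal sketch of the same sliding-window check packaged as a class that prunes idle entries. `SlidingWindowLimiter` is hypothetical. In production, take the key from your authentication layer, and use a shared store if you run more than one server instance.

```typescript
// Hypothetical in-memory sliding-window limiter: at most `limit` hits per key
// within `windowMs`. The clock is injectable for testing.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  /** Records the hit and returns true if `key` is under its limit; otherwise returns false. */
  tryAcquire(key: string): boolean {
    const t = this.now();
    const recent = (this.hits.get(key) ?? []).filter((h) => t - h < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(t);
    this.hits.set(key, recent);
    return true;
  }

  /** Drops keys with no hits inside the window; call periodically. */
  prune(): void {
    const t = this.now();
    for (const [key, times] of this.hits) {
      if (times.every((h) => t - h >= this.windowMs)) this.hits.delete(key);
    }
  }

  get size(): number {
    return this.hits.size;
  }
}

// In the middleware:
// const limiter = new SlidingWindowLimiter(10, 60000);
// if (!limiter.tryAcquire(userId)) throw new Response("Rate limit exceeded", { status: 429 });
```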

Caching Transcriptions

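Cache transcriptions by a hash of the audio so that byte-identical uploads (for example, client retries) are not transcribed and billed twice: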

class CachedTranscriptionService extends TranscriptionService {
  private cache = new Map<string, string>();
  private baseService: TranscriptionService;
  
  constructor(baseService: TranscriptionService) {
    super();
    this.baseService = baseService;
  }
  
  async transcribeFile(options: TranscribeFileOptions): Promise<string> {
    // Generate cache key from audio file hash
    const arrayBuffer = await options.audioFile.arrayBuffer();
    const hashBuffer = await crypto.subtle.digest("SHA-256", arrayBuffer);
    const hashArray = Array.from(new Uint8Array(hashBuffer));
    const cacheKey = hashArray.map(b => b.toString(16).padStart(2, "0")).join("");
    
    // Check cache
    const cached = this.cache.get(cacheKey);
    if (cached) {
      console.log("Returning cached transcription");
      return cached;
    }
    
    // Transcribe and cache
    const text = await this.baseService.transcribeFile(options);
    this.cache.set(cacheKey, text);
    
    return text;
  }
}

// Use cached service
const runtime = new CopilotRuntime({
  agents: { default: myAgent },
  transcriptionService: new CachedTranscriptionService(
    new TranscriptionServiceOpenAI({ 
      openai: new OpenAI({ apiKey: process.env.OPENAI_API_KEY }) 
    })
  )
});
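The cache above is an unbounded `Map`, so on a long-running server it grows with every distinct recording. Below is a minimal sketch of a size-bounded least-recently-used cache that could replace `private cache = new Map<string, string>()`. `LruCache` and the entry limit you choose are assumptions. The sketch relies on `Map` preserving insertion order. Re-inserting an entry on read moves it to the end, so the first key is always the least recently used.

```typescript
// Hypothetical LRU cache built on Map insertion order.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Move to the most-recently-used position
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // First key in iteration order is the least recently used
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// In CachedTranscriptionService:
// private cache = new LruCache<string, string>(500);
```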

Logging and Analytics

Track transcription usage:

class AnalyticsTranscriptionService extends TranscriptionService {
  private baseService: TranscriptionService;
  
  constructor(baseService: TranscriptionService) {
    super();
    this.baseService = baseService;
  }
  
  async transcribeFile(options: TranscribeFileOptions): Promise<string> {
    const startTime = Date.now();
    
    try {
      const text = await this.baseService.transcribeFile(options);
      
      // Log success metrics (analytics is your own analytics client)
      await analytics.track({
        event: "transcription_completed",
        properties: {
          duration: Date.now() - startTime,
          audioSize: options.size,
          audioFormat: options.mimeType,
          transcriptionLength: text.length,
          success: true
        }
      });
      
      return text;
    } catch (error) {
      // Log failure metrics
      await analytics.track({
        event: "transcription_failed",
        properties: {
          duration: Date.now() - startTime,
          audioSize: options.size,
          audioFormat: options.mimeType,
          error: error instanceof Error ? error.message : String(error),
          success: false
        }
      });
      
      throw error;
    }
  }
}
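In the example above, `analytics.track` is awaited on the request path. A slow analytics call therefore delays the user's transcription. A throw from `track` inside the `catch` block would also replace the original transcription error. Below is a minimal sketch of a non-blocking wrapper. `trackSafely` and the `TrackFn` shape are assumptions standing in for whatever analytics client you use.

```typescript
// Hypothetical stand-in for your analytics client's track method.
type TrackFn = (event: { event: string; properties: Record<string, unknown> }) => Promise<void>;

// Fires the event without awaiting it and never throws, so analytics problems
// cannot delay or replace the transcription result.
function trackSafely(track: TrackFn, event: Parameters<TrackFn>[0]): void {
  try {
    track(event).catch((err) => console.warn("Analytics tracking failed:", err));
  } catch (err) {
    // track threw synchronously before returning a promise
    console.warn("Analytics tracking failed:", err);
  }
}

// In AnalyticsTranscriptionService, instead of `await analytics.track(...)`:
// trackSafely((e) => analytics.track(e), { event: "transcription_completed", properties: { ... } });
```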

Best Practices

Specify Language

Specify the language parameter whenever you know the audio language; it improves both accuracy and latency.

Provide Context

Use the prompt parameter to supply domain-specific vocabulary and improve transcription quality.

Handle Errors Gracefully

Catch transcription errors and show users a clear, actionable message instead of a raw error.

Rate Limit Requests

Rate limit transcription requests to prevent abuse and control API costs.

Testing Voice Input

Test transcription in development:

// Mock transcription service for testing
class MockTranscriptionService extends TranscriptionService {
  async transcribeFile(options: TranscribeFileOptions): Promise<string> {
    // Return mock transcription
    return "This is a mock transcription for testing purposes.";
  }
}

// Use in development
const runtime = new CopilotRuntime({
  agents: { default: myAgent },
  transcriptionService: process.env.NODE_ENV === "production"
    ? new TranscriptionServiceOpenAI({
        openai: new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
      })
    : new MockTranscriptionService()
});
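The mock above always returns the same string. For tests that cover several turns or failure paths, a mock that replays a script and records what it received can be more useful. Below is a minimal sketch. It declares a local `AudioInput` type that mirrors the `TranscribeFileOptions` fields shown earlier, so it runs standalone. `ScriptedTranscriber` is hypothetical. In your app you would extend `TranscriptionService`, as the mock above does.

```typescript
// Local mirror of the TranscribeFileOptions fields, so this sketch runs standalone.
interface AudioInput {
  audioFile: Blob;
  mimeType: string;
  size: number;
}

// Hypothetical test double: returns scripted results in order (an Error entry
// is thrown instead of returned) and records every input it receives.
class ScriptedTranscriber {
  readonly calls: AudioInput[] = [];
  private readonly script: Array<string | Error>;

  constructor(script: Array<string | Error>) {
    this.script = [...script];
  }

  async transcribeFile(options: AudioInput): Promise<string> {
    this.calls.push(options);
    const next = this.script.shift();
    if (next === undefined) throw new Error("ScriptedTranscriber: script exhausted");
    if (next instanceof Error) throw next;
    return next;
  }
}
```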

OpenAI Whisper API

Official OpenAI Whisper documentation

Runtime Middleware

Implement authentication and rate limiting for transcription
