What is CopilotRuntime?
CopilotRuntime is the server-side orchestrator that receives HTTP requests from the frontend and delegates them to agents for execution. It’s the bridge between your frontend application and your AI agents.
CopilotRuntime can be deployed as a standalone microservice or embedded in your existing Node.js server.
Core Concepts
CopilotRuntime
Basic Setup
import { CopilotRuntime } from "@copilotkitnext/runtime";
import { BuiltInAgent } from "@copilotkitnext/agent";
import { openai } from "@ai-sdk/openai";

const runtime = new CopilotRuntime({
  agents: {
    "assistant": new BuiltInAgent({
      agentId: "assistant",
      model: openai("gpt-4"),
    }),
  },
});
Reference: packages/v2/runtime/src/runtime.ts:57
Configuration Options
interface CopilotRuntimeOptions {
  // Map of available agents
  agents: MaybePromise<Record<string, AbstractAgent>>;
  // Agent runner (defaults to InMemoryAgentRunner)
  runner?: AgentRunner;
  // Transcription service for audio
  transcriptionService?: TranscriptionService;
  // Before request middleware
  beforeRequestMiddleware?: BeforeRequestMiddleware;
  // After request middleware
  afterRequestMiddleware?: AfterRequestMiddleware;
  // A2UI middleware config
  a2ui?: A2UIMiddlewareConfig;
  // MCP Apps config
  mcpApps?: McpAppsConfig;
}
Reference: packages/v2/runtime/src/runtime.ts:41
Lazy Agent Loading
Agents can be loaded asynchronously for better startup performance:
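// Build the agent map asynchronously, e.g. after an expensive model load
async function loadAgents() {
  // loadModel() stands in for your own initialization logic
  const model = await loadModel();
  return {
    "assistant": new BuiltInAgent({
      agentId: "assistant",
      model,
    }),
  };
}

// agents is typed as MaybePromise<...>, so pass the pending promise itself
const runtime = new CopilotRuntime({
  agents: loadAgents(),
});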
Lazy loading is useful when agents have expensive initialization (loading models, connecting to databases, etc.).
Server Integration
Express Adapter
import express from "express";
import { copilotRuntimeExpressAdapter } from "@copilotkitnext/runtime";

const app = express();
app.use("/copilotkit", copilotRuntimeExpressAdapter({
  runtime,
}));
app.listen(4000);
This creates the following endpoints:
GET /copilotkit/info - Runtime and agent information
POST /copilotkit/agent/:agentId/run - Execute an agent
POST /copilotkit/agent/:agentId/connect - Reconnect to thread
POST /copilotkit/agent/:agentId/stop/:threadId - Stop agent
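To make the shape of these routes concrete, here is a minimal sketch of how a method and path could be mapped to one of the four actions. The `matchRoute` helper and the `RouteMatch` type are hypothetical names invented for illustration; they are not part of `@copilotkitnext/runtime`, and the real adapter may route differently.

```typescript
// Hypothetical helper for illustration only; not the adapter's actual routing code.
type RouteMatch =
  | { action: "info" }
  | { action: "run" | "connect"; agentId: string }
  | { action: "stop"; agentId: string; threadId: string };

function matchRoute(
  method: string,
  path: string,
  basePath: string = "/copilotkit",
): RouteMatch | null {
  if (!path.startsWith(basePath)) return null;
  const rest = path.slice(basePath.length);
  // Reject paths such as "/copilotkitagent/..." that merely share the prefix
  if (rest !== "" && !rest.startsWith("/")) return null;
  const segments = rest.split("/").filter(Boolean);

  if (method === "GET" && segments.length === 1 && segments[0] === "info") {
    return { action: "info" };
  }
  if (method !== "POST" || segments[0] !== "agent") return null;

  const [, agentId, action, threadId] = segments;
  if (segments.length === 3 && (action === "run" || action === "connect")) {
    return { action, agentId };
  }
  if (segments.length === 4 && action === "stop") {
    return { action: "stop", agentId, threadId };
  }
  return null;
}
```

Note that only the stop route carries the thread ID in the path; in this sketch the run and connect routes are assumed to receive it in the request body.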
Hono Adapter
import { Hono } from "hono";
import { copilotRuntimeHonoAdapter } from "@copilotkitnext/runtime";

const app = new Hono();
app.route("/copilotkit", copilotRuntimeHonoAdapter({
  runtime,
}));
export default app;
Next.js API Route
// app/api/copilotkit/[...copilotkit]/route.ts
import {
  CopilotRuntime,
  copilotRuntimeNextJSAppRouterAdapter,
} from "@copilotkitnext/runtime";

const runtime = new CopilotRuntime({
  agents: { /* ... */ },
});

export const POST = copilotRuntimeNextJSAppRouterAdapter(runtime);
export const GET = copilotRuntimeNextJSAppRouterAdapter(runtime);
AgentRunner
AgentRunner is an abstract class responsible for managing thread state (conversation history, agent state) and executing agents. It’s the persistence layer for agent conversations.
abstract class AgentRunner {
  // Execute an agent run
  abstract run(request: AgentRunnerRunRequest): Observable<BaseEvent>;
  // Reconnect to an existing thread
  abstract connect(request: AgentRunnerConnectRequest): Observable<BaseEvent>;
  // Check if a thread is currently running
  abstract isRunning(request: AgentRunnerIsRunningRequest): Promise<boolean>;
  // Stop a running thread
  abstract stop(request: AgentRunnerStopRequest): Promise<boolean | undefined>;
}
Reference: packages/v2/runtime/src/runner/agent-runner.ts:23
InMemoryAgentRunner
The default runner, which stores thread state in memory. It suits development and deployments that do not need conversations to survive a restart:
import { InMemoryAgentRunner } from "@copilotkitnext/runtime";

const runner = new InMemoryAgentRunner();
const runtime = new CopilotRuntime({
  agents: { /* ... */ },
  runner,
});
Characteristics:
Ephemeral - State is lost on server restart
Fast - No I/O overhead
Hot-reload friendly - Survives hot reloads in development (via global state)
Concurrent - Handles multiple threads simultaneously
Reference: packages/v2/runtime/src/runner/in-memory.ts:100
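The hot-reload behavior relies on a general property: module-level variables are re-created when a dev server re-evaluates a module, while properties on `globalThis` live as long as the process. The sketch below shows that pattern in isolation. The key name `__exampleThreadStores` and the `ThreadStore` shape are assumptions for illustration, not the runner's actual internals.

```typescript
// Illustrative only: key name and store shape are assumptions, not the library's.
interface ThreadStore {
  threadId: string;
  isRunning: boolean;
}

const GLOBAL_KEY = "__exampleThreadStores";

function getGlobalStores(): Map<string, ThreadStore> {
  // View globalThis as a loosely typed bag so the key can be read and written
  const g = globalThis as unknown as Record<
    string,
    Map<string, ThreadStore> | undefined
  >;
  let stores = g[GLOBAL_KEY];
  if (!stores) {
    // First evaluation in this process: create the map once
    stores = new Map<string, ThreadStore>();
    g[GLOBAL_KEY] = stores;
  }
  // Later evaluations (e.g. after a hot reload) reuse the same map
  return stores;
}

getGlobalStores().set("thread_123", { threadId: "thread_123", isRunning: true });
```

Because the map hangs off `globalThis`, a second call to `getGlobalStores()` from a freshly re-evaluated module still sees `thread_123`.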
State Management
// Global store per thread
const store = {
  threadId: "thread_123",
  subject: ReplaySubject<BaseEvent>,
  isRunning: false,
  currentRunId: "run_456",
  historicRuns: [
    {
      threadId: "thread_123",
      runId: "run_456",
      parentRunId: null,
      events: [...],
      createdAt: 1234567890,
    },
  ],
  agent: AbstractAgent,
  stopRequested: false,
};
Reference: packages/v2/runtime/src/runner/in-memory.ts:19
Event Replay
When reconnecting, InMemoryAgentRunner replays all historic events:
connect(request: AgentRunnerConnectRequest): Observable<BaseEvent> {
  const store = GLOBAL_STORE.get(request.threadId);
  // Subject that feeds this connection
  const connectionSubject = new ReplaySubject<BaseEvent>(Infinity);
  // Collect all historic events
  const allHistoricEvents: BaseEvent[] = [];
  for (const run of store.historicRuns) {
    allHistoricEvents.push(...run.events);
  }
  // Compact and emit
  const compactedEvents = compactEvents(allHistoricEvents);
  for (const event of compactedEvents) {
    connectionSubject.next(event);
  }
  // Bridge to active run if exists
  if (store.subject && store.isRunning) {
    store.subject.subscribe(connectionSubject);
  }
  return connectionSubject.asObservable();
}
Reference: packages/v2/runtime/src/runner/in-memory.ts:294
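The replay-then-bridge idea can be seen without RxJS. The following is a minimal sketch of a replay subject: it buffers every value, replays the buffer to each new subscriber, and then forwards live values. `MiniReplaySubject` is a hypothetical stand-in written for this example; the runner itself uses RxJS's `ReplaySubject`.

```typescript
// Hypothetical stand-in for a replay subject; the runner itself uses RxJS.
type Listener<T> = (value: T) => void;

class MiniReplaySubject<T> {
  private readonly buffer: T[] = [];
  private readonly listeners = new Set<Listener<T>>();

  // Record the value, then deliver it to everyone currently subscribed
  next(value: T): void {
    this.buffer.push(value);
    this.listeners.forEach((listener) => listener(value));
  }

  // Replay everything seen so far, then keep the listener for live values
  subscribe(listener: Listener<T>): () => void {
    for (const value of this.buffer) listener(value);
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

const subject = new MiniReplaySubject<string>();
subject.next("RUN_STARTED");
subject.next("TEXT_MESSAGE_CONTENT");

// A client that connects late still receives the two earlier events first
const received: string[] = [];
const unsubscribe = subject.subscribe((event) => {
  received.push(event);
});
subject.next("RUN_FINISHED"); // delivered live
unsubscribe();
subject.next("IGNORED"); // not delivered: the client has disconnected
```

A reconnecting frontend therefore sees the history and the live tail as one ordered stream.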
SQLiteAgentRunner
Persistent runner that stores thread state in SQLite. Use in production for conversation persistence:
import { SQLiteAgentRunner } from "@copilotkitnext/sqlite-runner";

const runner = new SQLiteAgentRunner({
  dbPath: "./copilot.db",
});
const runtime = new CopilotRuntime({
  agents: { /* ... */ },
  runner,
});
Characteristics:
Persistent - Survives server restarts
Scalable - Can handle large conversation histories
Queryable - SQL access to conversation data
Transactional - ACID guarantees for state updates
Reference: packages/v2/sqlite-runner/src/sqlite-runner.ts
Schema
CREATE TABLE runs (
  id TEXT PRIMARY KEY,
  thread_id TEXT NOT NULL,
  parent_run_id TEXT,
  created_at INTEGER NOT NULL
);
CREATE TABLE events (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  run_id TEXT NOT NULL,
  event_type TEXT NOT NULL,
  event_data TEXT NOT NULL,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (run_id) REFERENCES runs(id)
);
CREATE INDEX idx_events_run_id ON events(run_id);
CREATE INDEX idx_runs_thread_id ON runs(thread_id);
Event Persistence
// Store events as they arrive
for await (const event of agentStream) {
  db.insert("events", {
    run_id: runId,
    event_type: event.type,
    event_data: JSON.stringify(event),
    created_at: Date.now(),
  });
  yield event; // Stream to frontend
}
Reference: packages/v2/sqlite-runner/src/sqlite-runner.ts:248
Custom AgentRunner
Build a custom runner for your storage backend:
import { AgentRunner } from "@copilotkitnext/runtime";
import { Observable } from "rxjs";

class RedisAgentRunner extends AgentRunner {
  constructor(private redis: RedisClient) {
    super();
  }

  run(request: AgentRunnerRunRequest): Observable<BaseEvent> {
    return new Observable((observer) => {
      const { threadId, agent, input } = request;
      // The subscribe callback is synchronous, so async work goes in an inner function
      const execute = async () => {
        // Load historic state from Redis
        const history = await this.redis.get(`thread:${threadId}`);
        // Execute agent
        await agent.runAgent(input, {
          onEvent: ({ event }) => {
            // Persist event
            this.redis.rpush(`thread:${threadId}:events`, JSON.stringify(event));
            // Stream to frontend
            observer.next(event);
          },
        });
      };
      // Complete the stream when the run ends; surface failures to the subscriber
      execute().then(
        () => observer.complete(),
        (error) => observer.error(error),
      );
      return () => {
        // Cleanup
      };
    });
  }

  // Not async: the abstract signature returns an Observable, not a Promise
  connect(request: AgentRunnerConnectRequest): Observable<BaseEvent> {
    return new Observable((observer) => {
      // Load and replay events from Redis
      this.redis
        .lrange(`thread:${request.threadId}:events`, 0, -1)
        .then((events) => {
          for (const eventStr of events) {
            observer.next(JSON.parse(eventStr));
          }
          observer.complete();
        })
        .catch((error) => observer.error(error));
    });
  }

  async isRunning(request: AgentRunnerIsRunningRequest): Promise<boolean> {
    // EXISTS returns a count of matching keys, so convert it to a boolean
    const count = await this.redis.exists(`thread:${request.threadId}:running`);
    return count > 0;
  }

  async stop(request: AgentRunnerStopRequest): Promise<boolean> {
    const running = await this.isRunning(request);
    if (!running) return false;
    await this.redis.del(`thread:${request.threadId}:running`);
    return true;
  }
}
Custom runners must handle concurrent access safely. Use locks or transactions to prevent race conditions.
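As a rough illustration of the locking idea, the sketch below guards each thread with an in-process lock so that a second run on the same thread is rejected while the first is active. `ThreadLocks` and `withThreadLock` are hypothetical names, and this version only protects a single process; a distributed runner would need an atomic operation in the shared store instead (with Redis, `SET` with the `NX` option and an expiry is a common choice).

```typescript
// Hypothetical single-process lock; distributed runners need a shared atomic primitive.
class ThreadLocks {
  private readonly held = new Set<string>();

  // Returns true if the lock was free and is now held by the caller
  tryAcquire(threadId: string): boolean {
    if (this.held.has(threadId)) return false;
    this.held.add(threadId);
    return true;
  }

  release(threadId: string): void {
    this.held.delete(threadId);
  }
}

// Runs synchronous work under the thread's lock and always releases it.
// Asynchronous work would need to be awaited before the release.
function withThreadLock<T>(
  locks: ThreadLocks,
  threadId: string,
  work: () => T,
): T {
  if (!locks.tryAcquire(threadId)) {
    throw new Error(`Thread ${threadId} is already running`);
  }
  try {
    return work();
  } finally {
    locks.release(threadId);
  }
}
```

The check and the acquire happen in one synchronous step, which is what closes the race window; splitting them across an `await` would reopen it.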
Middleware
Middleware provides hooks for cross-cutting concerns like authentication, logging, and request transformation.
Before Request Middleware
Runs before the request handler:
const runtime = new CopilotRuntime({
  agents: { /* ... */ },
  beforeRequestMiddleware: async ({ runtime, request, path }) => {
    // Authentication
    const token = request.headers.get("Authorization");
    if (!isValid(token)) {
      throw new Error("Unauthorized");
    }
    // Resolve the user once; avoid writing the raw token to logs
    const userId = getUserId(token);
    // Logging
    console.log(`Request to ${path} from user ${userId}`);
    // Transform request: pass the original Request as the base, because
    // spreading a Request does not copy its method or body
    const newHeaders = new Headers(request.headers);
    newHeaders.set("X-User-ID", userId);
    return new Request(request, {
      headers: newHeaders,
    });
  },
});
Reference: packages/v2/runtime/src/middleware.ts:72
After Request Middleware
Runs after the response is generated:
const runtime = new CopilotRuntime({
  agents: { /* ... */ },
  afterRequestMiddleware: async ({
    runtime,
    response,
    path,
    messages,
    threadId,
    runId,
  }) => {
    // Log completion
    console.log(`Completed ${path} for thread ${threadId}`);
    // Analytics
    await analytics.track({
      event: "agent_run_completed",
      threadId,
      runId,
      messageCount: messages?.length,
    });
    // Audit trail
    await audit.log({
      action: "agent_run",
      threadId,
      timestamp: Date.now(),
    });
  },
});
Reference: packages/v2/runtime/src/middleware.ts:89
Middleware Use Cases
Common middleware patterns:
Authentication - Verify JWT tokens, API keys, or session cookies
Authorization - Check user permissions for specific agents
Rate limiting - Throttle requests per user or IP
Logging - Record all agent interactions
Metrics - Track performance and usage
Request transformation - Modify headers or payloads
Response filtering - Remove sensitive data from responses
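Of the patterns above, rate limiting is the one most often written by hand, so here is a minimal fixed-window sketch that a before-request middleware could consult. `FixedWindowRateLimiter` is a hypothetical class for illustration and keeps its counters in process memory, so it assumes a single instance; a multi-instance deployment would need shared storage.

```typescript
// Hypothetical in-memory limiter; counters are per process, not shared.
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
  ) {
    if (limit < 1) throw new Error("limit must be at least 1");
  }

  // Returns true if the request identified by `key` may proceed.
  // `now` is injectable so the behavior can be tested deterministically.
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      // No window yet, or the old one expired: start a fresh window
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

Inside a middleware, the key would typically be a user ID or client IP, and a `false` result would translate into rejecting the request.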
Thread Management
Thread IDs
Threads represent a conversation context. The frontend generates and manages thread IDs:
import { v4 as uuidv4 } from "uuid";

// Create new thread
const threadId = uuidv4();
// Run agent in this thread
await agent.runAgent({
  threadId,
  runId: uuidv4(),
  messages: [...],
});
Run IDs
Each agent execution within a thread gets a unique run ID:
// First run
await agent.runAgent({
  threadId: "thread_123",
  runId: "run_1",
  messages: [{ role: "user", content: "Hello" }],
});
// Follow-up run in same thread
await agent.runAgent({
  threadId: "thread_123",
  runId: "run_2",
  messages: [{ role: "user", content: "Tell me more" }],
});
Parent-Child Runs
Runners track parent-child relationships for run chains:
{
  threadId: "thread_123",
  runs: [
    {
      runId: "run_1",
      parentRunId: null, // First run
      events: [...],
    },
    {
      runId: "run_2",
      parentRunId: "run_1", // Child of run_1
      events: [...],
    },
  ],
}
Reference: packages/v2/runtime/src/runner/in-memory.ts:19
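Given records shaped like the ones above, the chain leading to any run can be recovered by following `parentRunId` links back to the first run. The sketch below is one way to do that; `RunRecord` and `getRunChain` are hypothetical names, and the cycle guard is a defensive assumption rather than something the runners are known to need.

```typescript
// Hypothetical helper; the record shape mirrors the example above.
interface RunRecord {
  runId: string;
  parentRunId: string | null;
}

// Returns run IDs from the root run down to `runId`, oldest first.
function getRunChain(runs: RunRecord[], runId: string): string[] {
  const byId = new Map<string, RunRecord>();
  for (const run of runs) byId.set(run.runId, run);

  const chain: string[] = [];
  const seen = new Set<string>(); // guards against accidental cycles
  let current: string | null = runId;
  while (current !== null && !seen.has(current)) {
    const run: RunRecord | undefined = byId.get(current);
    if (!run) break; // unknown run ID: stop walking
    seen.add(current);
    chain.unshift(current); // parents go in front of children
    current = run.parentRunId;
  }
  return chain;
}
```

An unknown `runId` yields an empty array, and a broken parent link simply ends the walk at the last known ancestor.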
State Persistence
Event Compaction
To optimize storage, runners can compact event streams:
// Original events
[
  { type: "TEXT_MESSAGE_START", messageId: "msg_1" },
  { type: "TEXT_MESSAGE_CONTENT", messageId: "msg_1", content: "Hello" },
  { type: "TEXT_MESSAGE_CONTENT", messageId: "msg_1", content: " world" },
  { type: "TEXT_MESSAGE_END", messageId: "msg_1" },
]
// Compacted
[
  {
    type: "TEXT_MESSAGE_START",
    messageId: "msg_1",
    content: "Hello world", // Merged content
  },
]
Reference: packages/v2/runtime/src/runner/in-memory.ts:213
Deduplication
Duplicate events are removed during compaction:
// Before
[
  { type: "RUN_STARTED", runId: "run_1" },
  { type: "RUN_STARTED", runId: "run_1" }, // Duplicate
  { type: "TEXT_MESSAGE_CONTENT", content: "Hello" },
]
// After
[
  { type: "RUN_STARTED", runId: "run_1" },
  { type: "TEXT_MESSAGE_CONTENT", content: "Hello" },
]
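One possible way to implement this step is sketched below. It assumes that only run lifecycle events are deduplicated and that the key is the event type plus `runId`; both are assumptions made for the example, since repeated content events (two chunks that both say "Hello") are legitimate and must be kept. `ExampleEvent` and `dedupeLifecycleEvents` are hypothetical names, not the library's compaction code.

```typescript
// Hypothetical sketch; the dedup key (type + runId) is an assumption.
interface ExampleEvent {
  type: string;
  runId?: string;
  content?: string;
}

const LIFECYCLE_TYPES = new Set(["RUN_STARTED", "RUN_FINISHED", "RUN_ERROR"]);

function dedupeLifecycleEvents(events: ExampleEvent[]): ExampleEvent[] {
  const seen = new Set<string>();
  return events.filter((event) => {
    // Content and other non-lifecycle events always pass through
    if (!LIFECYCLE_TYPES.has(event.type)) return true;
    const key = `${event.type}:${event.runId ?? ""}`;
    if (seen.has(key)) return false; // same lifecycle event seen before
    seen.add(key);
    return true;
  });
}
```

Order is preserved, and the first occurrence of each lifecycle event wins.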
Concurrent Execution
Thread Isolation
Each thread runs independently:
// Thread 1
runner.run({
  threadId: "thread_1",
  agent: agent.clone(),
  input: { ... },
});
// Thread 2 (runs concurrently)
runner.run({
  threadId: "thread_2",
  agent: agent.clone(),
  input: { ... },
});
Agent Cloning
The runtime clones agents for each request to ensure isolation:
// Original agent
const agent = new BuiltInAgent({ ... });
// Cloned for request 1
const clone1 = agent.clone();
// Cloned for request 2
const clone2 = agent.clone();
// No shared state between clones
This prevents race conditions and state leakage between concurrent requests.
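The isolation guarantee depends on `clone()` copying mutable state rather than sharing references. The sketch below demonstrates that property with a hypothetical `ExampleAgent` class; it is not `BuiltInAgent`, whose actual clone logic may copy different fields.

```typescript
// Hypothetical agent used to illustrate clone isolation; not BuiltInAgent.
interface Message {
  role: string;
  content: string;
}

class ExampleAgent {
  constructor(
    readonly agentId: string,
    public messages: Message[] = [],
  ) {}

  // Copy the array and each message so clones never share mutable objects
  clone(): ExampleAgent {
    return new ExampleAgent(
      this.agentId,
      this.messages.map((message) => ({ ...message })),
    );
  }
}

const template = new ExampleAgent("assistant");
const requestAgent1 = template.clone();
const requestAgent2 = template.clone();

// Mutating one clone leaves the template and the other clone untouched
requestAgent1.messages.push({ role: "user", content: "Hello" });
```

Had `clone()` passed `this.messages` through directly, the push above would have been visible to every other request.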
Preventing Concurrent Runs in Same Thread
run(request: AgentRunnerRunRequest): Observable<BaseEvent> {
  const store = GLOBAL_STORE.get(request.threadId);
  if (store.isRunning) {
    throw new Error("Thread already running");
  }
  store.isRunning = true;
  // ... execute agent
}
Reference: packages/v2/runtime/src/runner/in-memory.ts:109
Error Handling
Run Finalization
Runners ensure runs are properly finalized even on errors:
try {
  await agent.runAgent(input, { onEvent });
  // Success - emit RUN_FINISHED
  const appendedEvents = finalizeRunEvents(currentRunEvents, {
    stopRequested: false,
  });
} catch (error) {
  // Error - emit RUN_ERROR
  const appendedEvents = finalizeRunEvents(currentRunEvents, {
    stopRequested: false,
    // A caught value is not guaranteed to be an Error instance
    interruptionMessage: error instanceof Error ? error.message : String(error),
  });
} finally {
  // Always clean up
  store.isRunning = false;
  store.currentRunId = null;
}
Reference: packages/v2/runtime/src/runner/in-memory.ts:202
Stop Requests
Gracefully stop a running agent:
await runner.stop({
  threadId: "thread_123",
});
// Runner marks stop requested
// Agent receives abort signal
// RUN_FINISHED emitted with stopped flag
Reference: packages/v2/runtime/src/runner/in-memory.ts:352
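The sequence in those comments amounts to cooperative cancellation: the run checks a stop flag between steps and still emits its closing event. The sketch below models that with a polled flag; `RunControl` and `runUntilStopped` are hypothetical names, and a plain boolean is used here in place of the abort signal the real agent receives.

```typescript
// Hypothetical model of a cooperative stop; a boolean stands in for an abort signal.
interface RunControl {
  stopRequested: boolean;
}

function runUntilStopped(
  chunks: string[],
  control: RunControl,
  onEvent: (event: string) => void,
): void {
  onEvent("RUN_STARTED");
  for (const chunk of chunks) {
    // Check between steps so a stop takes effect at the next boundary
    if (control.stopRequested) break;
    onEvent(chunk);
  }
  // The run is always finalized, whether it completed or was stopped
  onEvent("RUN_FINISHED");
}

const control: RunControl = { stopRequested: false };
const emitted: string[] = [];
runUntilStopped(["chunk-1", "chunk-2", "chunk-3"], control, (event) => {
  emitted.push(event);
  // Simulate a stop request arriving while chunk-2 is being handled
  if (event === "chunk-2") control.stopRequested = true;
});
```

Here `chunk-3` is never emitted, yet the stream still ends with `RUN_FINISHED`, so subscribers are not left waiting.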
Production Deployment
Scaling Considerations
Horizontal Scaling with InMemoryAgentRunner:
State is per-process
Use sticky sessions to route each thread to the same instance
Or use SQLiteAgentRunner to share state between processes on a single host
Horizontal Scaling with SQLiteAgentRunner:
SQLite allows only one writer at a time and is unreliable on network filesystems, so it does not suit multi-host deployments
Use a PostgreSQL/MySQL runner instead (custom implementation)
Or use a Redis runner for distributed state
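Sticky routing by thread can be as simple as hashing the thread ID to pick an instance, so every request for a thread lands on the process that holds its state. The sketch below uses a 32-bit FNV-1a hash over the string's UTF-16 code units; `pickInstance` and the instance names are hypothetical. Modulo hashing remaps most threads whenever the instance count changes, so treat this as an illustration rather than a production routing scheme.

```typescript
// 32-bit FNV-1a over UTF-16 code units; deterministic for a given string.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime
  }
  return hash >>> 0;
}

// Hypothetical router: maps a thread ID to one of the available instances.
function pickInstance(threadId: string, instances: string[]): string {
  if (instances.length === 0) {
    throw new Error("No instances available");
  }
  return instances[fnv1a(threadId) % instances.length];
}

const instances = ["runtime-a", "runtime-b", "runtime-c"];
const target = pickInstance("thread_123", instances);
```

Most load balancers offer an equivalent built-in (for example hashing on a header or cookie), which is usually preferable to hand-rolled routing.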
Health Checks
app.get("/health", async (req, res) => {
  try {
    // Check runner health: the call only needs to succeed, its result is ignored
    await runner.isRunning({
      threadId: "health_check",
    });
    res.json({ status: "ok" });
  } catch (error) {
    res.status(500).json({
      status: "error",
      message: error instanceof Error ? error.message : String(error),
    });
  }
});
Monitoring
// startTime, activeThreadCount, hasError, and error are placeholders for
// values your application records elsewhere; the middleware does not supply them
const runtime = new CopilotRuntime({
  agents: { /* ... */ },
  afterRequestMiddleware: async ({ threadId, runId, messages }) => {
    // Track metrics
    metrics.histogram("agent_run_duration", Date.now() - startTime);
    metrics.increment("agent_runs_total");
    metrics.gauge("active_threads", activeThreadCount);
    // Error tracking
    if (hasError) {
      errorTracker.captureException(error, {
        threadId,
        runId,
      });
    }
  },
});
Best Practices
Runner Selection
Development - Use InMemoryAgentRunner for fast iteration
Production (stateless) - Use InMemoryAgentRunner with sticky sessions
Production (stateful) - Use SQLiteAgentRunner or custom persistent runner
State Management
Keep state minimal - Only persist what’s necessary
Compact regularly - Reduce storage overhead
Archive old threads - Move inactive threads to cold storage
Clean up on errors - Always finalize runs properly
Clone agents efficiently - Avoid expensive operations in clone()
Stream events promptly - Don’t buffer unnecessarily
Use connection pooling - For database-backed runners
Monitor memory - Track runner memory usage
Next Steps
Architecture - Understand where the runtime fits in the architecture
Agents - Learn how agents are executed by the runtime
AG-UI Protocol - Understand the event streaming protocol
Frontend Integration - Connect your frontend to the runtime