
What is CopilotRuntime?

CopilotRuntime is the server-side orchestrator that receives HTTP requests from the frontend and delegates them to agents for execution. It’s the bridge between your frontend application and your AI agents.
CopilotRuntime can be deployed as a standalone microservice or embedded in your existing Node.js server.

Core Concepts

CopilotRuntime

Basic Setup

import { CopilotRuntime } from "@copilotkitnext/runtime";
import { BuiltInAgent } from "@copilotkitnext/agent";
import { openai } from "@ai-sdk/openai";

const runtime = new CopilotRuntime({
  agents: {
    "assistant": new BuiltInAgent({
      agentId: "assistant",
      model: openai("gpt-4")
    })
  }
});
Reference: packages/v2/runtime/src/runtime.ts:57

Configuration Options

interface CopilotRuntimeOptions {
  // Map of available agents
  agents: MaybePromise<Record<string, AbstractAgent>>;
  
  // Agent runner (defaults to InMemoryAgentRunner)
  runner?: AgentRunner;
  
  // Transcription service for audio
  transcriptionService?: TranscriptionService;
  
  // Before request middleware
  beforeRequestMiddleware?: BeforeRequestMiddleware;
  
  // After request middleware
  afterRequestMiddleware?: AfterRequestMiddleware;
  
  // A2UI middleware config
  a2ui?: A2UIMiddlewareConfig;
  
  // MCP Apps config
  mcpApps?: McpAppsConfig;
}
Reference: packages/v2/runtime/src/runtime.ts:41

Lazy Agent Loading

Agents can be loaded asynchronously for better startup performance:
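const runtime = new CopilotRuntime({
  // `agents` is typed MaybePromise<...>, so a promise of the agent map is accepted
  agents: (async () => {
    // loadModel() stands in for your own expensive initialization
    const model = await loadModel();
    return {
      "assistant": new BuiltInAgent({
        agentId: "assistant",
        model
      })
    };
  })()
});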
Lazy loading is useful when agents have expensive initialization (loading models, connecting to databases, etc.).
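Because `agents` is typed `MaybePromise<...>`, code that consumes it can treat both forms uniformly by awaiting the value. The sketch below shows that pattern in isolation, plus a memoized loader that defers the expensive work until first use and shares one promise between callers. `resolveAgents`, `lazyOnce`, and the string stand-ins for agents are hypothetical helpers, not part of the runtime's API.

```typescript
type MaybePromise<T> = T | Promise<T>;

// `await` accepts plain values and promises alike, so one code path handles both forms
async function resolveAgents<T>(
  agents: MaybePromise<Record<string, T>>
): Promise<Record<string, T>> {
  return await agents;
}

// Wraps an expensive loader so it runs on first use and every caller shares its promise
function lazyOnce<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= loader());
}

// Hypothetical usage: strings stand in for real agent instances
let loadCount = 0;
const getAgents = lazyOnce(async () => {
  // An async function body runs synchronously up to its first await,
  // so the counter is updated as soon as the loader is invoked
  loadCount += 1;
  return { assistant: "assistant-agent" };
});
```

Passing `getAgents()` as the `agents` option starts the load when the runtime is constructed; the memoization only pays off when the same loader is referenced from several places, such as multiple route handlers.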

Server Integration

Express Adapter

import express from "express";
import { copilotRuntimeExpressAdapter } from "@copilotkitnext/runtime";

const app = express();

app.use("/copilotkit", copilotRuntimeExpressAdapter({
  runtime
}));

app.listen(4000);
This creates the following endpoints:
  • GET /copilotkit/info - Runtime and agent information
  • POST /copilotkit/agent/:agentId/run - Execute an agent
  • POST /copilotkit/agent/:agentId/connect - Reconnect to thread
  • POST /copilotkit/agent/:agentId/stop/:threadId - Stop agent
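To make the route table concrete, here is a hypothetical matcher that maps an HTTP method and a path (relative to the `/copilotkit` mount point) onto the four endpoints listed above. It only illustrates the URL shapes; it is not the adapter's actual router.

```typescript
type RouteMatch =
  | { route: "info" }
  | { route: "run" | "connect"; agentId: string }
  | { route: "stop"; agentId: string; threadId: string };

// Matches a request against the endpoint shapes exposed under the mount point
function matchRoute(method: string, path: string): RouteMatch | null {
  const parts = path.split("/").filter(Boolean).map(decodeURIComponent);

  // GET /info
  if (method === "GET" && parts.length === 1 && parts[0] === "info") {
    return { route: "info" };
  }

  // Every other endpoint is a POST under /agent/:agentId/...
  if (method !== "POST" || parts[0] !== "agent") return null;
  const [, agentId, action, threadId] = parts;

  if (parts.length === 3 && (action === "run" || action === "connect")) {
    return { route: action, agentId };
  }
  if (parts.length === 4 && action === "stop") {
    return { route: "stop", agentId, threadId };
  }
  return null;
}
```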

Hono Adapter

import { Hono } from "hono";
import { copilotRuntimeHonoAdapter } from "@copilotkitnext/runtime";

const app = new Hono();

app.route("/copilotkit", copilotRuntimeHonoAdapter({
  runtime
}));

export default app;

Next.js API Route

// app/api/copilotkit/[...copilotkit]/route.ts
import {
  CopilotRuntime,
  copilotRuntimeNextJSAppRouterAdapter
} from "@copilotkitnext/runtime";

const runtime = new CopilotRuntime({
  agents: { /* ... */ }
});

// One handler serves both methods
const handler = copilotRuntimeNextJSAppRouterAdapter(runtime);

export const GET = handler;
export const POST = handler;

AgentRunner

AgentRunner is an abstract class responsible for managing thread state (conversation history, agent state) and executing agents. It’s the persistence layer for agent conversations.
abstract class AgentRunner {
  // Execute an agent run
  abstract run(request: AgentRunnerRunRequest): Observable<BaseEvent>;
  
  // Reconnect to an existing thread
  abstract connect(request: AgentRunnerConnectRequest): Observable<BaseEvent>;
  
  // Check if a thread is currently running
  abstract isRunning(request: AgentRunnerIsRunningRequest): Promise<boolean>;
  
  // Stop a running thread
  abstract stop(request: AgentRunnerStopRequest): Promise<boolean | undefined>;
}
Reference: packages/v2/runtime/src/runner/agent-runner.ts:23

InMemoryAgentRunner

The default runner, which stores thread state in memory. It is well suited to development and to deployments that don't need conversations to survive a restart:
import { InMemoryAgentRunner } from "@copilotkitnext/runtime";

const runner = new InMemoryAgentRunner();

const runtime = new CopilotRuntime({
  agents: { /* ... */ },
  runner
});
Characteristics:
  • Ephemeral - State is lost on server restart
  • Fast - No I/O overhead
  • Hot-reload friendly - Survives hot reloads in development (via global state)
  • Concurrent - Handles multiple threads simultaneously
Reference: packages/v2/runtime/src/runner/in-memory.ts:100
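The "hot-reload friendly" property comes from keeping the store on a process-wide object rather than in module scope, so a re-evaluated module finds the existing map instead of creating an empty one. A minimal sketch of that pattern, assuming a hypothetical global key name and a simplified store entry:

```typescript
// Simplified per-thread entry (the real store holds more fields)
interface ThreadEntry {
  isRunning: boolean;
  events: unknown[];
}

// globalThis outlives module re-evaluation during hot reloads
type GlobalWithStore = { __exampleThreadStore?: Map<string, ThreadEntry> };

function getGlobalStore(): Map<string, ThreadEntry> {
  const g = globalThis as unknown as GlobalWithStore;
  // Reuse the map if an earlier evaluation of this module already created it
  return (g.__exampleThreadStore ??= new Map<string, ThreadEntry>());
}
```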

State Management

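import type { ReplaySubject } from "rxjs";
import type { AbstractAgent } from "@ag-ui/client";
import type { BaseEvent } from "@ag-ui/core";

// Global store entry, one per thread (simplified)
interface ThreadStore {
  threadId: string;                   // e.g. "thread_123"
  subject: ReplaySubject<BaseEvent>;  // live events of the current run
  isRunning: boolean;
  currentRunId: string | null;        // e.g. "run_456"; null when idle
  historicRuns: HistoricRun[];
  agent: AbstractAgent;
  stopRequested: boolean;
}

interface HistoricRun {
  threadId: string;
  runId: string;
  parentRunId: string | null;         // null for the first run in a thread
  events: BaseEvent[];
  createdAt: number;                  // timestamp
}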
Reference: packages/v2/runtime/src/runner/in-memory.ts:19

Event Replay

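When a client reconnects, InMemoryAgentRunner replays the thread's history in compacted form and, if a run is still active, bridges the client to its live event stream (simplified):
connect(request: AgentRunnerConnectRequest): Observable<BaseEvent> {
  const store = GLOBAL_STORE.get(request.threadId);
  const connectionSubject = new ReplaySubject<BaseEvent>(Infinity);

  // Unknown thread: nothing to replay
  if (!store) {
    connectionSubject.complete();
    return connectionSubject.asObservable();
  }

  // Collect all historic events
  const allHistoricEvents: BaseEvent[] = [];
  for (const run of store.historicRuns) {
    allHistoricEvents.push(...run.events);
  }

  // Compact and emit
  const compactedEvents = compactEvents(allHistoricEvents);
  for (const event of compactedEvents) {
    connectionSubject.next(event);
  }

  // Bridge to the active run if one exists; otherwise the replay is complete
  if (store.subject && store.isRunning) {
    store.subject.subscribe(connectionSubject);
  } else {
    connectionSubject.complete();
  }

  return connectionSubject.asObservable();
}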
Reference: packages/v2/runtime/src/runner/in-memory.ts:294

SQLiteAgentRunner

Persistent runner that stores thread state in SQLite. Use in production for conversation persistence:
import { SQLiteAgentRunner } from "@copilotkitnext/sqlite-runner";

const runner = new SQLiteAgentRunner({
  dbPath: "./copilot.db"
});

const runtime = new CopilotRuntime({
  agents: { /* ... */ },
  runner
});
Characteristics:
  • Persistent - Survives server restarts
  • Scalable - Can handle large conversation histories
  • Queryable - SQL access to conversation data
  • Transactional - ACID guarantees for state updates
Reference: packages/v2/sqlite-runner/src/sqlite-runner.ts

Schema

CREATE TABLE runs (
  id TEXT PRIMARY KEY,
  thread_id TEXT NOT NULL,
  parent_run_id TEXT,
  created_at INTEGER NOT NULL
);

CREATE TABLE events (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  run_id TEXT NOT NULL,
  event_type TEXT NOT NULL,
  event_data TEXT NOT NULL,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (run_id) REFERENCES runs(id)
);

CREATE INDEX idx_events_run_id ON events(run_id);
CREATE INDEX idx_runs_thread_id ON runs(thread_id);
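With this schema, a thread's full event history is a join from `runs` to `events`. The sketch below pairs an illustrative query (an assumption, not necessarily the one `SQLiteAgentRunner` issues) with an in-memory equivalent over plain row objects, so the filtering and ordering logic can be checked without a database.

```typescript
// Illustrative query: all events of one thread, in insertion order
const THREAD_EVENTS_SQL = `
  SELECT e.event_data
  FROM events e
  JOIN runs r ON e.run_id = r.id
  WHERE r.thread_id = ?
  ORDER BY e.id`;

interface RunRow {
  id: string;
  thread_id: string;
  parent_run_id: string | null;
  created_at: number;
}

interface EventRow {
  id: number;
  run_id: string;
  event_type: string;
  event_data: string;
  created_at: number;
}

// Same logic as the query, applied to rows already loaded into memory
function threadEvents(threadId: string, runs: RunRow[], events: EventRow[]): unknown[] {
  const runIds = new Set(runs.filter((r) => r.thread_id === threadId).map((r) => r.id));
  return events
    .filter((e) => runIds.has(e.run_id))
    .sort((a, b) => a.id - b.id) // AUTOINCREMENT ids increase with insertion order
    .map((e) => JSON.parse(e.event_data));
}
```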

Event Persistence

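import type { BaseEvent } from "@ag-ui/core";

// Minimal shape of the database handle used here (simplified)
interface EventStore {
  insert(table: "events", row: {
    run_id: string;
    event_type: string;
    event_data: string;
    created_at: number;
  }): void;
}

// Store events as they arrive, then forward each one
async function* persistAndStream(
  db: EventStore,
  runId: string,
  agentStream: AsyncIterable<BaseEvent>
): AsyncGenerator<BaseEvent> {
  for await (const event of agentStream) {
    db.insert("events", {
      run_id: runId,
      event_type: event.type,
      event_data: JSON.stringify(event),
      created_at: Date.now()
    });

    yield event; // Stream to frontend
  }
}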
Reference: packages/v2/sqlite-runner/src/sqlite-runner.ts:248

Custom AgentRunner

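Build a custom runner for your storage backend. This example assumes an ioredis client, and that the request types are exported from @copilotkitnext/runtime alongside AgentRunner:
import { AgentRunner } from "@copilotkitnext/runtime";
import type {
  AgentRunnerRunRequest,
  AgentRunnerConnectRequest,
  AgentRunnerIsRunningRequest,
  AgentRunnerStopRequest
} from "@copilotkitnext/runtime";
import type { AbstractAgent } from "@ag-ui/client";
import type { BaseEvent } from "@ag-ui/core";
import type Redis from "ioredis";
import { Observable } from "rxjs";

class RedisAgentRunner extends AgentRunner {
  // Agents running in this process, so stop() can abort them
  private activeAgents = new Map<string, AbstractAgent>();

  constructor(private redis: Redis) {
    super();
  }

  run(request: AgentRunnerRunRequest): Observable<BaseEvent> {
    const { threadId, agent, input } = request;
    const eventsKey = `thread:${threadId}:events`;
    const runningKey = `thread:${threadId}:running`;

    return new Observable<BaseEvent>((observer) => {
      this.activeAgents.set(threadId, agent);

      // The subscribe callback cannot be async, so chain the async work instead
      this.redis
        .set(runningKey, "1")
        .then(() =>
          agent.runAgent(input, {
            onEvent: ({ event }) => {
              // Persist event (commands on one connection are applied in order)
              void this.redis.rpush(eventsKey, JSON.stringify(event));

              // Stream to frontend
              observer.next(event);
            }
          })
        )
        .then(() => observer.complete())
        .catch((error) => observer.error(error))
        .finally(() => {
          // Cleanup on success and on failure
          this.activeAgents.delete(threadId);
          void this.redis.del(runningKey);
        });
    });
  }

  connect(request: AgentRunnerConnectRequest): Observable<BaseEvent> {
    // Load and replay stored events from Redis, then complete
    return new Observable<BaseEvent>((observer) => {
      this.redis
        .lrange(`thread:${request.threadId}:events`, 0, -1)
        .then((events) => {
          for (const eventStr of events) {
            observer.next(JSON.parse(eventStr));
          }
          observer.complete();
        })
        .catch((error) => observer.error(error));
    });
  }

  async isRunning(request: AgentRunnerIsRunningRequest): Promise<boolean> {
    // EXISTS returns the number of matching keys, not a boolean
    return (await this.redis.exists(`thread:${request.threadId}:running`)) > 0;
  }

  async stop(request: AgentRunnerStopRequest): Promise<boolean> {
    const running = await this.isRunning(request);
    if (!running) return false;

    // Only runs in this process can be aborted from here
    this.activeAgents.get(request.threadId)?.abortRun();
    await this.redis.del(`thread:${request.threadId}:running`);
    return true;
  }
}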
Custom runners must handle concurrent access safely. Use locks or transactions to prevent race conditions.
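One way to make the running flag race-free in a Redis-backed runner is to claim it atomically with `SET key value PX ttl NX`, which succeeds only if the key does not already exist; the TTL keeps a crashed process from holding the lock forever. The sketch codes against a minimal hypothetical `LockClient` interface (ioredis's `set` accepts this argument order) and includes an in-memory fake so the logic can be exercised without Redis. The key format and TTL value are assumptions.

```typescript
// Minimal subset of a Redis client needed for locking
interface LockClient {
  set(key: string, value: string, px: "PX", ttlMs: number, nx: "NX"): Promise<"OK" | null>;
  del(key: string): Promise<number>;
}

const RUN_LOCK_TTL_MS = 10 * 60 * 1000; // assumed upper bound on a run's duration

// Atomically claims the thread; resolves false if another run already holds it
async function tryAcquireRunLock(client: LockClient, threadId: string): Promise<boolean> {
  const result = await client.set(`thread:${threadId}:running`, "1", "PX", RUN_LOCK_TTL_MS, "NX");
  return result === "OK";
}

async function releaseRunLock(client: LockClient, threadId: string): Promise<void> {
  await client.del(`thread:${threadId}:running`);
}

// In-memory stand-in for Redis, for local testing only (ignores the TTL)
class FakeLockClient implements LockClient {
  private keys = new Set<string>();

  async set(key: string, _value: string, _px: "PX", _ttlMs: number, _nx: "NX"): Promise<"OK" | null> {
    if (this.keys.has(key)) return null;
    this.keys.add(key);
    return "OK";
  }

  async del(key: string): Promise<number> {
    return this.keys.delete(key) ? 1 : 0;
  }
}
```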

Middleware

Middleware provides hooks for cross-cutting concerns like authentication, logging, and request transformation.

Before Request Middleware

Runs before the request handler:
const runtime = new CopilotRuntime({
  agents: { /* ... */ },
  beforeRequestMiddleware: async ({ runtime, request, path }) => {
    // Authentication (isValid and getUserId are your own helpers)
    const token = request.headers.get("Authorization");
    if (!token || !isValid(token)) {
      throw new Error("Unauthorized");
    }
    const userId = getUserId(token);

    // Logging (avoid writing raw credentials to logs)
    console.log(`Request to ${path} from user ${userId}`);

    // Transform request
    const newHeaders = new Headers(request.headers);
    newHeaders.set("X-User-ID", userId);

    // Passing the original Request preserves its method and body;
    // spreading a Request object would not copy them
    return new Request(request, {
      headers: newHeaders
    });
  }
});
Reference: packages/v2/runtime/src/middleware.ts:72
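The `isValid` and `getUserId` helpers in the example are left to the application. As one hedged example of the first step, a helper that extracts a token sent with the `Bearer` scheme (RFC 6750) from the `Authorization` header could look like this; verifying the token itself (signature, expiry, and so on) is out of scope here, and the function name is hypothetical.

```typescript
// Returns the token from "Authorization: Bearer <token>", or null if absent or malformed
function extractBearerToken(header: string | null): string | null {
  if (!header) return null;
  // Auth scheme names are case-insensitive, hence the `i` flag
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}
```

Inside the middleware it would be called as `extractBearerToken(request.headers.get("Authorization"))`, since `Headers.get` returns `string | null`.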

After Request Middleware

Runs after the response is generated:
const runtime = new CopilotRuntime({
  agents: { /* ... */ },
  afterRequestMiddleware: async ({ 
    runtime, 
    response, 
    path, 
    messages,
    threadId,
    runId 
  }) => {
    // Log completion
    console.log(`Completed ${path} for thread ${threadId}`);
    
    // Analytics
    await analytics.track({
      event: "agent_run_completed",
      threadId,
      runId,
      messageCount: messages?.length
    });
    
    // Audit trail
    await audit.log({
      action: "agent_run",
      threadId,
      timestamp: Date.now()
    });
  }
});
Reference: packages/v2/runtime/src/middleware.ts:89

Middleware Use Cases

Common middleware patterns:
  1. Authentication - Verify JWT tokens, API keys, or session cookies
  2. Authorization - Check user permissions for specific agents
  3. Rate limiting - Throttle requests per user or IP
  4. Logging - Record all agent interactions
  5. Metrics - Track performance and usage
  6. Request transformation - Modify headers or payloads
  7. Response filtering - Remove sensitive data from responses
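Rate limiting (pattern 3) fits naturally into `beforeRequestMiddleware`. A minimal fixed-window limiter keyed by user ID or IP might look like the sketch below. The class name and limits are assumptions; because it keeps counters in an in-process `Map`, a multi-instance deployment would need shared storage instead, and a long-running process would also need to evict stale keys.

```typescript
// Allows at most `limit` requests per key in each window of `windowMs` milliseconds
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed; `now` is injectable for testing
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      // Start a fresh window for this key
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

In the middleware, a `false` from `limiter.allow(userId)` would be the signal to reject the request.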

Thread Management

Thread IDs

Threads represent a conversation context. The frontend generates and manages thread IDs:
import { v4 as uuidv4 } from "uuid";

// Create new thread
const threadId = uuidv4();

// Run agent in this thread
await agent.runAgent({
  threadId,
  runId: uuidv4(),
  messages: [...]
});

Run IDs

Each agent execution within a thread gets a unique run ID:
// First run
await agent.runAgent({
  threadId: "thread_123",
  runId: "run_1",
  messages: [{ role: "user", content: "Hello" }]
});

// Follow-up run in same thread
await agent.runAgent({
  threadId: "thread_123",
  runId: "run_2",
  messages: [{ role: "user", content: "Tell me more" }]
});

Parent-Child Runs

Runners track parent-child relationships for run chains:
{
  threadId: "thread_123",
  runs: [
    {
      runId: "run_1",
      parentRunId: null,  // First run
      events: [...]
    },
    {
      runId: "run_2",
      parentRunId: "run_1",  // Child of run_1
      events: [...]
    }
  ]
}
Reference: packages/v2/runtime/src/runner/in-memory.ts:19
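Given records shaped like the one above, the chain of runs can be recovered by following `parentRunId` links from the root. A sketch, assuming each run has at most one child (a linear conversation) and using a hypothetical `RunRecord` type:

```typescript
interface RunRecord {
  runId: string;
  parentRunId: string | null;
}

// Orders runs root-first by following parentRunId links; assumes a linear chain
function orderRunChain(runs: RunRecord[]): string[] {
  // Index each run by its parent so children can be looked up directly
  const childOf = new Map<string | null, RunRecord>();
  for (const run of runs) childOf.set(run.parentRunId, run);

  const ordered: string[] = [];
  let next = childOf.get(null); // the root run has no parent
  // The length check guards against cycles in malformed data
  while (next && ordered.length < runs.length) {
    ordered.push(next.runId);
    next = childOf.get(next.runId);
  }
  return ordered;
}
```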

State Persistence

Event Compaction

To optimize storage, runners can compact event streams:
// Original events
[
  { type: "TEXT_MESSAGE_START", messageId: "msg_1" },
  { type: "TEXT_MESSAGE_CONTENT", messageId: "msg_1", delta: "Hello" },
  { type: "TEXT_MESSAGE_CONTENT", messageId: "msg_1", delta: " world" },
  { type: "TEXT_MESSAGE_END", messageId: "msg_1" }
]

// Compacted: content deltas merged into a single event
[
  { type: "TEXT_MESSAGE_START", messageId: "msg_1" },
  { type: "TEXT_MESSAGE_CONTENT", messageId: "msg_1", delta: "Hello world" },
  { type: "TEXT_MESSAGE_END", messageId: "msg_1" }
]
Reference: packages/v2/runtime/src/runner/in-memory.ts:213
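A simplified version of this merge step is sketched below. It follows the AG-UI convention that `TEXT_MESSAGE_CONTENT` carries its text in a `delta` field, collapses each message's deltas into one content event at the position of the first delta, and passes all other event types through unchanged. It is an illustration, not the runtime's own compaction function.

```typescript
interface StreamEvent {
  type: string;
  messageId?: string;
  delta?: string;
  [key: string]: unknown;
}

// Merges all TEXT_MESSAGE_CONTENT deltas of a message into a single content event
function compactTextDeltas(events: StreamEvent[]): StreamEvent[] {
  const out: StreamEvent[] = [];
  const merged = new Map<string, StreamEvent>(); // messageId -> merged content event

  for (const event of events) {
    if (event.type === "TEXT_MESSAGE_CONTENT" && event.messageId !== undefined) {
      const existing = merged.get(event.messageId);
      if (existing) {
        // Append this chunk to the message's first content event
        existing.delta = (existing.delta ?? "") + (event.delta ?? "");
        continue;
      }
      const copy = { ...event }; // copy so the input events are not mutated
      merged.set(event.messageId, copy);
      out.push(copy);
    } else {
      out.push(event);
    }
  }
  return out;
}
```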

Deduplication

Duplicate events are removed during compaction:
// Before
[
  { type: "RUN_STARTED", runId: "run_1" },
  { type: "RUN_STARTED", runId: "run_1" },  // Duplicate
  { type: "TEXT_MESSAGE_CONTENT", messageId: "msg_1", delta: "Hello" }
]

// After
[
  { type: "RUN_STARTED", runId: "run_1" },
  { type: "TEXT_MESSAGE_CONTENT", messageId: "msg_1", delta: "Hello" }
]
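Deduplication needs a notion of identity. Dropping every structurally identical event would be wrong for text deltas (two chunks can legitimately both be `" "`), so the hypothetical sketch below deduplicates only run lifecycle events, keyed by event type and `runId`, and keeps everything else.

```typescript
interface LifecycleAwareEvent {
  type: string;
  runId?: string;
  [key: string]: unknown;
}

const LIFECYCLE_TYPES = new Set(["RUN_STARTED", "RUN_FINISHED"]);

// Drops repeated lifecycle events for the same run; other events pass through untouched
function dedupeLifecycleEvents(events: LifecycleAwareEvent[]): LifecycleAwareEvent[] {
  const seen = new Set<string>();
  return events.filter((event) => {
    if (!LIFECYCLE_TYPES.has(event.type) || event.runId === undefined) return true;
    const key = `${event.type}:${event.runId}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```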

Concurrent Execution

Thread Isolation

Each thread runs independently:
// Thread 1
runner.run({
  threadId: "thread_1",
  agent: agent.clone(),
  input: { ... }
});

// Thread 2 (runs concurrently)
runner.run({
  threadId: "thread_2",
  agent: agent.clone(),
  input: { ... }
});

Agent Cloning

The runtime clones agents for each request to ensure isolation:
// Original agent
const agent = new BuiltInAgent({ ... });

// Cloned for request 1
const clone1 = agent.clone();

// Cloned for request 2
const clone2 = agent.clone();

// No shared state between clones
This prevents race conditions and state leakage between concurrent requests.
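The isolation guarantee depends on `clone()` copying mutable state rather than sharing references. A toy illustration with a hypothetical `ToyAgent` class (not one of the real agent classes), using the built-in `structuredClone` for a deep copy:

```typescript
interface ToyMessage {
  role: string;
  content: string;
}

class ToyAgent {
  messages: ToyMessage[];

  constructor(messages: ToyMessage[] = []) {
    this.messages = messages;
  }

  // Deep-copies mutable state so clones never share arrays or message objects
  clone(): ToyAgent {
    return new ToyAgent(structuredClone(this.messages));
  }
}
```

A shallow copy such as `[...this.messages]` would copy the array but still share the message objects, so editing a message in one clone would leak into the original and every other clone.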

Preventing Concurrent Runs in Same Thread

run(request: AgentRunnerRunRequest): Observable<BaseEvent> {
  const store = GLOBAL_STORE.get(request.threadId);
  
  if (store.isRunning) {
    throw new Error("Thread already running");
  }
  
  store.isRunning = true;
  // ... execute agent
}
Reference: packages/v2/runtime/src/runner/in-memory.ts:109
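The check above only works if the flag is always cleared when a run ends, including on errors. A self-contained sketch of such a guard, with a hypothetical `ThreadRunGuard` class and a `Set` of thread IDs standing in for the global store:

```typescript
class ThreadRunGuard {
  private running = new Set<string>();

  // Runs `task` exclusively per thread; a concurrent call for the same thread is rejected
  async runExclusive<T>(threadId: string, task: () => Promise<T>): Promise<T> {
    // Check and mark happen synchronously, before any await, so they cannot interleave
    if (this.running.has(threadId)) {
      throw new Error("Thread already running");
    }
    this.running.add(threadId);
    try {
      return await task();
    } finally {
      this.running.delete(threadId); // cleared on success and on failure
    }
  }

  isRunning(threadId: string): boolean {
    return this.running.has(threadId);
  }
}
```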

Error Handling

Run Finalization

Runners ensure runs are properly finalized even on errors:
try {
  await agent.runAgent(input, { onEvent });

  // Success - emit RUN_FINISHED
  const appendedEvents = finalizeRunEvents(currentRunEvents, {
    stopRequested: false
  });
} catch (error) {
  // Error - emit RUN_ERROR
  const appendedEvents = finalizeRunEvents(currentRunEvents, {
    stopRequested: false,
    // catch variables are `unknown` in strict TypeScript
    interruptionMessage: error instanceof Error ? error.message : String(error)
  });
} finally {
  // Always clean up
  store.isRunning = false;
  store.currentRunId = null;
}
Reference: packages/v2/runtime/src/runner/in-memory.ts:202

Stop Requests

Gracefully stop a running agent:
await runner.stop({
  threadId: "thread_123"
});

// Runner marks stop requested
// Agent receives abort signal
// RUN_FINISHED emitted with stopped flag
Reference: packages/v2/runtime/src/runner/in-memory.ts:352
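The stop sequence (mark the request, signal the agent, finish the run) maps well onto the standard `AbortController` API. A hypothetical registry that a runner could use to abort in-flight runs per thread, returning `true`/`false` in the same spirit as `stop()`:

```typescript
class StopRegistry {
  private controllers = new Map<string, AbortController>();

  // Called when a run starts; the returned signal is handed to the run's async work
  register(threadId: string): AbortSignal {
    const controller = new AbortController();
    this.controllers.set(threadId, controller);
    return controller.signal;
  }

  // Returns true if a running thread was stopped, false if nothing was running
  stop(threadId: string): boolean {
    const controller = this.controllers.get(threadId);
    if (!controller) return false;
    controller.abort();
    this.controllers.delete(threadId);
    return true;
  }

  // Called when a run ends normally, so later stop() calls report false
  complete(threadId: string): void {
    this.controllers.delete(threadId);
  }
}
```

Work that honors the signal, for example `fetch(url, { signal })`, is cancelled as soon as `stop()` is called.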

Production Deployment

Scaling Considerations

Horizontal Scaling with InMemoryAgentRunner:
  • State is per-process
  • Use sticky sessions to route each thread to the same instance
  • Or use SQLiteAgentRunner to share state between processes on a single host
Horizontal Scaling with SQLiteAgentRunner:
  • SQLite allows only one writer at a time, and its database file cannot be safely shared between hosts
  • Use a PostgreSQL/MySQL runner instead (custom implementation)
  • Or use a Redis runner for distributed state (custom implementation)

Health Checks

app.get("/health", async (req, res) => {
  try {
    // Probe the runner: only whether the call succeeds matters
    // (most meaningful for database-backed runners)
    await runner.isRunning({
      threadId: "health_check"
    });

    res.json({ status: "ok" });
  } catch (error) {
    res.status(500).json({
      status: "error",
      message: error instanceof Error ? error.message : String(error)
    });
  }
});

Monitoring

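const runtime = new CopilotRuntime({
  agents: { /* ... */ },
  afterRequestMiddleware: async ({ response, threadId, runId }) => {
    // metrics, errorTracker, startTime and activeThreadCount are placeholders for
    // your own metrics client and bookkeeping (e.g. a start timestamp recorded
    // in beforeRequestMiddleware)
    metrics.histogram("agent_run_duration", Date.now() - startTime);
    metrics.increment("agent_runs_total");
    metrics.gauge("active_threads", activeThreadCount);

    // Error tracking: report failed responses
    if (!response.ok) {
      errorTracker.captureException(
        new Error(`Agent request failed with status ${response.status}`),
        { threadId, runId }
      );
    }
  }
});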

Best Practices

Runner Selection

  • Development - Use InMemoryAgentRunner for fast iteration
  • Production (stateless) - Use InMemoryAgentRunner with sticky sessions
  • Production (stateful) - Use SQLiteAgentRunner or custom persistent runner

State Management

  1. Keep state minimal - Only persist what’s necessary
  2. Compact regularly - Reduce storage overhead
  3. Archive old threads - Move inactive threads to cold storage
  4. Clean up on errors - Always finalize runs properly

Performance

  1. Clone agents efficiently - Avoid expensive operations in clone()
  2. Stream events promptly - Don’t buffer unnecessarily
  3. Use connection pooling - For database-backed runners
  4. Monitor memory - Track runner memory usage

Next Steps

  • Architecture - Understand where the runtime fits in the architecture
  • Agents - Learn how agents are executed by the runtime
  • AG-UI Protocol - Understand the event streaming protocol
  • Frontend Integration - Connect your frontend to the runtime