A complete recap of everything we shipped in March 2026: Agents SDK v1.0.0, unified pipeline, and Agent Participant support across all RTC SDKs.

Welcome to the March edition of the VideoSDK Monthly Updates! This month marks one of the biggest milestones for our AI stack — the official launch of Agents v1.0.0.

We have reimagined how developers build AI agents with a unified pipeline architecture, introduced hooks for deep customization, expanded our plugin ecosystem, and rolled out agent support across every SDK. This is a huge release, so let's get started!

Agents v1.0.0: A New Foundation for AI Voice Agents

This release is more than just an upgrade. It is a complete architectural shift. With Agents v1.0.0, we have unified multiple execution models into a single, flexible system that adapts to your use case automatically.


Read full release notes on GitHub

Unified Pipeline Architecture

CascadingPipeline and RealtimePipeline have been replaced with a single Pipeline class. Configure one pipeline and the SDK automatically determines the execution mode based on the components you provide.

Cascade Mode: STT to LLM to TTS

Compose any provider chain for complete control over each stage.

python
pipeline = Pipeline(
    stt=DeepgramSTT(),
    llm=GoogleLLM(),
    tts=CartesiaTTS(),
    vad=SileroVAD(),
    turn_detector=TurnDetector(),
)
session = AgentSession(agent=MyAgent(), pipeline=pipeline)
await session.start(wait_for_participant=True, run_until_shutdown=True)

Realtime Mode: Lowest Latency with Unified Models

Use a single realtime model for the full voice pipeline.

python
pipeline = Pipeline(
    llm=GeminiRealtime(
        model="gemini-3.1-flash-live-preview",
        config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"]),
    )
)
session = AgentSession(agent=MyAgent(), pipeline=pipeline)
await session.start(wait_for_participant=True, run_until_shutdown=True)

Other supported realtime models: OpenAIRealtime, AWSNovaSonic, AzureVoiceLive, xAI Grok, Ultravox.

Hybrid Mode: Mix Cascade and Realtime Components

Read the full Hybrid Mode docs.

python
# Custom STT + Realtime LLM (bring your own transcription)
pipeline = Pipeline(stt=DeepgramSTT(), llm=OpenAIRealtime(...))
# Realtime LLM + Custom TTS (bring your own voice)
pipeline = Pipeline(llm=OpenAIRealtime(...), tts=ElevenLabsTTS(...))

Flexible Agent Composition

Just pass the components you need. The pipeline handles the rest.

python
Pipeline(stt=...)                                         # Transcription only
Pipeline(llm=...)                                         # Text chatbot
Pipeline(stt=..., llm=..., tts=...)                       # Voice + Chat
Pipeline(stt=..., llm=..., tts=..., vad=..., turn_detector=...)  # Full voice agent
Pipeline(llm=OpenAIRealtime(...))                         # Realtime voice agent

Pipeline Hooks System

ConversationalFlow has been removed in favour of @pipeline.on(...) — a lightweight hooks engine that lets you intercept and transform data at any stage without subclassing. See the full Pipeline Hooks docs.

  • stt: Incoming audio stream and transcript text. Clean, redact, or replace before the LLM.
  • tts: Outgoing text and synthesized audio stream. Adjust pronunciation, filter, or re-route.
  • llm: Message list. Bypass the model entirely with a yield, or modify before inference.
  • vision_frame: Raw video frames from participants.
  • user_turn_start / user_turn_end: User speaking lifecycle.
  • agent_turn_start / agent_turn_end: Agent response lifecycle.
python
@pipeline.on("stt")
async def on_transcript(text: str) -> str:
    return text.strip()          # normalize before LLM

@pipeline.on("tts")
async def on_tts(text: str) -> str:
    return text.replace("SDK", "S D K")   # fix pronunciation

@pipeline.on("llm")
async def on_llm(messages):
    yield "Transferring you now."  # bypass LLM entirely
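Hook bodies are plain Python, so the transform logic can be written and tested independently of the SDK. For example, here is a redaction transform you might return from an stt hook like the one above (the helper name and regex are our own illustration, not part of the SDK):

```python
import re

def redact_digits(text: str) -> str:
    """Mask runs of four or more digits (card or phone fragments) and trim
    whitespace before the transcript reaches the LLM."""
    return re.sub(r"\d{4,}", "[REDACTED]", text).strip()

# e.g. redact_digits("my card is 4242424242424242 ") -> "my card is [REDACTED]"
```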

Observability

Per-component metrics, structured logging, and OpenTelemetry tracing are built in across all pipeline modes, and custom telemetry endpoints can be configured via RoomOptions.
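As a rough sketch of pointing telemetry at your own collector (the keyword name below is an illustrative assumption, not a confirmed parameter; check the RoomOptions reference for the exact field):

```python
from videosdk.agents import RoomOptions

# Sketch only: "otel_exporter_endpoint" is an assumed name for the
# custom-endpoint setting; confirm against the RoomOptions docs.
room_options = RoomOptions(
    otel_exporter_endpoint="https://otel-collector.example.com:4317",  # assumption
)
```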

Anam AI Avatar Plugin

Bring your agents to life with the Anam AI avatar plugin. Attach it to your pipeline with avatar=AnamAI(...) and the framework handles WebRTC data channel audio routing, interrupts, and teardown automatically. Read the Anam AI plugin docs or read this blog for full setup.
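Putting that together with the cascade providers from earlier, a minimal sketch (the plugin import path is an assumption, and AnamAI constructor arguments are omitted here):

```python
from videosdk.agents import Pipeline
from videosdk.plugins.anam import AnamAI  # import path assumed; see plugin docs

pipeline = Pipeline(
    stt=DeepgramSTT(),
    llm=GoogleLLM(),
    tts=CartesiaTTS(),
    avatar=AnamAI(...),  # audio routing, interrupts, teardown handled for you
)
```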


LangChain and LangGraph Support

Drop in any LangChain BaseChatModel or LangGraph StateGraph as your pipeline LLM. See the full LangChain plugin docs.

python
from videosdk.plugins.langchain import LangChainLLM, LangGraphLLM
# Use any LangChain BaseChatModel
pipeline = Pipeline(llm=LangChainLLM(model=ChatOpenAI(...)))
# Use a LangGraph StateGraph
pipeline = Pipeline(llm=LangGraphLLM(graph=my_graph))

Structured Recording

python
from videosdk.agents import RoomOptions, RecordingOptions
room_options = RoomOptions(
    recording=True,  # audio-only by default
    recording_options=RecordingOptions(
        video=True,         # opt-in to camera recording
        screen_share=True,  # opt-in to screen share recording
    )
)

Migrating from v0.x

Replace CascadingPipeline and RealtimePipeline with Pipeline. Replace any ConversationalFlow subclass with @pipeline.on(...) hooks. Constructor arguments stay the same. Everything else (AgentSession, WorkerJob, VAD, fallback providers, MCP tools) works as before.
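In practice the diff is tiny. A sketch using the cascade providers from earlier:

```python
# Before (v0.x)
pipeline = CascadingPipeline(stt=DeepgramSTT(), llm=GoogleLLM(), tts=CartesiaTTS())

# After (v1.0.0): same constructor arguments, one class
pipeline = Pipeline(stt=DeepgramSTT(), llm=GoogleLLM(), tts=CartesiaTTS())
```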

Full Release Notes on GitHub

Agent Participant Support Across All RTC SDKs

AI agents are now first-class citizens across every VideoSDK platform. All five RTC SDKs shipped native Agent Participant support this month. Your app can now detect, track, and react to AI agents in a room, including live state changes and real-time transcription.

What Every SDK Now Supports

  • AgentParticipant / isAgent: When an agent joins a room, it is automatically identified as a distinct participant type. No manual detection needed.
  • AgentState enum: Track lifecycle with IDLE, LISTENING, THINKING, and SPEAKING. Know exactly what your agent is doing at any moment.
  • State change events: React and React Native expose onAgentStateChange; JS fires agent-state-change; iOS uses onAgentStateChanged; Flutter uses Events.agentStateChanged.
  • Live transcription: Receive real-time agent transcription with the speaking participant and a timestamped text segment, available across all five SDKs.

Google Gemini 3.1 Support

VideoSDK now supports Google Gemini 3.1 in the realtime pipeline. Drop in GeminiRealtime with the latest model and get ultra-low-latency voice responses powered by Gemini's newest architecture.

python
pipeline = Pipeline(
    llm=GeminiRealtime(
        model="gemini-3.1-flash-live-preview",
        config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"]),
    )
)

AI Voice Agent Starter Apps

Get up and running in minutes with our ready-to-use starter apps. Each one is pre-wired with Agent Participant support, live transcription, and AgentState UI out of the box.

New Content and Resources

New guides cover the unified Pipeline, hooks deep dive, agent lifecycle and events, and SDK-specific agent integration.

New Guides and Tutorials

Featured Videos

SDK Sketches

This month's sketch: CascadingPipeline and RealtimePipeline walk into v1.0.0... and they're the same picture.

What's Next?

As we move through 2026, our focus remains on pushing the boundaries of what's possible with real-time communication and AI. Expect even more powerful tools, deeper platform integrations, and a relentless focus on the developer experience.

Ready to Build?

Upgrade to Agents SDK v1.0.0 and the latest RTC SDKs to get everything in this release.

GitHub · Join our Discord