A complete recap of everything we shipped in March 2026: Agents SDK v1.0.0, unified pipeline, and Agent Participant support across all RTC SDKs.

Welcome to the March edition of the VideoSDK Monthly Updates! This month marks one of the biggest milestones for our AI stack — the official launch of Agents v1.0.0.

We have reimagined how developers build AI agents with a unified pipeline architecture, introduced hooks for deep customization, expanded our plugin ecosystem, and rolled out agent support across every SDK. This is a huge release, so let's get started!

Agents v1.0.0: A New Foundation for AI Voice Agents

This release is more than just an upgrade. It is a complete architectural shift. With Agents v1.0.0, we have unified multiple execution models into a single, flexible system that adapts to your use case automatically.


Read full release notes on GitHub

Unified Pipeline Architecture

CascadingPipeline and RealtimePipeline have been replaced with a single Pipeline class. Configure one pipeline and the SDK automatically determines the execution mode based on the components you provide.

Cascade Mode: STT to LLM to TTS

Compose any provider chain for complete control over each stage.

python
pipeline = Pipeline(
    stt=DeepgramSTT(),
    llm=GoogleLLM(),
    tts=CartesiaTTS(),
    vad=SileroVAD(),
    turn_detector=TurnDetector(),
)
session = AgentSession(agent=MyAgent(), pipeline=pipeline)
await session.start(wait_for_participant=True, run_until_shutdown=True)

Realtime Mode: Lowest Latency with Unified Models

Use a single realtime model for the full voice pipeline.

python
pipeline = Pipeline(
    llm=GeminiRealtime(
        model="gemini-3.1-flash-live-preview",
        config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"]),
    )
)
session = AgentSession(agent=MyAgent(), pipeline=pipeline)
await session.start(wait_for_participant=True, run_until_shutdown=True)

Other supported realtime models: OpenAIRealtime, AWSNovaSonic, AzureVoiceLive, xAI Grok, Ultravox.

Hybrid Mode: Mix Cascade and Realtime Components

Read the full Hybrid Mode docs.

python
# Custom STT + Realtime LLM (bring your own transcription)
pipeline = Pipeline(stt=DeepgramSTT(), llm=OpenAIRealtime(...))
# Realtime LLM + Custom TTS (bring your own voice)
pipeline = Pipeline(llm=OpenAIRealtime(...), tts=ElevenLabsTTS(...))

Flexible Agent Composition

Just pass the components you need. The pipeline handles the rest.

python
Pipeline(stt=...)                                         # Transcription only
Pipeline(llm=...)                                         # Text chatbot
Pipeline(stt=..., llm=..., tts=...)                       # Voice + Chat
Pipeline(stt=..., llm=..., tts=..., vad=..., turn_detector=...)  # Full voice agent
Pipeline(llm=OpenAIRealtime(...))                         # Realtime voice agent

Pipeline Hooks System

ConversationalFlow has been removed in favour of @pipeline.on(...) — a lightweight hooks engine that lets you intercept and transform data at any stage without subclassing. See the full Pipeline Hooks docs.

  • stt: Incoming audio stream and transcript text. Clean, redact, or replace before the LLM.
  • tts: Outgoing text and synthesized audio stream. Adjust pronunciation, filter, or re-route.
  • llm: Message list. Bypass the model entirely with a yield, or modify before inference.
  • vision_frame: Raw video frames from participants.
  • user_turn_start / user_turn_end: User speaking lifecycle.
  • agent_turn_start / agent_turn_end: Agent response lifecycle.
python
@pipeline.on("stt")
async def on_transcript(text: str) -> str:
    return text.strip()          # normalize before LLM

@pipeline.on("tts")
async def on_tts(text: str) -> str:
    return text.replace("SDK", "S D K")   # fix pronunciation

@pipeline.on("llm")
async def on_llm(messages):
    yield "Transferring you now."  # bypass LLM entirely
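Hook bodies are plain Python, so the transform logic can be written and tested independently of the SDK. For example, here is a redaction transform you might return from an stt hook like the one above (the helper name and regex are our own illustration, not part of the SDK):

```python
import re

def redact_digits(text: str) -> str:
    """Mask runs of four or more digits (card or phone fragments) and trim
    whitespace before the transcript reaches the LLM."""
    return re.sub(r"\d{4,}", "[REDACTED]", text).strip()

# e.g. redact_digits("my card is 4242424242424242 ") -> "my card is [REDACTED]"
```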

Observability

Per-component metrics, structured logging, and OpenTelemetry tracing are built in across all pipeline modes, and custom telemetry endpoints can be configured via RoomOptions.
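As a rough sketch of pointing telemetry at your own collector (the keyword name below is an illustrative assumption, not a confirmed parameter; check the RoomOptions reference for the exact field):

```python
from videosdk.agents import RoomOptions

# Sketch only: "otel_exporter_endpoint" is an assumed name for the
# custom-endpoint setting; confirm against the RoomOptions docs.
room_options = RoomOptions(
    otel_exporter_endpoint="https://otel-collector.example.com:4317",  # assumption
)
```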

Anam AI Avatar Plugin

Bring your agents to life with the Anam AI avatar plugin. Attach it to your pipeline with avatar=AnamAI(...) and the framework handles WebRTC data channel audio routing, interrupts, and teardown automatically. Read the Anam AI plugin docs or read this blog for full setup.
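Putting that together with the cascade providers from earlier, a minimal sketch (the plugin import path is an assumption, and AnamAI constructor arguments are omitted here):

```python
from videosdk.agents import Pipeline
from videosdk.plugins.anam import AnamAI  # import path assumed; see plugin docs

pipeline = Pipeline(
    stt=DeepgramSTT(),
    llm=GoogleLLM(),
    tts=CartesiaTTS(),
    avatar=AnamAI(...),  # audio routing, interrupts, teardown handled for you
)
```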


LangChain and LangGraph Support

Drop in any LangChain BaseChatModel or LangGraph StateGraph as your pipeline LLM. See the full LangChain plugin docs.

python
from videosdk.plugins.langchain import LangChainLLM, LangGraphLLM
# Use any LangChain BaseChatModel
pipeline = Pipeline(llm=LangChainLLM(model=ChatOpenAI(...)))
# Use a LangGraph StateGraph
pipeline = Pipeline(llm=LangGraphLLM(graph=my_graph))

Structured Recording

python
from videosdk.agents import RoomOptions, RecordingOptions
room_options = RoomOptions(
    recording=True,  # audio-only by default
    recording_options=RecordingOptions(
        video=True,         # opt-in to camera recording
        screen_share=True,  # opt-in to screen share recording
    )
)

Migrating from v0.x

Replace CascadingPipeline and RealtimePipeline with Pipeline. Replace any ConversationalFlow subclass with @pipeline.on(...) hooks. Constructor arguments stay the same. Everything else (AgentSession, WorkerJob, VAD, fallback providers, MCP tools) works as before.
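In practice the diff is tiny. A sketch using the cascade providers from earlier:

```python
# Before (v0.x)
pipeline = CascadingPipeline(stt=DeepgramSTT(), llm=GoogleLLM(), tts=CartesiaTTS())

# After (v1.0.0): same constructor arguments, one class
pipeline = Pipeline(stt=DeepgramSTT(), llm=GoogleLLM(), tts=CartesiaTTS())
```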

Full Release Notes on GitHub

Agent Participant Support Across All RTC SDKs

AI agents are now first-class citizens across every VideoSDK platform. All five RTC SDKs shipped native Agent Participant support this month. Your app can now detect, track, and react to AI agents in a room, including live state changes and real-time transcription.

What Every SDK Now Supports

  • AgentParticipant / isAgent: When an agent joins a room, it is automatically identified as a distinct participant type. No manual detection needed.
  • AgentState enum: Track lifecycle with IDLE, LISTENING, THINKING, and SPEAKING. Know exactly what your agent is doing at any moment.
  • State change events: React and React Native expose onAgentStateChange; JS fires agent-state-change; iOS uses onAgentStateChanged; Flutter uses Events.agentStateChanged.
  • Live transcription: Receive real-time agent transcription with the speaking participant and a timestamped text segment, available across all five SDKs.

Google Gemini 3.1 Support

VideoSDK now supports Google Gemini 3.1 in the realtime pipeline. Drop in GeminiRealtime with the latest model and get ultra-low-latency voice responses powered by Gemini's newest architecture.

python
pipeline = Pipeline(
    llm=GeminiRealtime(
        model="gemini-3.1-flash-live-preview",
        config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"]),
    )
)

AI Voice Agent Starter Apps

Get up and running in minutes with our ready-to-use starter apps. Each one is pre-wired with Agent Participant support, live transcription, and AgentState UI out of the box.

New Content and Resources

New guides cover the unified Pipeline, hooks deep dive, agent lifecycle and events, and SDK-specific agent integration.

New Guides and Tutorials

Featured Videos

SDK Sketches

This month's sketch: CascadingPipeline and RealtimePipeline walk into v1.0.0... and they're the same picture.

What's Next?

As we move through 2026, our focus remains on pushing the boundaries of what's possible with real-time communication and AI. Expect even more powerful tools, deeper platform integrations, and a relentless focus on the developer experience.

Ready to Build?

Upgrade to Agents SDK v1.0.0 and the latest RTC SDKs to get everything in this release.

GitHub · Join our Discord