A complete recap of everything we shipped in March 2026: Agents SDK v1.0.0, unified pipeline, and Agent Participant support across all RTC SDKs.
Welcome to the March edition of the VideoSDK Monthly Updates! This month marks one of the biggest milestones for our AI stack — the official launch of Agents v1.0.0.
We have reimagined how developers build AI agents with a unified pipeline architecture, introduced hooks for deep customization, expanded our plugin ecosystem, and rolled out agent support across every SDK. This is a huge one, so let's get started!
Agents v1.0.0: A New Foundation for AI Voice Agents
This release is more than just an upgrade. It is a complete architectural shift. With Agents v1.0.0, we have unified multiple execution models into a single, flexible system that adapts to your use case automatically.
Read full release notes on GitHub
Unified Pipeline Architecture
CascadingPipeline and RealtimePipeline have been replaced with a single Pipeline class. Configure one pipeline and the SDK automatically determines the execution mode based on the components you provide.
Cascade Mode: STT to LLM to TTS
Compose any provider chain for complete control over each stage.
Realtime Mode: Lowest Latency with Unified Models
Use a single realtime model for the full voice pipeline.
Other supported realtime models: OpenAIRealtime, AWSNovaSonic, AzureVoiceLive, xAI Grok, Ultravox.
Hybrid Mode: Mix Cascade and Realtime Components
Read the full Hybrid Mode docs.
Flexible Agent Composition
Just pass the components you need. The pipeline handles the rest.
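To make the automatic mode selection concrete, here is a self-contained sketch of the idea: a pipeline that inspects which components you passed and derives its execution mode. This mirrors the concept only; the class shape and parameter names below are hypothetical stand-ins, not the actual SDK internals.

```python
# Illustrative sketch of unified-pipeline mode selection.
# All names here are hypothetical, not the real VideoSDK classes.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pipeline:
    stt: Optional[object] = None             # speech-to-text component
    llm: Optional[object] = None             # language model component
    tts: Optional[object] = None             # text-to-speech component
    realtime_model: Optional[object] = None  # unified speech-to-speech model

    @property
    def mode(self) -> str:
        has_cascade = any([self.stt, self.llm, self.tts])
        if self.realtime_model and has_cascade:
            return "hybrid"    # mix a realtime model with cascade stages
        if self.realtime_model:
            return "realtime"  # one model handles the full voice loop
        return "cascade"       # STT -> LLM -> TTS provider chain


print(Pipeline(stt="stt", llm="llm", tts="tts").mode)          # cascade
print(Pipeline(realtime_model="realtime").mode)                # realtime
print(Pipeline(realtime_model="realtime", tts="tts").mode)     # hybrid
```

The point of the design is that callers never name a mode explicitly; the components themselves are the configuration.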
Pipeline Hooks System
ConversationalFlow has been removed in favor of @pipeline.on(...) — a lightweight hooks engine that lets you intercept and transform data at any stage without subclassing. See the full Pipeline Hooks docs.
| Hook | What you can intercept / modify |
|---|---|
| stt | Incoming audio stream and transcript text. Clean, redact, or replace before the LLM. |
| tts | Outgoing text and synthesized audio stream. Adjust pronunciation, filter, or re-route. |
| llm | Message list. Bypass the model entirely with a yield, or modify messages before inference. |
| vision_frame | Raw video frames from participants. |
| user_turn_start / user_turn_end | User speaking lifecycle. |
| agent_turn_start / agent_turn_end | Agent response lifecycle. |
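Conceptually, a hooks engine like this is just a registry of stage-keyed callbacks, each transforming the data flowing through that stage. Here is a minimal stdlib-only sketch of the pattern (hypothetical names, not the SDK's implementation):

```python
# Minimal decorator-based hooks engine in the spirit of @pipeline.on(...).
# Hypothetical stand-in, not the actual SDK code.
from collections import defaultdict


class Hooks:
    def __init__(self):
        self._hooks = defaultdict(list)

    def on(self, stage):
        """Register a transform for a pipeline stage ('stt', 'llm', 'tts', ...)."""
        def register(fn):
            self._hooks[stage].append(fn)
            return fn
        return register

    def run(self, stage, data):
        """Pass data through every hook registered for the stage, in order."""
        for fn in self._hooks[stage]:
            data = fn(data)
        return data


pipeline = Hooks()


@pipeline.on("stt")
def redact(transcript):
    # e.g. scrub sensitive tokens from the transcript before it reaches the LLM
    return transcript.replace("secret", "[redacted]")


print(pipeline.run("stt", "my secret plan"))  # my [redacted] plan
```

Because hooks compose in registration order, several small transforms can be stacked on one stage instead of maintaining a single subclass override.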
Observability
Per-component metrics, structured logging, and OpenTelemetry tracing are built in across all pipeline modes. Custom endpoints are configurable via RoomOptions.
Anam AI Avatar Plugin
Bring your agents to life with the Anam AI avatar plugin. Attach it to your pipeline with avatar=AnamAI(...) and the framework handles WebRTC data channel audio routing, interrupts, and teardown automatically. Read the Anam AI plugin docs or read this blog for full setup.
LangChain and LangGraph Support
Drop in any LangChain BaseChatModel or LangGraph StateGraph as your pipeline LLM. See the full LangChain plugin docs.
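The drop-in idea can be illustrated as a thin adapter: anything exposing a LangChain-style invoke(messages) method that returns an object with a .content attribute can serve as the pipeline's LLM. The sketch below uses a fake model so it stays self-contained; the adapter and model names are illustrative, not the plugin's real classes.

```python
# Sketch of adapting a LangChain-style chat model into a plain callable
# a pipeline could use as its LLM stage. Names are illustrative only.
from types import SimpleNamespace


class FakeChatModel:
    """Stand-in for a LangChain BaseChatModel-like object."""

    def invoke(self, messages):
        # Real BaseChatModel.invoke returns a message object with .content
        return SimpleNamespace(content="echo: " + messages[-1]["content"])


def as_pipeline_llm(chat_model):
    """Wrap a chat model so the pipeline can call it with a message list."""

    def llm(messages):
        return chat_model.invoke(messages).content

    return llm


llm = as_pipeline_llm(FakeChatModel())
print(llm([{"role": "user", "content": "hello"}]))  # echo: hello
```

The same wrapping approach extends to a LangGraph StateGraph, where the graph's final state supplies the response text.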
Structured Recording
Migrating from v0.x
Replace CascadingPipeline and RealtimePipeline with Pipeline. Replace any ConversationalFlow subclass with @pipeline.on(...) hooks. Constructor arguments stay the same. Everything else (AgentSession, WorkerJob, VAD, fallback providers, MCP tools) works as before.
Full Release Notes on GitHub

Agent Participant Support Across All RTC SDKs
AI agents are now first-class citizens across every VideoSDK platform. All five RTC SDKs shipped native Agent Participant support this month. Your app can now detect, track, and react to AI agents in a room, including live state changes and real-time transcription.
What Every SDK Now Supports
- AgentParticipant / isAgent: When an agent joins a room, it is automatically identified as a distinct participant type. No manual detection needed.
- AgentState enum: Track lifecycle with IDLE, LISTENING, THINKING, and SPEAKING. Know exactly what your agent is doing at any moment.
- State change events: React and React Native expose onAgentStateChange; JS fires agent-state-change; iOS uses onAgentStateChanged; Flutter uses Events.agentStateChanged.
- Live transcription: Receive real-time agent transcription with the speaking participant and a timestamped text segment, available across all five SDKs.
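The lifecycle above can be modeled as a small state enum plus a change callback. The following is a stdlib-only illustration of the concept; the class and callback names are hypothetical, not any SDK's actual API surface.

```python
# Illustrative model of the AgentState lifecycle and a state-change
# event, mirroring the enum the SDKs expose. Not actual SDK code.
from enum import Enum, auto


class AgentState(Enum):
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


class AgentParticipant:
    def __init__(self, on_state_change=None):
        self.is_agent = True              # distinguishes agents from humans
        self.state = AgentState.IDLE
        self._on_state_change = on_state_change

    def set_state(self, new_state):
        old, self.state = self.state, new_state
        if self._on_state_change:
            self._on_state_change(old, new_state)


changes = []
agent = AgentParticipant(on_state_change=lambda old, new: changes.append(new))
agent.set_state(AgentState.LISTENING)
agent.set_state(AgentState.THINKING)
agent.set_state(AgentState.SPEAKING)
print([s.name for s in changes])  # ['LISTENING', 'THINKING', 'SPEAKING']
```

In a real app, the change callback is where you would drive UI, e.g. showing a "thinking" indicator whenever the agent enters THINKING.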
Google Gemini 3.1 Support
VideoSDK now supports Google Gemini 3.1 in the realtime pipeline. Drop in GeminiRealtime with the latest model and get ultra-low-latency voice responses powered by Gemini's newest architecture.
AI Voice Agent Starter Apps
Get up and running in minutes with our ready-to-use starter apps. Each one is pre-wired with Agent Participant support, live transcription, and AgentState UI out of the box.
- React Starter App: Web integration with full agent UI
- Flutter Starter App: Cross-platform mobile agent integration
- iOS Starter App: Native iOS agent integration
New Content and Resources
New guides cover the unified Pipeline, hooks deep dive, agent lifecycle and events, and SDK-specific agent integration.
New Guides and Tutorials
- Deploy your AI Voice Agent: A step-by-step guide to deploying production-ready voice agents on VideoSDK.
- Build AI Virtual Avatars with Anam AI: How to build interactive AI avatars using the Anam AI plugin.
- Pipeline hooks deep dive: How to intercept and transform data at every pipeline stage.
- LangChain and LangGraph integration: Connect agents to external tools and multi-step reasoning workflows.
What's Next?
As we move through 2026, our focus remains on pushing the boundaries of what's possible with real-time communication and AI. Expect even more powerful tools, deeper platform integrations, and a relentless focus on the developer experience.
Ready to Build?
Upgrade to Agents SDK v1.0.0 and the latest RTC SDKs to get everything in this release.
GitHub | Join our Discord