At VideoSDK, our mission is to build the infrastructure for digital humans that runs on every device. We're expanding into real-time speech-to-speech and vision AI agents that run privately on-device or in a hybrid cloud setting, delivering up to 98% of GPT-4's accuracy at just 17.5% of its cost.
Unified SDK for On-Device SSLM + Cloud SSLM Collaboration
Today, we are launching NAMO-SSLM (Small Speech Language Model), a hybrid model with a state-of-the-art hybrid inference engine.
We’ve developed an inference engine that couples a streamlined on-device SSLM with a more powerful cloud-based SSLM, working in tandem for real-time speech use cases. The engine ensures that the heavy lifting (long-context processing and advanced reasoning) occurs only when needed, reducing cloud inference costs without sacrificing performance.
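To make the routing idea concrete, here is a minimal Python sketch of local-first inference with escalation to the cloud. All names here (`LocalSSLM`, `CloudSSLM`, `HybridRouter`, the `Turn` fields) are illustrative assumptions, not the actual VideoSDK API:

```python
# Hypothetical sketch: route each turn to the on-device model by default,
# escalating to the cloud only for long-context or reasoning-heavy turns.
from dataclasses import dataclass

@dataclass
class Turn:
    text: str
    needs_long_context: bool  # e.g. RAG over large documents
    needs_reasoning: bool     # multi-step planning / tool orchestration

class LocalSSLM:
    def respond(self, turn: Turn) -> str:
        # Fast on-device inference for typical conversational turns
        return f"[local] {turn.text}"

class CloudSSLM:
    def respond(self, turn: Turn) -> str:
        # Heavier cloud model, invoked only when needed
        return f"[cloud] {turn.text}"

class HybridRouter:
    def __init__(self):
        self.local, self.cloud = LocalSSLM(), CloudSSLM()

    def respond(self, turn: Turn) -> str:
        # Escalate only turns that need long context or advanced reasoning;
        # everything else stays on-device, cutting cloud inference cost.
        if turn.needs_long_context or turn.needs_reasoning:
            return self.cloud.respond(turn)
        return self.local.respond(turn)

router = HybridRouter()
print(router.respond(Turn("hello", False, False)))                        # stays on-device
print(router.respond(Turn("summarize this 200-page contract", True, True)))  # escalated
```

The cost savings fall out of this split: the cheaper on-device path handles the bulk of turns, and the cloud model is billed only for the hard ones.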
We’ve tackled three key challenges in end-to-end speech models: enabling multi-turn RAG for rich contextual responses, silent function calling for real-time tool use without delays, and client-side tool support like canvas and whiteboard via our Agent SDK.
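One of those challenges, silent function calling, can be sketched in a few lines: the tool call runs off the speech path so no filler speech ("let me check that for you") is injected while it executes. The function and helper names below are hypothetical, not the published Agent SDK surface:

```python
# Hypothetical sketch of silent function calling: the tool executes in the
# background and only the final answer reaches the voice stream.
import concurrent.futures

def lookup_order_status(order_id: str) -> str:
    # Stand-in for a real backend tool
    return f"Order {order_id} ships tomorrow."

TOOLS = {"lookup_order_status": lookup_order_status}

def handle_tool_call(name: str, args: dict, speak) -> None:
    # Run the tool off the speech thread; the agent stays quiet (or keeps
    # talking) instead of announcing the tool use aloud.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        future = pool.submit(TOOLS[name], **args)
        result = future.result(timeout=2.0)  # bounded so speech stays real-time
    speak(result)  # only the result is voiced

spoken = []
handle_tool_call("lookup_order_status", {"order_id": "A42"}, spoken.append)
print(spoken[0])  # the only thing the user hears
```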
Achieving 20× Cost Reduction with 98% Cloud Performance
Our engine orchestrates a local-remote workflow, delivering real-time speech capabilities on-device with low latency while leveraging cloud infrastructure for complex tasks. The result is a 20× cost reduction at 98% of cloud performance: a cost-effective, privacy-friendly, high-performance solution.
Enterprise-Grade Privacy & Compliance
NAMO-SSLM keeps sensitive data on-device by default, ensuring documents and conversations never leave the device unless explicitly required. This hybrid design minimizes cloud dependency, reduces risk exposure, and supports GDPR, HIPAA, and SOC 2 compliance, making it more secure and privacy-first than traditional cloud-only models.
This approach not only meets the pressing need for robust compliance in highly regulated sectors like BFSI and Healthcare, but also benefits any organization demanding rapid, private, and cost-effective AI-driven customer experiences.
Open Sourcing NAMO-SSLM
We are excited to open-source NAMO-SSLM, a small yet powerful real-time multimodal model. The AI landscape is shifting from massive, resource-intensive models to lightweight, optimized small models, and for good reason. Small models like NAMO-SSLM offer a compelling mix of efficiency, speed, and cost-effectiveness, making them the smarter choice for real-world applications.
Key capabilities include:
- Runs on CPU: Real-time inference on consumer CPU devices.
- Multimodal (Voice + Vision): Native support for real-time speech, vision, and OCR capabilities.
- Low-Latency, Real-Time Processing: Real-time streaming with end-to-end latency as low as 80 ms.
- Multilingual Support: Multilingual and hybrid-language capabilities such as Hinglish.
- Multi-Turn RAG: Retrieves rich context across turns while keeping the conversation real-time.
- Voiced + Silent Function/Tool Calling: Function calls can be silent or voiced, with text as well as voice output.
- Client-Side Tool Support: Tools for building client-side UI/UX such as canvas and whiteboard.
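As a rough illustration of the last capability, a client-side tool can be registered so the model drives local UI (a whiteboard, a canvas) instead of only producing speech. The `Agent` class and method names below are a hypothetical sketch, not the published SDK API:

```python
# Hypothetical sketch: client-side tools render locally, so the tool call
# never round-trips user content to the cloud.
class Agent:
    def __init__(self):
        self.client_tools = {}

    def register_client_tool(self, name, handler):
        # Handler runs in the client app (browser, mobile), not on a server
        self.client_tools[name] = handler

    def on_tool_call(self, name, payload):
        return self.client_tools[name](payload)

def draw_on_whiteboard(payload):
    # Stand-in for real canvas/whiteboard rendering code
    return f"drew {payload['shape']} at {payload['x']},{payload['y']}"

agent = Agent()
agent.register_client_tool("whiteboard.draw", draw_on_whiteboard)
print(agent.on_tool_call("whiteboard.draw", {"shape": "circle", "x": 10, "y": 20}))
```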
NAMO-SSLM Github: https://github.com/videosdk-live/namo-sslm
The future of AI is efficient, real-time, and accessible on every device. With NAMO-SSLM, we're pioneering a hybrid on-device + cloud approach that delivers 5.7× cost efficiency while maintaining 98% of cloud-level performance.
Our hybrid architecture enables low-latency, private AI for a wide range of industries:
- Healthcare – Real-time AI assistants for clinical support
- Banking – Secure and seamless voice authentication
- Insurance – AI-powered automated claims processing
- Social – Instant multilingual voice translation
- Education – Personalized AI tutors for interactive learning
- Smart Glasses – Hands-free, AI-driven digital assistants
- Robots – Conversational AI for intelligent automation
- Gaming – Realistic, AI-powered NPC interactions
By open-sourcing NAMO-SSLM, we are empowering developers and businesses to build privacy-first, low-latency AI applications across industries—from healthcare to gaming, smart glasses to education.
The AI revolution is shifting from large, resource-heavy models to lightweight, real-time, multimodal intelligence. NAMO-SSLM is the future—blending speech and vision AI with real-time processing, multilingual capabilities, and CPU-optimized performance.
Join us in building the next generation of digital humans. 🚀