Voice interfaces have become an essential part of modern mobile applications, offering intuitive ways for users to interact with your app while enhancing accessibility. Implementing realtime speech to text capabilities in your React Native application can transform user experience by enabling seamless voice-to-text conversion during live conversations.
In this comprehensive guide, we'll build a complete React Native speech to text example using VideoSDK's powerful transcription features. Whether you're developing a video conferencing app, a voice assistant, or adding accessibility features to your existing application, this tutorial will show you how to implement real-time transcription in just a few steps.
Understanding VideoSDK's Realtime Transcription Flow
Before diving into the code, let's understand how VideoSDK handles real-time transcription:
- Initiation: Your React Native app initiates transcription using the startTranscription method
- Resource Acquisition: VideoSDK's server requests the necessary resources from the transcription service
- Status Updates: The server sends event updates (starting, started, stopping, stopped)
- Transcription Data: As speech is detected, your app receives transcription text events with the converted text
- Termination: When you call the stopTranscription method, resources are released
This architecture allows for efficient, low-latency transcription directly integrated with your video calls.
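In code, that flow maps onto a small surface area. The following is a minimal sketch of the lifecycle using the useTranscription hook covered later in this guide; it assumes the component is rendered inside a VideoSDK MeetingProvider (also shown later):

import React from 'react';
import { View, Button } from 'react-native';
import { useTranscription } from "@videosdk.live/react-native-sdk";

// Must render inside a VideoSDK <MeetingProvider>
const TranscriptionLifecycle = () => {
  const { startTranscription, stopTranscription } = useTranscription({
    // Status updates: starting, started, stopping, stopped
    onTranscriptionStateChanged: ({ status, id }) => {
      console.log("Transcription status:", status, id);
    },
    // Converted text arrives here as speech is detected
    onTranscriptionText: ({ participantName, text, timestamp }) => {
      console.log(`${participantName} (${timestamp}): ${text}`);
    },
  });

  return (
    <View>
      <Button title="Start" onPress={() => startTranscription({ webhookUrl: null })} />
      <Button title="Stop" onPress={() => stopTranscription()} />
    </View>
  );
};

export default TranscriptionLifecycle;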
Setting Up Your React Native Project
Let's start by setting up a React Native project with VideoSDK.
Creating a New Project
# Create a new React Native project
npx react-native init VideoSDKTranscriptionDemo

# Navigate to the project directory
cd VideoSDKTranscriptionDemo
Installing VideoSDK
Add the VideoSDK React Native package to your project:
# Install VideoSDK React Native
npm install @videosdk.live/react-native-sdk

# Install additional required dependencies
npm install @videosdk.live/react-native-incallmanager react-native-permissions
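If you're targeting iOS, you'll typically also need to install the native CocoaPods dependencies after adding these packages (assuming React Native 0.60+ with autolinking):

# Install iOS native dependencies
cd ios && pod install && cd ..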
Setting Up Permissions
For speech recognition to work, we need microphone permissions:
For iOS (Info.plist)
Add the following to your ios/YourApp/Info.plist file:
<key>NSMicrophoneUsageDescription</key>
<string>This app needs access to your microphone for voice transcription</string>
<key>NSCameraUsageDescription</key>
<string>This app needs access to your camera for video calls</string>
For Android (AndroidManifest.xml)
Add these permissions to your android/app/src/main/AndroidManifest.xml file:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.INTERNET" />
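Declaring these permissions isn't enough on its own: Android 6+ and iOS both require you to request microphone access at runtime. Below is one way to do that with the react-native-permissions package installed earlier; the helper name is ours, so treat it as a sketch:

import { Platform } from 'react-native';
import { request, PERMISSIONS, RESULTS } from 'react-native-permissions';

// Ask for microphone access before joining a meeting or starting transcription
export const requestMicPermission = async () => {
  const permission = Platform.OS === 'ios'
    ? PERMISSIONS.IOS.MICROPHONE
    : PERMISSIONS.ANDROID.RECORD_AUDIO;

  const result = await request(permission);
  return result === RESULTS.GRANTED;
};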
Building the Transcription Component
Now, let's create a component that uses VideoSDK's real-time transcription capabilities. We'll use the useTranscription hook to access the transcription methods and event handlers. Create a new file called TranscriptionComponent.js in your project:
// TranscriptionComponent.js
import React, { useState } from 'react';
import {
  View,
  Text,
  TouchableOpacity,
  StyleSheet,
  FlatList,
  ActivityIndicator,
} from 'react-native';
import { Constants, useTranscription } from "@videosdk.live/react-native-sdk";

const TranscriptionComponent = () => {
  // State variables to manage transcription
  const [isTranscribing, setIsTranscribing] = useState(false);
  const [isStarting, setIsStarting] = useState(false);
  const [transcriptionText, setTranscriptionText] = useState([]);
  const [error, setError] = useState(null);

  // Configure event handlers for transcription events
  const onTranscriptionStateChanged = (data) => {
    const { status, id } = data;

    if (status === Constants.transcriptionEvents.TRANSCRIPTION_STARTING) {
      console.log("Realtime Transcription is starting", id);
      setIsStarting(true);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STARTED) {
      console.log("Realtime Transcription is started", id);
      setIsTranscribing(true);
      setIsStarting(false);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STOPPING) {
      console.log("Realtime Transcription is stopping", id);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STOPPED) {
      console.log("Realtime Transcription is stopped", id);
      setIsTranscribing(false);
      setIsStarting(false);
    }
  };

  // Handle incoming transcription text
  const onTranscriptionText = (data) => {
    const { participantId, participantName, text, timestamp, type } = data;
    console.log(`${participantName}: ${text} ${timestamp}`);

    // Add new transcription to the list
    setTranscriptionText(prevTexts => [
      {
        id: timestamp,
        name: participantName,
        text: text,
        timestamp: timestamp
      },
      ...prevTexts
    ]);
  };

  // Get transcription methods from the hook
  const { startTranscription, stopTranscription } = useTranscription({
    onTranscriptionStateChanged,
    onTranscriptionText,
  });

  // Function to handle starting transcription
  const handleStartTranscription = () => {
    try {
      // Configuration for realtime transcription
      const config = {
        // Optional webhook URL for receiving transcription data externally
        webhookUrl: null,

        // Configure summary generation
        summary: {
          enabled: true,
          prompt: "Write summary in sections like Title, Agenda, Speakers, Action Items, Outlines, Notes and Summary"
        }
      };

      // Start transcription with the configuration
      startTranscription(config);
    } catch (err) {
      setError("Failed to start transcription: " + err.message);
    }
  };

  // Function to handle stopping transcription
  const handleStopTranscription = () => {
    try {
      stopTranscription();
    } catch (err) {
      setError("Failed to stop transcription: " + err.message);
    }
  };

  // Format timestamp for display
  const formatTimestamp = (timestamp) => {
    const date = new Date(timestamp);
    return `${date.getHours()}:${date.getMinutes()}:${date.getSeconds()}`;
  };

  // Render the component
  return (
    <View style={styles.container}>
      <Text style={styles.title}>VideoSDK Realtime Transcription</Text>

      {/* Transcription control button */}
      <TouchableOpacity
        style={[
          styles.button,
          isTranscribing ? styles.stopButton : styles.startButton,
          isStarting && styles.disabledButton
        ]}
        onPress={isTranscribing ? handleStopTranscription : handleStartTranscription}
        disabled={isStarting}
      >
        {isStarting ? (
          <View style={styles.buttonContent}>
            <ActivityIndicator color="#fff" size="small" />
            <Text style={styles.buttonText}>Starting Transcription...</Text>
          </View>
        ) : (
          <Text style={styles.buttonText}>
            {isTranscribing ? "Stop Transcription" : "Start Transcription"}
          </Text>
        )}
      </TouchableOpacity>

      {/* Error message display */}
      {error && (
        <Text style={styles.errorText}>{error}</Text>
      )}

      {/* Transcription display */}
      <View style={styles.transcriptionContainer}>
        <Text style={styles.sectionTitle}>
          {isTranscribing ? "Live Transcription" : "Transcription History"}
        </Text>

        {transcriptionText.length === 0 ? (
          <Text style={styles.emptyText}>
            {isTranscribing
              ? "Waiting for speech..."
              : "No transcription history yet. Start transcription to begin."}
          </Text>
        ) : (
          <FlatList
            data={transcriptionText}
            keyExtractor={(item) => item.id.toString()}
            renderItem={({ item }) => (
              <View style={styles.transcriptionItem}>
                <View style={styles.transcriptionHeader}>
                  <Text style={styles.speakerName}>{item.name}</Text>
                  <Text style={styles.timestamp}>
                    {formatTimestamp(item.timestamp)}
                  </Text>
                </View>
                <Text style={styles.transcriptionText}>{item.text}</Text>
              </View>
            )}
            style={styles.transcriptionList}
          />
        )}
      </View>
    </View>
  );
};

// Component styles
const styles = StyleSheet.create({
  container: {
    flex: 1,
    padding: 20,
    backgroundColor: '#f5f7fa',
  },
  title: {
    fontSize: 24,
    fontWeight: 'bold',
    marginBottom: 20,
    color: '#333',
    textAlign: 'center',
  },
  button: {
    padding: 15,
    borderRadius: 8,
    alignItems: 'center',
    marginBottom: 20,
  },
  buttonContent: {
    flexDirection: 'row',
    alignItems: 'center',
    justifyContent: 'center',
  },
  startButton: {
    backgroundColor: '#4CAF50',
  },
  stopButton: {
    backgroundColor: '#F44336',
  },
  disabledButton: {
    backgroundColor: '#9E9E9E',
  },
  buttonText: {
    color: 'white',
    fontWeight: 'bold',
    fontSize: 16,
    marginLeft: 8,
  },
  errorText: {
    color: '#F44336',
    marginBottom: 10,
  },
  transcriptionContainer: {
    flex: 1,
    backgroundColor: 'white',
    borderRadius: 8,
    padding: 15,
    shadowColor: '#000',
    shadowOffset: { width: 0, height: 2 },
    shadowOpacity: 0.1,
    shadowRadius: 4,
    elevation: 2,
  },
  sectionTitle: {
    fontSize: 18,
    fontWeight: 'bold',
    marginBottom: 10,
    color: '#333',
  },
  emptyText: {
    textAlign: 'center',
    color: '#666',
    marginTop: 20,
  },
  transcriptionList: {
    flex: 1,
  },
  transcriptionItem: {
    borderBottomWidth: 1,
    borderBottomColor: '#f0f0f0',
    paddingVertical: 10,
  },
  transcriptionHeader: {
    flexDirection: 'row',
    justifyContent: 'space-between',
    marginBottom: 5,
  },
  speakerName: {
    fontWeight: 'bold',
    color: '#2196F3',
  },
  timestamp: {
    color: '#9E9E9E',
    fontSize: 12,
  },
  transcriptionText: {
    fontSize: 16,
    color: '#333',
  },
});

export default TranscriptionComponent;
Implementing Real-Time Transcription Using VideoSDK's Hooks
Let's create a simpler example that focuses specifically on VideoSDK's transcription capabilities, following VideoSDK's official documentation:
import React, { useState } from 'react';
import {
  View,
  Text,
  TouchableOpacity,
  StyleSheet,
  ScrollView
} from 'react-native';
import { Constants, useTranscription } from "@videosdk.live/react-native-sdk";

const TranscriptionDemo = () => {
  const [transcriptions, setTranscriptions] = useState([]);
  const [isTranscribing, setIsTranscribing] = useState(false);

  // Configure transcription event handlers
  function onTranscriptionStateChanged(data) {
    const { status, id } = data;

    if (status === Constants.transcriptionEvents.TRANSCRIPTION_STARTING) {
      console.log("Realtime Transcription is starting", id);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STARTED) {
      console.log("Realtime Transcription is started", id);
      setIsTranscribing(true);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STOPPING) {
      console.log("Realtime Transcription is stopping", id);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STOPPED) {
      console.log("Realtime Transcription is stopped", id);
      setIsTranscribing(false);
    }
  }

  // Handle incoming transcription text
  function onTranscriptionText(data) {
    let { participantId, participantName, text, timestamp, type } = data;
    console.log(`${participantName}: ${text} ${timestamp}`);

    // Add to transcriptions list
    setTranscriptions(prev => [...prev, {
      id: Date.now().toString(),
      name: participantName,
      text,
      timestamp
    }]);
  }

  // Get transcription methods from the hook
  const { startTranscription, stopTranscription } = useTranscription({
    onTranscriptionStateChanged,
    onTranscriptionText,
  });

  // Start transcription with configuration
  const handleStartTranscription = () => {
    // Configuration for realtime transcription
    const config = {
      webhookUrl: null, // Optional webhook URL
      summary: {
        enabled: true,
        prompt: "Write summary in sections like Title, Agenda, Speakers, Action Items, Outlines, Notes and Summary"
      }
    };

    startTranscription(config);
  };

  // Stop ongoing transcription
  const handleStopTranscription = () => {
    stopTranscription();
  };

  return (
    <View style={styles.container}>
      <Text style={styles.title}>VideoSDK Realtime Transcription</Text>

      {/* Control buttons */}
      <View style={styles.controls}>
        <TouchableOpacity
          style={[styles.button, !isTranscribing ? styles.active : styles.inactive]}
          onPress={handleStartTranscription}
          disabled={isTranscribing}
        >
          <Text style={styles.buttonText}>Start Transcription</Text>
        </TouchableOpacity>

        <TouchableOpacity
          style={[styles.button, isTranscribing ? styles.active : styles.inactive]}
          onPress={handleStopTranscription}
          disabled={!isTranscribing}
        >
          <Text style={styles.buttonText}>Stop Transcription</Text>
        </TouchableOpacity>
      </View>

      {/* Transcription display */}
      <View style={styles.transcriptionContainer}>
        <Text style={styles.sectionHeader}>Transcription:</Text>

        {transcriptions.length === 0 ? (
          <Text style={styles.emptyMessage}>No transcription data yet. Start transcription and speak to see results.</Text>
        ) : (
          <ScrollView style={styles.transcriptionScroll}>
            {transcriptions.map(item => (
              <View key={item.id} style={styles.transcriptionItem}>
                <Text style={styles.speakerName}>{item.name}:</Text>
                <Text style={styles.transcriptionText}>{item.text}</Text>
              </View>
            ))}
          </ScrollView>
        )}
      </View>
    </View>
  );
};

const styles = StyleSheet.create({
  container: {
    flex: 1,
    padding: 20,
    backgroundColor: '#f5f5f5',
  },
  title: {
    fontSize: 22,
    fontWeight: 'bold',
    textAlign: 'center',
    marginBottom: 20,
  },
  controls: {
    flexDirection: 'row',
    justifyContent: 'space-around',
    marginBottom: 20,
  },
  button: {
    paddingVertical: 12,
    paddingHorizontal: 20,
    borderRadius: 8,
    width: '45%',
    alignItems: 'center',
  },
  active: {
    backgroundColor: '#2196F3',
  },
  inactive: {
    backgroundColor: '#B0BEC5',
  },
  buttonText: {
    color: 'white',
    fontWeight: 'bold',
  },
  transcriptionContainer: {
    flex: 1,
    backgroundColor: 'white',
    borderRadius: 8,
    padding: 16,
    shadowColor: '#000',
    shadowOffset: { width: 0, height: 2 },
    shadowOpacity: 0.1,
    shadowRadius: 4,
    elevation: 3,
  },
  sectionHeader: {
    fontSize: 18,
    fontWeight: 'bold',
    marginBottom: 12,
  },
  emptyMessage: {
    textAlign: 'center',
    color: '#757575',
    fontStyle: 'italic',
    marginTop: 30,
  },
  transcriptionScroll: {
    flex: 1,
  },
  transcriptionItem: {
    marginBottom: 10,
    borderBottomWidth: 1,
    borderBottomColor: '#f0f0f0',
    paddingBottom: 10,
  },
  speakerName: {
    fontWeight: 'bold',
    color: '#2196F3',
  },
  transcriptionText: {
    fontSize: 16,
    color: '#333',
    marginTop: 4,
  },
});

export default TranscriptionDemo;
Using the Transcription Component in a Meeting Context
To use the transcription component within a meeting context, you need to wrap it with the MeetingProvider component from VideoSDK:
import React from 'react';
import { StyleSheet, SafeAreaView } from 'react-native';
import { MeetingProvider } from '@videosdk.live/react-native-sdk';
import TranscriptionDemo from './TranscriptionDemo';

const App = () => {
  // Replace with your VideoSDK token and meeting ID
  const token = "YOUR_VIDEOSDK_TOKEN";
  const meetingId = "YOUR_MEETING_ID";

  return (
    <SafeAreaView style={styles.container}>
      <MeetingProvider
        config={{
          meetingId,
          micEnabled: true,
          webcamEnabled: true,
          name: "Test User",
          participantId: "participant-id",
        }}
        token={token}
      >
        <TranscriptionDemo />
      </MeetingProvider>
    </SafeAreaView>
  );
};

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#f5f5f5',
  }
});

export default App;
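Hard-coding the token is fine for a quick test, but in production the auth token should be generated on your own backend (using your VideoSDK API key and secret) and fetched by the app at runtime. A hypothetical helper, assuming your server exposes a /get-token endpoint:

// getToken.js - the endpoint URL is an assumption; point it at your own backend
export const getToken = async () => {
  const response = await fetch("https://your-backend.example.com/get-token");
  const { token } = await response.json();
  return token;
};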
Advanced Transcription Features
VideoSDK offers several advanced features to enhance your realtime speech to text implementation.
1. Configuring Summary Generation
You can configure the transcription to generate a summary after the meeting:
const config = {
  webhookUrl: null,
  summary: {
    enabled: true,
    prompt: "Write summary in sections like Title, Agenda, Speakers, Action Items, Outlines, Notes and Summary"
  }
};

startTranscription(config);
This will generate a structured summary of the transcribed content, which can be especially useful for meeting documentation.
2. Handling Transcription Events
VideoSDK provides a comprehensive set of events to track the transcription process:
// Constants for transcription events
const {
  TRANSCRIPTION_STARTING,
  TRANSCRIPTION_STARTED,
  TRANSCRIPTION_STOPPING,
  TRANSCRIPTION_STOPPED
} = Constants.transcriptionEvents;

function onTranscriptionStateChanged(data) {
  const { status, id } = data;

  switch (status) {
    case TRANSCRIPTION_STARTING:
      console.log("Transcription is starting");
      // Show loading indicator
      break;
    case TRANSCRIPTION_STARTED:
      console.log("Transcription has started");
      // Show active transcription UI
      break;
    case TRANSCRIPTION_STOPPING:
      console.log("Transcription is stopping");
      // Show stopping indicator
      break;
    case TRANSCRIPTION_STOPPED:
      console.log("Transcription has stopped");
      // Update UI to show transcription is inactive
      break;
    default:
      console.log("Unknown transcription status");
  }
}
3. Sending Transcription to External Services
You can configure a webhook URL to receive the transcription data in real-time:
const config = {
  webhookUrl: "https://your-webhook-url.com/transcription",
  summary: { enabled: true }
};

startTranscription(config);
This allows you to process the transcription data with external services or store it in your database.
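On the receiving side, the webhook is just an HTTPS endpoint that accepts POST requests. Below is a minimal sketch of a receiver using Node.js with Express; the exact payload fields are defined by VideoSDK's webhook documentation, so treat the handler body as illustrative:

// server.js - minimal webhook receiver (Node.js + Express assumed)
const express = require('express');
const app = express();

app.use(express.json());

// VideoSDK will POST transcription events to this route
app.post('/transcription', (req, res) => {
  console.log('Received webhook payload:', req.body);
  // TODO: persist req.body to your database or forward it to another service
  res.sendStatus(200);
});

app.listen(3000, () => console.log('Webhook receiver listening on port 3000'));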
Performance Considerations
When implementing real-time speech to text in React Native, consider these performance optimizations:
1. Memory Management
For long meetings, implement a limit on the transcription history to prevent memory issues:
// Limit transcription history to last 100 items
useEffect(() => {
  if (transcriptions.length > 100) {
    setTranscriptions(transcriptions.slice(-100));
  }
}, [transcriptions]);
2. Efficient Rendering
Use virtualized lists for rendering long transcriptions:
import { FlatList } from 'react-native';

<FlatList
  data={transcriptions}
  keyExtractor={item => item.id}
  renderItem={({ item }) => (
    <View style={styles.item}>
      <Text>{item.text}</Text>
    </View>
  )}
  initialNumToRender={10}
  maxToRenderPerBatch={10}
  windowSize={5}
/>
3. Background Mode Handling
Handle transcription state when the app goes to the background:
import { AppState } from 'react-native';

useEffect(() => {
  const subscription = AppState.addEventListener("change", nextAppState => {
    if (nextAppState === "background" && isTranscribing) {
      // Optionally pause or stop transcription when app is in background
      // Or notify user that transcription continues in background
    }
  });

  return () => {
    subscription.remove();
  };
}, [isTranscribing]);
Best Practices for React Native Transcription
To ensure the best experience with your React Native speech to text example:
1. Provide Visual Feedback
Always provide clear visual feedback about the transcription state:
{isTranscribing && (
  <View style={styles.activeIndicator}>
    <Text style={styles.activeText}>Transcription Active</Text>
    <View style={styles.pulseDot} />
  </View>
)}
2. Handle Errors Gracefully
Implement proper error handling:
import { Alert } from 'react-native';

try {
  startTranscription(config);
} catch (error) {
  console.error("Transcription error:", error);
  // Show user-friendly error message
  Alert.alert(
    "Transcription Error",
    "Unable to start transcription. Please try again."
  );
}
3. Test in Various Conditions
Test your transcription implementation in different conditions:
- With varying levels of background noise
- With multiple speakers
- With different accents
- With various network conditions
Conclusion
Implementing realtime speech to text in your React Native application using VideoSDK provides a powerful way to enhance accessibility and user experience. VideoSDK's transcription API makes it straightforward to add high-quality speech recognition capabilities to your app without the complexity of building your own solution.
By following this React Native speech to text example, you've learned how to set up the development environment, implement the transcription component, handle various transcription events, and optimize performance. These skills will allow you to create more accessible and feature-rich applications that can respond to voice input in real-time.
Whether you're building a video conferencing app, a voice assistant, or adding accessibility features to an existing application, VideoSDK's transcription capabilities provide a robust foundation for your speech recognition needs.
Additional Resources
For more information on implementing real-time transcription in React Native, refer to VideoSDK's official documentation and API reference.