Voice interfaces have become an essential part of modern mobile applications, offering intuitive ways for users to interact with your app while enhancing accessibility. Implementing realtime speech to text capabilities in your React Native application can transform user experience by enabling seamless voice-to-text conversion during live conversations.
In this comprehensive guide, we'll build a complete React Native speech to text example using VideoSDK's powerful transcription features. Whether you're developing a video conferencing app, a voice assistant, or adding accessibility features to your existing application, this tutorial will show you how to implement real-time transcription in just a few steps.
Understanding VideoSDK's Realtime Transcription Flow
Before diving into the code, let's understand how VideoSDK handles real-time transcription:
- Initiation: Your React Native app initiates transcription using the startTranscription method
- Resource Acquisition: VideoSDK's server requests the necessary resources from the transcription service
- Status Updates: The server sends event updates (starting, started, stopping, stopped)
- Transcription Data: As speech is detected, your app receives transcription text events with the converted text
- Termination: When you call the stopTranscription method, resources are released
This architecture allows for efficient, low-latency transcription directly integrated with your video calls.
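In code, that flow maps onto a small surface area. The following is a minimal sketch of the lifecycle using the useTranscription hook covered later in this guide; it assumes the component is rendered inside a VideoSDK MeetingProvider (also shown later):

import React from 'react';
import { View, Button } from 'react-native';
import { useTranscription } from "@videosdk.live/react-native-sdk";

// Must render inside a VideoSDK <MeetingProvider>
const TranscriptionLifecycle = () => {
  const { startTranscription, stopTranscription } = useTranscription({
    // Status updates: starting, started, stopping, stopped
    onTranscriptionStateChanged: ({ status, id }) => {
      console.log("Transcription status:", status, id);
    },
    // Converted text arrives here as speech is detected
    onTranscriptionText: ({ participantName, text, timestamp }) => {
      console.log(`${participantName} (${timestamp}): ${text}`);
    },
  });

  return (
    <View>
      <Button title="Start" onPress={() => startTranscription({ webhookUrl: null })} />
      <Button title="Stop" onPress={() => stopTranscription()} />
    </View>
  );
};

export default TranscriptionLifecycle;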
Setting Up Your React Native Project
Let's start by setting up a React Native project with VideoSDK.
Creating a New Project
# Create a new React Native project
npx react-native init VideoSDKTranscriptionDemo

# Navigate to the project directory
cd VideoSDKTranscriptionDemo
Installing VideoSDK
Add the VideoSDK React Native package to your project:
# Install VideoSDK React Native
npm install @videosdk.live/react-native-sdk

# Install additional required dependencies
npm install @videosdk.live/react-native-incallmanager react-native-permissions
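If you're targeting iOS, you'll typically also need to install the native CocoaPods dependencies after adding these packages (assuming React Native 0.60+ with autolinking):

# Install iOS native dependencies
cd ios && pod install && cd ..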
Setting Up Permissions
For speech recognition to work, we need microphone permissions:
For iOS (Info.plist)
Add the following to your ios/YourApp/Info.plist file:
<key>NSMicrophoneUsageDescription</key>
<string>This app needs access to your microphone for voice transcription</string>
<key>NSCameraUsageDescription</key>
<string>This app needs access to your camera for video calls</string>
For Android (AndroidManifest.xml)
Add these permissions to your android/app/src/main/AndroidManifest.xml file:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.INTERNET" />
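Declaring these permissions isn't enough on its own: Android 6+ and iOS both require you to request microphone access at runtime. Below is one way to do that with the react-native-permissions package installed earlier; the helper name is ours, so treat it as a sketch:

import { Platform } from 'react-native';
import { request, PERMISSIONS, RESULTS } from 'react-native-permissions';

// Ask for microphone access before joining a meeting or starting transcription
export const requestMicPermission = async () => {
  const permission = Platform.OS === 'ios'
    ? PERMISSIONS.IOS.MICROPHONE
    : PERMISSIONS.ANDROID.RECORD_AUDIO;

  const result = await request(permission);
  return result === RESULTS.GRANTED;
};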
Building the Transcription Component
Now, let's create a component that uses VideoSDK's real-time transcription capabilities. We'll use the useTranscription hook to access the transcription methods and event handlers. Create a new file called TranscriptionComponent.js in your project:
// TranscriptionComponent.js
import React, { useState } from 'react';
import {
  View,
  Text,
  TouchableOpacity,
  StyleSheet,
  FlatList,
  ActivityIndicator,
} from 'react-native';
import { Constants, useTranscription } from "@videosdk.live/react-native-sdk";

const TranscriptionComponent = () => {
  // State variables to manage transcription
  const [isTranscribing, setIsTranscribing] = useState(false);
  const [isStarting, setIsStarting] = useState(false);
  const [transcriptionText, setTranscriptionText] = useState([]);
  const [error, setError] = useState(null);

  // Configure event handlers for transcription events
  const onTranscriptionStateChanged = (data) => {
    const { status, id } = data;

    if (status === Constants.transcriptionEvents.TRANSCRIPTION_STARTING) {
      console.log("Realtime Transcription is starting", id);
      setIsStarting(true);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STARTED) {
      console.log("Realtime Transcription is started", id);
      setIsTranscribing(true);
      setIsStarting(false);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STOPPING) {
      console.log("Realtime Transcription is stopping", id);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STOPPED) {
      console.log("Realtime Transcription is stopped", id);
      setIsTranscribing(false);
      setIsStarting(false);
    }
  };

  // Handle incoming transcription text
  const onTranscriptionText = (data) => {
    const { participantId, participantName, text, timestamp, type } = data;
    console.log(`${participantName}: ${text} ${timestamp}`);

    // Add new transcription to the list
    setTranscriptionText(prevTexts => [
      {
        id: timestamp,
        name: participantName,
        text: text,
        timestamp: timestamp
      },
      ...prevTexts
    ]);
  };

  // Get transcription methods from the hook
  const { startTranscription, stopTranscription } = useTranscription({
    onTranscriptionStateChanged,
    onTranscriptionText,
  });

  // Function to handle starting transcription
  const handleStartTranscription = () => {
    try {
      // Configuration for realtime transcription
      const config = {
        // Optional webhook URL for receiving transcription data externally
        webhookUrl: null,

        // Configure summary generation
        summary: {
          enabled: true,
          prompt: "Write summary in sections like Title, Agenda, Speakers, Action Items, Outlines, Notes and Summary"
        }
      };

      // Start transcription with the configuration
      startTranscription(config);
    } catch (err) {
      setError("Failed to start transcription: " + err.message);
    }
  };

  // Function to handle stopping transcription
  const handleStopTranscription = () => {
    try {
      stopTranscription();
    } catch (err) {
      setError("Failed to stop transcription: " + err.message);
    }
  };

  // Format timestamp for display
  const formatTimestamp = (timestamp) => {
    const date = new Date(timestamp);
    return `${date.getHours()}:${date.getMinutes()}:${date.getSeconds()}`;
  };

  // Render the component
  return (
    <View style={styles.container}>
      <Text style={styles.title}>VideoSDK Realtime Transcription</Text>

      {/* Transcription control button */}
      <TouchableOpacity
        style={[
          styles.button,
          isTranscribing ? styles.stopButton : styles.startButton,
          isStarting && styles.disabledButton
        ]}
        onPress={isTranscribing ? handleStopTranscription : handleStartTranscription}
        disabled={isStarting}
      >
        {isStarting ? (
          <View style={styles.buttonContent}>
            <ActivityIndicator color="#fff" size="small" />
            <Text style={styles.buttonText}>Starting Transcription...</Text>
          </View>
        ) : (
          <Text style={styles.buttonText}>
            {isTranscribing ? "Stop Transcription" : "Start Transcription"}
          </Text>
        )}
      </TouchableOpacity>

      {/* Error message display */}
      {error && (
        <Text style={styles.errorText}>{error}</Text>
      )}

      {/* Transcription display */}
      <View style={styles.transcriptionContainer}>
        <Text style={styles.sectionTitle}>
          {isTranscribing ? "Live Transcription" : "Transcription History"}
        </Text>

        {transcriptionText.length === 0 ? (
          <Text style={styles.emptyText}>
            {isTranscribing
              ? "Waiting for speech..."
              : "No transcription history yet. Start transcription to begin."}
          </Text>
        ) : (
          <FlatList
            data={transcriptionText}
            keyExtractor={(item) => item.id.toString()}
            renderItem={({ item }) => (
              <View style={styles.transcriptionItem}>
                <View style={styles.transcriptionHeader}>
                  <Text style={styles.speakerName}>{item.name}</Text>
                  <Text style={styles.timestamp}>
                    {formatTimestamp(item.timestamp)}
                  </Text>
                </View>
                <Text style={styles.transcriptionText}>{item.text}</Text>
              </View>
            )}
            style={styles.transcriptionList}
          />
        )}
      </View>
    </View>
  );
};

// Component styles
const styles = StyleSheet.create({
  container: {
    flex: 1,
    padding: 20,
    backgroundColor: '#f5f7fa',
  },
  title: {
    fontSize: 24,
    fontWeight: 'bold',
    marginBottom: 20,
    color: '#333',
    textAlign: 'center',
  },
  button: {
    padding: 15,
    borderRadius: 8,
    alignItems: 'center',
    marginBottom: 20,
  },
  buttonContent: {
    flexDirection: 'row',
    alignItems: 'center',
    justifyContent: 'center',
  },
  startButton: {
    backgroundColor: '#4CAF50',
  },
  stopButton: {
    backgroundColor: '#F44336',
  },
  disabledButton: {
    backgroundColor: '#9E9E9E',
  },
  buttonText: {
    color: 'white',
    fontWeight: 'bold',
    fontSize: 16,
    marginLeft: 8,
  },
  errorText: {
    color: '#F44336',
    marginBottom: 10,
  },
  transcriptionContainer: {
    flex: 1,
    backgroundColor: 'white',
    borderRadius: 8,
    padding: 15,
    shadowColor: '#000',
    shadowOffset: { width: 0, height: 2 },
    shadowOpacity: 0.1,
    shadowRadius: 4,
    elevation: 2,
  },
  sectionTitle: {
    fontSize: 18,
    fontWeight: 'bold',
    marginBottom: 10,
    color: '#333',
  },
  emptyText: {
    textAlign: 'center',
    color: '#666',
    marginTop: 20,
  },
  transcriptionList: {
    flex: 1,
  },
  transcriptionItem: {
    borderBottomWidth: 1,
    borderBottomColor: '#f0f0f0',
    paddingVertical: 10,
  },
  transcriptionHeader: {
    flexDirection: 'row',
    justifyContent: 'space-between',
    marginBottom: 5,
  },
  speakerName: {
    fontWeight: 'bold',
    color: '#2196F3',
  },
  timestamp: {
    color: '#9E9E9E',
    fontSize: 12,
  },
  transcriptionText: {
    fontSize: 16,
    color: '#333',
  },
});

export default TranscriptionComponent;
Implementing Real-Time Transcription Using VideoSDK's Hooks
Let's create a simpler example that focuses specifically on VideoSDK's transcription capabilities, following VideoSDK's official documentation:
import React, { useState } from 'react';
import {
  View,
  Text,
  TouchableOpacity,
  StyleSheet,
  ScrollView
} from 'react-native';
import { Constants, useTranscription } from "@videosdk.live/react-native-sdk";

const TranscriptionDemo = () => {
  const [transcriptions, setTranscriptions] = useState([]);
  const [isTranscribing, setIsTranscribing] = useState(false);

  // Configure transcription event handlers
  function onTranscriptionStateChanged(data) {
    const { status, id } = data;

    if (status === Constants.transcriptionEvents.TRANSCRIPTION_STARTING) {
      console.log("Realtime Transcription is starting", id);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STARTED) {
      console.log("Realtime Transcription is started", id);
      setIsTranscribing(true);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STOPPING) {
      console.log("Realtime Transcription is stopping", id);
    } else if (status === Constants.transcriptionEvents.TRANSCRIPTION_STOPPED) {
      console.log("Realtime Transcription is stopped", id);
      setIsTranscribing(false);
    }
  }

  // Handle incoming transcription text
  function onTranscriptionText(data) {
    let { participantId, participantName, text, timestamp, type } = data;
    console.log(`${participantName}: ${text} ${timestamp}`);

    // Add to transcriptions list
    setTranscriptions(prev => [...prev, {
      id: Date.now().toString(),
      name: participantName,
      text,
      timestamp
    }]);
  }

  // Get transcription methods from the hook
  const { startTranscription, stopTranscription } = useTranscription({
    onTranscriptionStateChanged,
    onTranscriptionText,
  });

  // Start transcription with configuration
  const handleStartTranscription = () => {
    // Configuration for realtime transcription
    const config = {
      webhookUrl: null, // Optional webhook URL
      summary: {
        enabled: true,
        prompt: "Write summary in sections like Title, Agenda, Speakers, Action Items, Outlines, Notes and Summary"
      }
    };

    startTranscription(config);
  };

  // Stop ongoing transcription
  const handleStopTranscription = () => {
    stopTranscription();
  };

  return (
    <View style={styles.container}>
      <Text style={styles.title}>VideoSDK Realtime Transcription</Text>

      {/* Control buttons */}
      <View style={styles.controls}>
        <TouchableOpacity
          style={[styles.button, !isTranscribing ? styles.active : styles.inactive]}
          onPress={handleStartTranscription}
          disabled={isTranscribing}
        >
          <Text style={styles.buttonText}>Start Transcription</Text>
        </TouchableOpacity>

        <TouchableOpacity
          style={[styles.button, isTranscribing ? styles.active : styles.inactive]}
          onPress={handleStopTranscription}
          disabled={!isTranscribing}
        >
          <Text style={styles.buttonText}>Stop Transcription</Text>
        </TouchableOpacity>
      </View>

      {/* Transcription display */}
      <View style={styles.transcriptionContainer}>
        <Text style={styles.sectionHeader}>Transcription:</Text>

        {transcriptions.length === 0 ? (
          <Text style={styles.emptyMessage}>No transcription data yet. Start transcription and speak to see results.</Text>
        ) : (
          <ScrollView style={styles.transcriptionScroll}>
            {transcriptions.map(item => (
              <View key={item.id} style={styles.transcriptionItem}>
                <Text style={styles.speakerName}>{item.name}:</Text>
                <Text style={styles.transcriptionText}>{item.text}</Text>
              </View>
            ))}
          </ScrollView>
        )}
      </View>
    </View>
  );
};

const styles = StyleSheet.create({
  container: {
    flex: 1,
    padding: 20,
    backgroundColor: '#f5f5f5',
  },
  title: {
    fontSize: 22,
    fontWeight: 'bold',
    textAlign: 'center',
    marginBottom: 20,
  },
  controls: {
    flexDirection: 'row',
    justifyContent: 'space-around',
    marginBottom: 20,
  },
  button: {
    paddingVertical: 12,
    paddingHorizontal: 20,
    borderRadius: 8,
    width: '45%',
    alignItems: 'center',
  },
  active: {
    backgroundColor: '#2196F3',
  },
  inactive: {
    backgroundColor: '#B0BEC5',
  },
  buttonText: {
    color: 'white',
    fontWeight: 'bold',
  },
  transcriptionContainer: {
    flex: 1,
    backgroundColor: 'white',
    borderRadius: 8,
    padding: 16,
    shadowColor: '#000',
    shadowOffset: { width: 0, height: 2 },
    shadowOpacity: 0.1,
    shadowRadius: 4,
    elevation: 3,
  },
  sectionHeader: {
    fontSize: 18,
    fontWeight: 'bold',
    marginBottom: 12,
  },
  emptyMessage: {
    textAlign: 'center',
    color: '#757575',
    fontStyle: 'italic',
    marginTop: 30,
  },
  transcriptionScroll: {
    flex: 1,
  },
  transcriptionItem: {
    marginBottom: 10,
    borderBottomWidth: 1,
    borderBottomColor: '#f0f0f0',
    paddingBottom: 10,
  },
  speakerName: {
    fontWeight: 'bold',
    color: '#2196F3',
  },
  transcriptionText: {
    fontSize: 16,
    color: '#333',
    marginTop: 4,
  },
});

export default TranscriptionDemo;
Using the Transcription Component in a Meeting Context
To use the transcription component within a meeting context, you need to wrap it with the MeetingProvider component from VideoSDK:
import React from 'react';
import { StyleSheet, SafeAreaView } from 'react-native';
import { MeetingProvider } from '@videosdk.live/react-native-sdk';
import TranscriptionDemo from './TranscriptionDemo';

const App = () => {
  // Replace with your VideoSDK token and meeting ID
  const token = "YOUR_VIDEOSDK_TOKEN";
  const meetingId = "YOUR_MEETING_ID";

  return (
    <SafeAreaView style={styles.container}>
      <MeetingProvider
        config={{
          meetingId,
          micEnabled: true,
          webcamEnabled: true,
          name: "Test User",
          participantId: "participant-id",
        }}
        token={token}
      >
        <TranscriptionDemo />
      </MeetingProvider>
    </SafeAreaView>
  );
};

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#f5f5f5',
  }
});

export default App;
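Hard-coding the token is fine for a quick test, but in production the auth token should be generated on your own backend (using your VideoSDK API key and secret) and fetched by the app at runtime. A hypothetical helper, assuming your server exposes a /get-token endpoint:

// getToken.js - the endpoint URL is an assumption; point it at your own backend
export const getToken = async () => {
  const response = await fetch("https://your-backend.example.com/get-token");
  const { token } = await response.json();
  return token;
};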
Advanced Transcription Features
VideoSDK offers several advanced features to enhance your realtime speech to text implementation.
1. Configuring Summary Generation
You can configure the transcription to generate a summary after the meeting:
const config = {
  webhookUrl: null,
  summary: {
    enabled: true,
    prompt: "Write summary in sections like Title, Agenda, Speakers, Action Items, Outlines, Notes and Summary"
  }
};

startTranscription(config);
This will generate a structured summary of the transcribed content, which can be especially useful for meeting documentation.
2. Handling Transcription Events
VideoSDK provides a comprehensive set of events to track the transcription process:
// Constants for transcription events
const {
  TRANSCRIPTION_STARTING,
  TRANSCRIPTION_STARTED,
  TRANSCRIPTION_STOPPING,
  TRANSCRIPTION_STOPPED
} = Constants.transcriptionEvents;

function onTranscriptionStateChanged(data) {
  const { status, id } = data;

  switch (status) {
    case TRANSCRIPTION_STARTING:
      console.log("Transcription is starting");
      // Show loading indicator
      break;
    case TRANSCRIPTION_STARTED:
      console.log("Transcription has started");
      // Show active transcription UI
      break;
    case TRANSCRIPTION_STOPPING:
      console.log("Transcription is stopping");
      // Show stopping indicator
      break;
    case TRANSCRIPTION_STOPPED:
      console.log("Transcription has stopped");
      // Update UI to show transcription is inactive
      break;
    default:
      console.log("Unknown transcription status");
  }
}
3. Sending Transcription to External Services
You can configure a webhook URL to receive the transcription data in real-time:
const config = {
  webhookUrl: "https://your-webhook-url.com/transcription",
  summary: { enabled: true }
};

startTranscription(config);
This allows you to process the transcription data with external services or store it in your database.
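On the receiving side, the webhook is just an HTTPS endpoint that accepts POST requests. Below is a minimal sketch of a receiver using Node.js with Express; the exact payload fields are defined by VideoSDK's webhook documentation, so treat the handler body as illustrative:

// server.js - minimal webhook receiver (Node.js + Express assumed)
const express = require('express');
const app = express();

app.use(express.json());

// VideoSDK will POST transcription events to this route
app.post('/transcription', (req, res) => {
  console.log('Received webhook payload:', req.body);
  // TODO: persist req.body to your database or forward it to another service
  res.sendStatus(200);
});

app.listen(3000, () => console.log('Webhook receiver listening on port 3000'));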
Performance Considerations
When implementing real-time speech to text in React Native, consider these performance optimizations:
1. Memory Management
For long meetings, implement a limit on the transcription history to prevent memory issues:
// Limit transcription history to last 100 items
useEffect(() => {
  if (transcriptions.length > 100) {
    setTranscriptions(transcriptions.slice(-100));
  }
}, [transcriptions]);
2. Efficient Rendering
Use virtualized lists for rendering long transcriptions:
import { FlatList } from 'react-native';

<FlatList
  data={transcriptions}
  keyExtractor={item => item.id}
  renderItem={({ item }) => (
    <View style={styles.item}>
      <Text>{item.text}</Text>
    </View>
  )}
  initialNumToRender={10}
  maxToRenderPerBatch={10}
  windowSize={5}
/>
3. Background Mode Handling
Handle transcription state when the app goes to the background:
import { AppState } from 'react-native';

useEffect(() => {
  const subscription = AppState.addEventListener("change", nextAppState => {
    if (nextAppState === "background" && isTranscribing) {
      // Optionally pause or stop transcription when app is in background
      // Or notify user that transcription continues in background
    }
  });

  return () => {
    subscription.remove();
  };
}, [isTranscribing]);
Best Practices for React Native Transcription
To ensure the best experience with your React Native speech to text example:
1. Provide Visual Feedback
Always provide clear visual feedback about the transcription state:
{isTranscribing && (
  <View style={styles.activeIndicator}>
    <Text style={styles.activeText}>Transcription Active</Text>
    <View style={styles.pulseDot} />
  </View>
)}
2. Handle Errors Gracefully
Implement proper error handling:
import { Alert } from 'react-native';

try {
  startTranscription(config);
} catch (error) {
  console.error("Transcription error:", error);
  // Show user-friendly error message
  Alert.alert(
    "Transcription Error",
    "Unable to start transcription. Please try again."
  );
}
3. Test in Various Conditions
Test your transcription implementation in different conditions:
- With varying levels of background noise
- With multiple speakers
- With different accents
- With various network conditions
Conclusion
Implementing realtime speech to text in your React Native application using VideoSDK provides a powerful way to enhance accessibility and user experience. VideoSDK's transcription API makes it straightforward to add high-quality speech recognition capabilities to your app without the complexity of building your own solution.
By following this React Native speech to text example, you've learned how to set up the development environment, implement the transcription component, handle various transcription events, and optimize performance. These skills will allow you to create more accessible and feature-rich applications that can respond to voice input in real-time.
Whether you're building a video conferencing app, a voice assistant, or adding accessibility features to an existing application, VideoSDK's transcription capabilities provide a robust foundation for your speech recognition needs.
Additional Resources
For more information on implementing real-time transcription in React Native, refer to VideoSDK's official documentation and API reference.