There are so many projects you can build with Deepgram's streaming audio transcriptions. Today, we are going to get live transcriptions from a user's mic inside of your browser.
Before We Start
For this project, you will need a Deepgram API Key - get one here. That's it in terms of dependencies - this project is entirely browser-based.
Create a new index.html file, open it in a code editor, and add the following boilerplate code:
<!DOCTYPE html>
<html>
  <body>
    <p id="status">Connection status will go here</p>
    <p id="transcript">Deepgram transcript will go here</p>
    <script>
      // Further code goes here
    </script>
  </body>
</html>
Get User Microphone
You can request access to a user's media input devices (microphones and cameras) using the built-in getUserMedia() method on navigator.mediaDevices. If the user allows it, it returns a MediaStream, which we can then prepare to send to Deepgram. Inside your <script>, add the following:
navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
  console.log({ stream })
  // Further code goes here
})
Load your index.html file in your browser, and you should immediately receive a prompt to access your microphone. Grant it, and then look at the console in your developer tools.
Now that we have a MediaStream, we must provide it to a MediaRecorder, which will prepare the data and, once it's available, emit it with a dataavailable event:
const mediaRecorder = new MediaRecorder(stream)
We now have everything we need to send audio to Deepgram.
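This tutorial relies on the browser's default container and codec, which works with Deepgram as-is. If you ever want to pin a specific format, MediaRecorder can be asked for one, but check support first since it varies by browser - a quick sketch, not required for this project:
// Optional: request a specific container/codec if the browser supports it.
// The rest of this tutorial just uses the default (no options object).
const options = MediaRecorder.isTypeSupported('audio/webm')
  ? { mimeType: 'audio/webm' }
  : {}
const mediaRecorder = new MediaRecorder(stream, options)
console.log('Recording as', mediaRecorder.mimeType)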
Connect to Deepgram
To stream audio to Deepgram's Speech Recognition service, we must open a WebSocket connection and send data via it. First, establish the connection:
const socket = new WebSocket('wss://api.deepgram.com/v1/listen', [ 'token', 'YOUR_DEEPGRAM_API_KEY' ])
A reminder that this key is client-side and, therefore, your users can see it. Any user with access to your key can access the Deepgram APIs, which, in turn, may provide full account access. Refer to our post on protecting your API key with browser live transcription.
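If you'd rather not ship a long-lived key in your markup at all, one common pattern is to fetch a short-lived key from your own backend at runtime. Here's a minimal sketch, assuming a hypothetical /deepgram-token endpoint that you implement server-side and that responds with JSON like { "key": "..." }:
// Sketch only: '/deepgram-token' is a hypothetical endpoint on your own
// backend that returns a short-lived Deepgram key as JSON.
fetch('/deepgram-token')
  .then(response => response.json())
  .then(({ key }) => {
    const socket = new WebSocket('wss://api.deepgram.com/v1/listen', [ 'token', key ])
    // ...then attach the same event handlers shown below
  })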
Then, log when the socket's onopen, onmessage, onclose, and onerror events are triggered:
socket.onopen = () => {
  console.log({ event: 'onopen' })
}
socket.onmessage = (message) => {
  console.log({ event: 'onmessage', message })
}
socket.onclose = () => {
  console.log({ event: 'onclose' })
}
socket.onerror = (error) => {
  console.log({ event: 'onerror', error })
}
Refresh your browser and watch the console. You should see the socket connection open and then close. To keep it open, we must promptly send some data once the connection is opened.
Sending Data to Deepgram
Inside the socket.onopen function, send data to Deepgram in 250ms increments:
mediaRecorder.addEventListener('dataavailable', event => {
  if (event.data.size > 0 && socket.readyState == 1) {
    socket.send(event.data)
  }
})
mediaRecorder.start(250)
Deepgram isn't fussy about the timeslice you provide (here it's 250ms), but bear in mind that the bigger this number is, the longer the delay between a word being spoken and the audio being sent, which slows down your transcription. 100-250ms is ideal.
Take a look at your console now while speaking into your mic - you should be seeing data come back from Deepgram!
Handling the Deepgram Response
Inside the socket.onmessage function, parse the data sent from Deepgram, pull out just the transcript, and determine whether it's the final transcript for that phrase (the "utterance"):
const received = JSON.parse(message.data)
const transcript = received.channel.alternatives[0].transcript
if (transcript && received.is_final) {
  console.log(transcript)
}
You may have noticed that for each phrase, you receive several messages from Deepgram, each growing by a word (for example, "hello", "hello how", "hello how are", and so on). Deepgram sends back data as each word is transcribed, which is great for getting a speedy response. For this simple project, we will only show the final version of each utterance, denoted by the is_final property in the response.
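To make that concrete, here's a simplified sketch of two consecutive messages for one phrase, trimmed down to the fields this tutorial reads (real responses typically include extra fields such as confidence scores and word timings):
// Simplified for illustration - only the fields this tutorial uses are shown.
const interimResult = {
  channel: { alternatives: [{ transcript: 'hello how are' }] },
  is_final: false // more words are still coming for this utterance
}
const finalResult = {
  channel: { alternatives: [{ transcript: 'hello how are you' }] },
  is_final: true // the finished transcript for the utterance
}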
To neaten this up, remove the console.log({ event: 'onmessage', message }) from this function, and then test your code again.
That's it! That's the project. Before we wrap up, let's give the user some indication of progress in the web page itself.
Showing Status & Progress In Browser
Change the text inside of <p id="status"> to 'Not Connected'. Then, at the top of your socket.onopen function, add this line:
document.querySelector('#status').textContent = 'Connected'
Remove the text inside of <p id="transcript">. Where you are logging the transcript in your socket.onmessage function, add this line:
document.querySelector('#transcript').textContent += transcript + ' '
Try your project once more, and your web page should show you when you're connected and what words you have spoken, thanks to Deepgram's Speech Recognition.
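If you'd also like to display interim words as they arrive, rather than only final utterances, one small variation of socket.onmessage could look like this - a sketch that assumes you add a hypothetical <p id="interim"></p> element to the page:
// Sketch only: assumes an extra <p id="interim"></p> element exists in the page.
socket.onmessage = (message) => {
  const received = JSON.parse(message.data)
  const transcript = received.channel.alternatives[0].transcript
  if (!transcript) return
  if (received.is_final) {
    // Append the finished utterance and clear the interim preview
    document.querySelector('#transcript').textContent += transcript + ' '
    document.querySelector('#interim').textContent = ''
  } else {
    // Show the in-progress transcript as it grows
    document.querySelector('#interim').textContent = transcript
  }
}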
The full code is here:
<!DOCTYPE html>
<html>
  <body>
    <p id="status">Not Connected</p>
    <p id="transcript"></p>
    <script>
      navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
        const mediaRecorder = new MediaRecorder(stream)
        const socket = new WebSocket('wss://api.deepgram.com/v1/listen', [ 'token', 'YOUR_DEEPGRAM_API_KEY' ])
        socket.onopen = () => {
          console.log({ event: 'onopen' })
          document.querySelector('#status').textContent = 'Connected'
          mediaRecorder.addEventListener('dataavailable', event => {
            if (event.data.size > 0 && socket.readyState == 1) {
              socket.send(event.data)
            }
          })
          mediaRecorder.start(250)
        }
        socket.onmessage = (message) => {
          const received = JSON.parse(message.data)
          const transcript = received.channel.alternatives[0].transcript
          if (transcript && received.is_final) {
            document.querySelector('#transcript').textContent += transcript + ' '
          }
        }
        socket.onclose = () => {
          console.log({ event: 'onclose' })
        }
        socket.onerror = (error) => {
          console.log({ event: 'onerror', error })
        }
      })
    </script>
  </body>
</html>
If you have any questions, please feel free to reach out on Twitter - we're @DeepgramAI.
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub Discussions.