
Creating Speaker-Labeled Transcripts With Google Colab


(Note: If you’re one of those tinker-first, read-later sorts of folks, dive right into the Python notebook covered in this article.)

Are you creating a podcast? Do you have multi-person Zoom calls? Or perhaps even earnings calls to catch up on?

Well, that’s a lot of information to keep up with. Unless you have a very good notetaker on every one of those calls, you need to keep a record of your discussions somewhere.

That’s where speech-to-text (STT) technology comes in. But be careful. Many STT resources out there are extremely limited. And most don’t even offer speaker-labeling as a feature.

That is, a shoddy STT application will only produce a transcript that looks like this:

Hey, did you know that elk meat tastes really good? Really? Oh! I heard about elk meat too. I think I heard that on a podcast once. Which podcast? We probably heard the same podcast. I think so. What podcast are you guys talking about? Lemme look it up

Instead of something that looks like this:

Speaker 1: Hey, did you know that elk meat tastes really good?

Speaker 2: Really?

Speaker 3: Oh! I heard about elk meat too. I think I heard that on a podcast once.

Speaker 2: Which podcast?

Speaker 1: We probably heard the same podcast.

Speaker 3: I think so.

Speaker 2: What podcast are you guys talking about?

Speaker 1: Lemme look it up

Luckily, Deepgram is here to help! Not only do we offer top-notch speaker-labeling (aka “diarization”) services, but we also have a handy-dandy notebook to help you out. That way, you don’t have to worry about writing any code. You can just upload your audio files into the notebook and run the code that was already written for you.

Ready? Let’s go!

The Python notebook

All the instructions you need are inside the notebook itself: here.

However, it can be helpful to break things down piece by piece, so let’s do that here. The first cell you’ll run into is the “Dependencies” cell (shown below). By clicking the cell’s play button, you’ll install all the fancy-schmancy coding packages you need for the rest of the cells to run.

After all, you can’t transcribe audio with Deepgram’s AI models without first installing the Deepgram SDK.

! pip install requests ffmpeg-python
# The cells below use the v2-style `Deepgram` client, so keep the SDK below 3.0
! pip install --upgrade "deepgram-sdk<3"

Up next, we have a cell that reminds you to upload the audio of your choice into the notebook. The menu on the left-hand side of the screen lets you upload any audio files you wish. To upload, simply click the icon of the paper with the upward-facing arrow on it. It will take a few moments for the audio to appear, but once it does, move on to the next cell.
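If you'd rather confirm the upload from code than from the file menu, a small check like the one below can help. This is only a sketch: `find_audio_files` is a hypothetical helper that is not part of the notebook, and it assumes your files sit in the notebook's root folder and use the `m4a` extension.

```python
import os

def find_audio_files(directory=".", extension="m4a"):
    """Return the sorted names of files in `directory` that end in `.extension`.

    Hypothetical helper for illustration; the notebook itself does not define it.
    """
    # Accept both "m4a" and ".m4a", and ignore upper/lower case differences
    suffix = "." + extension.lower().lstrip(".")
    return sorted(
        name
        for name in os.listdir(directory)
        if name.lower().endswith(suffix)
        and os.path.isfile(os.path.join(directory, name))
    )

# An empty list means the upload has not finished (or the extension differs)
print(find_audio_files(".", "m4a"))
```

If the list comes back empty, give the upload a few more moments or double-check the extension.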

And now, here’s the fun part:

from deepgram import Deepgram
import asyncio, json, os

'''
 Sign up at https://dpgr.am/7407694
 to get an API key and 45,000 minutes
 for free!
'''
dg_key = '🔑🔑🔑 Your API Key here! 🔑🔑🔑 '
dg = Deepgram(dg_key)

'''
The most common audio formats and encodings we support
include mp3, mp4, mp2, aac, wav, flac, pcm, m4a, ogg, opus, and webm,
so feel free to adjust the `MIMETYPE` variable as needed.
Use the bare extension, without a leading dot.
'''
MIMETYPE = 'm4a'

# Note: You can use '.' if your audio is in the root
DIRECTORY = '.'


# Feel free to modify your model's parameters as you wish!
options = {
    "punctuate": True,
    "diarize": True,
    "model": 'general',
    "tier": 'nova'
}

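# This function is what calls on the model to transcribe
def main():
    for audio_file in os.listdir(DIRECTORY):
        # Only transcribe files whose extension matches MIMETYPE
        if not audio_file.endswith('.' + MIMETYPE):
            continue
        audio_path = os.path.join(DIRECTORY, audio_file)
        # splitext copes with extensions of any length (.mp3, .flac, .webm, ...)
        json_path = os.path.splitext(audio_path)[0] + '.json'
        with open(audio_path, "rb") as f:
            source = {"buffer": f, "mimetype": 'audio/' + MIMETYPE}
            res = dg.transcription.sync_prerecorded(source, options)
        # Save the response next to the audio file it came from
        with open(json_path, "w") as transcript:
            json.dump(res, transcript, indent=4)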

main()

Here, you’ll see a bunch of variables you need to modify. Specifically, you’ll need to change the following:

  • dg_key should be set to your Deepgram API key

  • MIMETYPE should be set to the file extension of the audio you uploaded, without the leading dot: 'wav', 'mp3', or whichever type of audio file you have

  • DIRECTORY should be set to the folder that contains all the audio files you uploaded. If you didn’t create a new folder in the previous step (that is, if you simply followed the instructions and didn’t do any extra work), you can just leave this as it is: '.'
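Before running the transcription cell, it can save a round trip to sanity-check those three variables. The sketch below is not part of the notebook: `check_settings` is a hypothetical helper, and the placeholder text it looks for is simply the one shown in the cell above.

```python
import os

def check_settings(dg_key, mimetype, directory):
    """Return a list of human-readable problems; an empty list means all clear.

    Hypothetical pre-flight check for illustration; it never contacts Deepgram.
    """
    problems = []
    # The notebook ships with placeholder text where the API key should go
    if not dg_key.strip() or "Your API Key here" in dg_key:
        problems.append("dg_key still holds the placeholder text")
    # The notebook builds 'audio/' + MIMETYPE, so a leading dot would break it
    if mimetype.startswith("."):
        problems.append("MIMETYPE should not start with a dot")
    extension = "." + mimetype.lstrip(".")
    if not os.path.isdir(directory):
        problems.append(f"DIRECTORY {directory!r} does not exist")
    elif not any(name.endswith(extension) for name in os.listdir(directory)):
        problems.append(f"no {extension} files found in {directory!r}")
    return problems

# With the untouched placeholder key, this reports at least one problem
for problem in check_settings("Your API Key here!", "m4a", "."):
    print("Fix before running:", problem)
```

An empty result does not prove the key is valid; it only catches the most common setup slips.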

If you run this cell and wait a few moments, you should see a .json file appear in the same place you uploaded your audio files. The code was written such that every file in the directory specified by DIRECTORY will be transcribed, as long as its name ends in the extension specified by MIMETYPE.

Note that there will be a bit of a delay between when the cell finishes running and when your .json appears. This is normal. Depending on the size of your file, it may take a bit longer than anticipated, but usually it takes less than a minute to see your .json!

Those JSONs, by the way, should look something like this:
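As a rough, heavily trimmed sketch of that structure, here is a Python dict with the same nesting that the next cell reads. The values are made up for illustration, and a real response carries many more fields than shown here.

```python
# Illustrative only: invented values, trimmed to the parts used for diarization
sample_response = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Which podcast? Lemme look it up",
                        "words": [
                            {"word": "which", "start": 0.10, "end": 0.32,
                             "speaker": 0, "punctuated_word": "Which"},
                            {"word": "podcast", "start": 0.32, "end": 0.80,
                             "speaker": 0, "punctuated_word": "podcast?"},
                            {"word": "lemme", "start": 1.20, "end": 1.45,
                             "speaker": 1, "punctuated_word": "Lemme"},
                        ],
                    }
                ]
            }
        ]
    }
}

# The per-word list lives under the first channel's first alternative
words = sample_response["results"]["channels"][0]["alternatives"][0]["words"]
for w in words:
    print(w["speaker"], w["punctuated_word"])
```

The two keys that matter for a speaker-labeled transcript are `speaker` and `punctuated_word`, which appear on every word when `diarize` and `punctuate` are enabled.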

Now, the JSON contains all the information you need to create a diarized transcript. But we already went ahead and wrote the code that does that for you. It’s in the next cell, and it looks like this:

'''
The JSON is loaded with information, but if you just want to read the
transcript, run the code below!

One .txt file will be generated per JSON; this .txt file will contain
the diarized, human-readable transcript.
'''

TAG = 'SPEAKER '

def create_transcript(output_json, output_transcript):
  lines = []
  with open(output_json, "r") as file:
    words = json.load(file)["results"]["channels"][0]["alternatives"][0]["words"]
  # Start with no speaker so the first word always opens a new line,
  # even if the first speaker's label is not 0
  curr_speaker = None
  curr_line = ''
  for word_struct in words:
    word_speaker = word_struct["speaker"]
    word = word_struct["punctuated_word"]
    if word_speaker == curr_speaker:
      curr_line += ' ' + word
    else:
      # The speaker changed: save the finished line and start a new one
      if curr_speaker is not None:
        lines.append(TAG + str(curr_speaker) + ':' + curr_line + '\n')
      curr_speaker = word_speaker
      curr_line = ' ' + word
  # Save the last speaker's line
  if curr_speaker is not None:
    lines.append(TAG + str(curr_speaker) + ':' + curr_line)
  with open(output_transcript, 'w') as f:
    for line in lines:
      f.write(line)
      f.write('\n')

def print_transcript():
  for filename in os.listdir(DIRECTORY):
    if filename.endswith('.json'):
      # Read from and write to DIRECTORY, not just the current folder
      json_path = os.path.join(DIRECTORY, filename)
      output_transcript = os.path.splitext(json_path)[0] + '.txt'
      create_transcript(json_path, output_transcript)

print_transcript()

Running this cell should produce a .txt file in the same folder as your audio files and your JSONs. The result should look like this! (Skip to 2:45)
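To see the grouping logic on a tiny input without touching any files, here is a minimal sketch. It swaps the notebook's hand-written loop for `itertools.groupby`, which bundles consecutive words from the same speaker; `speaker_turns` and the sample words are hypothetical and exist only for this illustration.

```python
from itertools import groupby

def speaker_turns(words, tag="SPEAKER "):
    """Collapse consecutive words from the same speaker into one labeled line."""
    turns = []
    # groupby only merges *adjacent* items, which is exactly what a turn is
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        text = " ".join(w["punctuated_word"] for w in group)
        turns.append(f"{tag}{speaker}: {text}")
    return turns

sample = [
    {"speaker": 0, "punctuated_word": "Which"},
    {"speaker": 0, "punctuated_word": "podcast?"},
    {"speaker": 1, "punctuated_word": "Lemme"},
    {"speaker": 1, "punctuated_word": "look"},
    {"speaker": 0, "punctuated_word": "Okay."},
]

for line in speaker_turns(sample):
    print(line)
# SPEAKER 0: Which podcast?
# SPEAKER 1: Lemme look
# SPEAKER 0: Okay.
```

Note that speaker 0 shows up twice: a speaker who talks again later gets a fresh line, just as in the notebook's output.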

And that’s it! You now have access to code that will turn any audio file you wish into a speaker-labeled (read: diarized) transcript! And if you want to go the extra mile, you can totally summarize this transcript or translate it into a different language. Really, the sky’s the limit.

So go forth and make that podcast! Hop on the earnings call! Hold that Zoom webinar! If you want those recordings transcribed, labeled, and wrapped with a cute little bow, Deepgram is here to help.

Keep an eye out for many more notebooks to come! And if you want to check out Deepgram without having to look at any code at all, check out our Playground. There, you can see exactly what we have to offer. Trust me, it’s quite a lot.

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.
