Who Explains the Most? An Analysis of Educational YouTubers

What color is a mirror? Where did the name Tiffany come from? What does it take to become an expert at something?

Well, if you’ve been on what one might call the “Explainer” side of YouTube, the answers to these questions may already be lying dormant in the back of your mind.

Creators like VSauce, CGP Grey, and Veritasium make in-depth videos that not only give their audience straightforward answers to these questions, but also reveal eye-opening, and sometimes jaw-dropping, information that you didn’t know you didn’t know.

If you haven’t yet explored ExplainerTube, it’s a deep rabbit hole. Even if the content just rehashes what you already know, you’ll at the very least leave entertained. Creators in this space cover a lot of ground, from niche topics in logistics to emerging areas of physics research. We’ll be analyzing the output of the following six channels: Wendover Productions, Tom Scott, CGP Grey, Veritasium, VSauce, and NerdWriter1.

And while this list isn’t exhaustive, you’ll find enough content amongst their channels to keep you busy for days, if not weeks.

For our purposes here, we’re only focusing on the ExplainerTube channels that have exactly one host. Other channels like TED-Ed, Wisecrack, Numberphile, and CrashCourse have multiple hosts and writers in front of the camera.

It's important for a creator to make their presentation as interesting as possible, especially when explaining a topic with lots of technical details and subtleties. Sometimes, making an engaging presentation requires some really cool animations. Other times, creators make use of props to illustrate their point.

To keep their audience engaged, ExplainTubers (and YouTubers in general) tend to speak pretty quickly. Or, at least, faster than you would in a typical conversation.

Occasionally, they speak so fast that the first sound played in the video might as well be the beeps of a racetrack’s “Ready, Set, Go!” light.

Now, don’t get me wrong, speaking fast is not a sin. Rarely does a professional ExplainTuber speak so quickly that their words become muddled. The folks listed above are sufficiently professional to speak their scripts clearly and straightforwardly explain their message.

So here’s a fun question: Out of all these fast-talking educators, who speaks the quickest of them all? The rest of this article will explain how we found that out, but if you’re just looking for the TL;DR version, look no further than the chart below.

Curious to learn more? Let’s dive in.

Parsing YouTube Explainers With Python: Step by Step

Step 1: Gather the audio

We’re going to focus on the six single-host YouTube channels listed above. To get an analysis that’s as up-to-date as possible, we’ll take a look at each channel’s ten most recent videos, not counting one-off and non-explanatory videos.

I downloaded the audio from these videos using the youtube_dl library, a topic we've covered before. The code which completes Step 1 looks a little like this:

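import youtube_dl

# URLs of the videos whose audio you want to download
vids = ['URL to desired video here', ... ]


ydl_opts = {
    'format': 'bestaudio/best',
    # convert whatever audio format gets downloaded into a 192 kbps mp3
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'mp3',
        'preferredquality': '192',
    }],
    # change this to change where you download it to;
    # %(ext)s lets the postprocessor set the final .mp3 extension
    'outtmpl': './tom/audio/%(title)s.%(ext)s',
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(vids)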

Step 2: Turn the audio into a transcript

With a sprinkle of Deepgram’s API, a dash of our Python SDK, and a few sprigs of ffmpeg, we can turn our files of audio into beautiful, parseable text.

Thankfully, we don’t have to write this audio parsing code from scratch. The SDK gives us a wonderful start. After a quick read, we see that transcribing these (relatively) long videos boils down to a classic async/await paradigm. And once we know that, we can simply do a little Ctrl+C, Ctrl+V action, and we end up with this:

from deepgram import Deepgram
import asyncio, json, os


dg_key = 'Your key goes here'
dg = Deepgram(dg_key)


# diarize labels each speaker; paragraphs gives us a readable transcript
options = {
    "diarize": True,
    "punctuate": True,
    "paragraphs": True,
    "model": 'general',
    "tier": 'enhanced'
}


async def main():
    # make sure the output folder exists before writing to it
    os.makedirs("tom/transcripts", exist_ok=True)
    podcasts = os.listdir("./tom/audio")
    for podcast in podcasts:
        print("Currently processing:", podcast)
        with open(f"tom/audio/{podcast}", "rb") as audio:
            source = {"buffer": audio, "mimetype": 'audio/mp3'}
            res = await dg.transcription.prerecorded(source, options)
            # podcast[:-4] strips the ".mp3" extension
            with open(f"tom/transcripts/{podcast[:-4]}.json", "w") as transcript:
                json.dump(res, transcript)
        print()


asyncio.run(main())

By the way, if you need help configuring your API Key and keeping your Secrets a secret, the SDK provides a guide for that on GitHub, too 😉

Pro tip: Check out our GitHub Discussions page to speak with other Deepgram users as well as our own team for more in-depth conversations about Speech Recognition, general AI, or even overall dev tips/tricks.

Step 3: Give your transcripts a makeover

The output of Step 2 is a set of files, each containing a machine-readable, developer-friendly JSON dump. For this step, we’re going to transform that dump into a work of art. That is, we are going to make these transcripts as human-readable as possible.

This is the part where we label the speakers in the video by name. That way, when we tabulate word counts in the next step, we won’t account for guest appearances or cameos.

(Note: I was lazy, so I labeled every guest and cameo as a number like “1” or “2” instead of by their actual name.)

Once you’ve run the following code, you’ll end up with pretty transcripts that anybody can read:

import json
import os


# create transcripts
def create_transcripts():
    print('running create_transcripts')
    for filename in os.listdir("tom/transcripts"):
        with open(f"tom/transcripts/{filename}", "r") as file:
            transcript = json.load(file)
        paragraphs = transcript["results"]["channels"][0]["alternatives"][0]["paragraphs"]
        print(paragraphs['transcript'])
        # filename[:-5] strips the ".json" extension
        with open(f"tom/pretty_scripts/{filename[:-5]}.txt", "w") as f:
            # the transcript is a single string, so write it in one go
            f.write(paragraphs['transcript'])


def assign_speakers():
    '''
    This function gives you the ability to label your speakers by name.
    When diarizing, the Deepgram API will label the speakers as
    Speaker 0, Speaker 1, Speaker 2, etc.

    When this function is run, you'll see one line from the transcript
    for each individual speaker that the API identified during diarization.

    You will then label the speaker of that line with the name that you desire.
    '''
    for filename in os.listdir("tom/pretty_scripts"):
        print(f"Current File: {filename}")
        with open(f"tom/pretty_scripts/{filename}", "r") as f:
            lines = f.readlines()
        spoken = []
        names = []
        for line in lines:
            if line.startswith("Speaker "):
                # "Speaker N" is 9 characters long (assumes fewer than 10 speakers)
                if line[:9] in spoken:
                    continue
                print(line)
                name = input("Who is the Speaker? ")
                # skip this speaker if nothing was entered;
                # one-character labels like "1" are allowed
                if not name.strip():
                    continue
                spoken.append(line[:9])
                names.append(name)
        print(spoken)
        print(names)
        # lines already end in "\n", so join them without adding extra newlines
        filedata = "".join(lines)
        print(filedata)
        for speaker, name in zip(spoken, names):
            filedata = filedata.replace(speaker, name)
        with open(f"tom/pretty_scripts/{filename}", "w") as f:
            f.write(filedata)


create_transcripts()
assign_speakers()
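If you'd like a quick sanity check before labeling anyone, you can also count words per speaker straight from the JSON. The snippet below is a minimal sketch, not part of the original pipeline. It assumes that, with diarization turned on, each entry in the response's "words" list carries a numeric "speaker" field. The function name words_per_speaker and the sample dictionary are made up for illustration.

```python
from collections import Counter


def words_per_speaker(response):
    """Count how many words each diarized speaker says.

    Assumes a Deepgram-style response where every word entry
    has a "speaker" index (an assumption; check your own JSON).
    """
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    return Counter(word["speaker"] for word in words)


# Tiny hand-made sample for illustration, not real API output
sample = {"results": {"channels": [{"alternatives": [{"words": [
    {"word": "hello", "speaker": 0},
    {"word": "there", "speaker": 0},
    {"word": "hi", "speaker": 1},
]}]}]}}

counts = words_per_speaker(sample)
print(counts)  # Counter({0: 2, 1: 1})
```

The speaker with the largest count is usually the host, which makes it easier to tell who is who when the labeling prompt shows up.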

Step 4: Run the calculations

Thankfully, the calculations in this case are relatively simple. I measured the number of words per minute each YouTuber spoke (again, ignoring guest stars). That is, for each video, I applied the following formula:

talking speed = words spoken / video length (in minutes)

I also picked videos with the intention of minimizing the amount of time guests spoke. I’m therefore treating the amount of guest-speaker time as negligible. This may be a naive assumption. But hey, if Bayes taught me anything, it’s that such naive assumptions in AI can still lead to good results.
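The formula above can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers are hypothetical, and you would need to supply each video's real length yourself.

```python
def words_per_minute(word_count, duration_seconds):
    """talking speed = words spoken / video length (in minutes)."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return word_count / (duration_seconds / 60)


def average_wpm(videos):
    """Average the per-video talking speeds for one channel."""
    rates = [words_per_minute(words, seconds) for words, seconds in videos]
    return sum(rates) / len(rates)


# Hypothetical (word count, duration in seconds) pairs, not real measurements
example_videos = [(900, 300), (600, 240)]
print(average_wpm(example_videos))  # 165.0
```

Note that this averages the per-video speeds, rather than dividing total words by total minutes, so that a single long video doesn't dominate the result.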

So, if we do this calculation for every video, for every YouTuber, and average them out, we can get results that look like this:

ExplainTuber	Average Words Per Minute
Wendover Productions	185.09
Tom Scott	179.16
CGP Grey	177.96
Veritasium	167.06
VSauce	154.69
NerdWriter1	137.65

By the way, the code for grabbing the word counts looks like this:

import os


main_char = 'tom:'
others = ['1:', '2:', '3:', '4:', '5:', '6:', '7:', '8:', '9:', '0:']
# maps each transcript's filename to the host's word count for that video
words_per_minute = {}


# Count the words spoken only by the speaker we're currently analyzing. No guests.
def calculate_wpms():
    print('calculating wpms')
    for filename in os.listdir("tom/pretty_scripts"):
        with open(f"tom/pretty_scripts/{filename}", "r") as file:
            word_count = 0
            should_count = False
            for line in file:
                words = line.split()
                if len(words) > 0:
                    if words[0] in others:
                        # a guest started talking, so stop counting
                        should_count = False
                    elif words[0] == main_char:
                        # the host started talking; don't count the label itself
                        should_count = True
                        word_count += len(words) - 1
                    else:
                        # unlabeled line: belongs to whoever spoke last
                        if should_count:
                            word_count += len(words)
            print('word count: ', word_count)
            words_per_minute[filename] = word_count
    with open("tom/words_per_minute.txt", 'w') as results:
        for title, word_count in words_per_minute.items():
            # title[:-4] strips the ".txt" extension
            result = title[:-4] + '@' + str(word_count)
            results.write(result)
            results.write('\n')


calculate_wpms()
print(words_per_minute)

The input is the labeled transcripts from Step 3. The output is an @-separated .txt file that contains each video’s title and its host’s word count.

The output file for Tom Scott’s videos, for example, looks like this:

The number after the @ is the number of words that the ExplainTuber speaks in that particular video. For instance, in the video “These chickens save lives”, Tom says 636 words.

The reason to use an “@” symbol to separate the data is that many YouTube videos have a variety of punctuation in the title. If we were to follow convention by separating the titles and word counts with commas, videos with titles like “Cheap, renewable, clean energy. There’s just one problem” would become difficult to format and parse. The “@” symbol is not found in the titles of the videos we analyzed, so it makes the perfect separator.
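Reading that file back in is straightforward. Here's a minimal sketch, assuming one title@count pair per line. The function name parse_word_counts is made up, and the second sample line uses a placeholder count rather than a real measurement.

```python
def parse_word_counts(lines):
    """Turn "title@count" lines into a {title: count} dictionary."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last "@" so the count is always the final field
        title, count = line.rsplit("@", 1)
        counts[title] = int(count)
    return counts


sample_lines = [
    "These chickens save lives@636\n",
    # placeholder count, only here to show that commas in titles are harmless
    "Cheap, renewable, clean energy. There's just one problem@1000\n",
]

parsed = parse_word_counts(sample_lines)
print(parsed["These chickens save lives"])  # 636
```

With a real file, you could pass the open file object directly, for example parse_word_counts(open("tom/words_per_minute.txt")).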

Okay, we have our answer… Now what?

Let’s bring things to a larger point: We now have an approximation of which ExplainTuber speaks the most rapidly. But is that all? What can we do with this data?

Well, let’s see if there’s any correlation between the number of subscribers and talking speed:

There doesn’t seem to be much. Eyeballing it, there’s a slight negative correlation. But okay, what if we normalize for time? Is there any correlation between talking speed and average subscribers-per-year? Using data from YouTube about how many new subscribers each channel gets on average every year, we find a very loose correlation between how fast you talk and how many new subscribers you get, though there is a lot of variation.
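If eyeballing isn't enough for you, a correlation coefficient puts a number on it. The snippet below is a generic sketch of Pearson's r in plain Python, not the code behind the charts in this post. The function name pearson_r is made up, and the toy inputs are only there to show that it behaves as expected. To use it for real, pass in a list of talking speeds and a matching list of subscriber figures that you've collected yourself.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / math.sqrt(spread_x * spread_y)


# Toy data: a perfectly linear relationship and its mirror image
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0
```

Values near 0 mean little to no linear relationship, while values near 1 or -1 mean a strong one. With only six channels, though, any coefficient should be taken with a grain of salt.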

Again, eyeballing it, there doesn’t seem to be a strong correlation. However, if you’d like to analyze any of your favorite YouTubers, feel free to access the code as a Google Colab here, or check out this GitHub repository. (Just be sure to change any filenames and paths to whatever is appropriate for your needs.) Or, if you’d rather just play around with the data I’ve already calculated, check out the numbers here.

But hey, if I learned anything from writing this blog post, it’s this: If I ever start posting regularly to a YouTube channel, maybe speaking at a rabbit’s pace wouldn’t be such a bad idea… 🐇

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.