Which Speech Recognition Model is Best for My Business?

A funny thing happened when Deepgram first decided to use end-to-end deep learning (E2EDL) to design our next-generation speech-to-text (STT) solution. We found that this approach was hugely flexible and easier to optimize than traditional STT. We didn't have to reconnect and optimize multiple models (acoustic, pronunciation, and language) every time we wanted to make a change, and we could retrain and enhance our speech models without starting from scratch. With transfer learning, we could build new speech models faster. This has allowed us to build different base speech models for different use cases and needs. It also allows us to tailor models in cases where a customer needs something specific that we don't currently offer. Let's take a look at the three types of models that we offer here at Deepgram and what each is good for.

1. Language-by-Use Case Models

All of our use case-specific models are available in various English dialects. We are expanding into more language-by-use case combinations as we continue to train and optimize our speech models for specific circumstances, such as call centers or meeting transcription, and as we add spoken languages. Our customers have found that a speech model built for their particular combination of spoken language and use case is more accurate than Big Tech's out-of-the-box, one-size-fits-none models. These targeted models are our fastest and are optimized for scalability: they can transcribe one hour of pre-recorded audio in 30 seconds. They are a good fit for most applications, especially ones that need very high speeds or cost savings for on-prem use. You also don't need to trade off speed or scalability for high accuracy. Because we have multiple models for different use cases, unlike Big Tech, our models tend to be more accurate as well.
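
To put the "one hour of audio in 30 seconds" figure in perspective, here is a minimal Python sketch of the arithmetic. The function names and the 1,000-hour backlog are hypothetical, and the estimate assumes a single stream whose speed stays constant, which may not match a real deployment.

```python
def speedup_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many times faster than real time the audio was processed."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def hours_to_transcribe(audio_hours: float, speedup: float) -> float:
    """Estimate wall-clock hours for a backlog, assuming constant speed."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_hours / speedup


# Figure quoted in the post: 1 hour (3600 s) of audio in 30 s.
factor = speedup_factor(3600, 30)

# Hypothetical backlog of 1,000 hours of recorded calls on one stream.
backlog_hours = hours_to_transcribe(1000, factor)

print(f"Speed-up over real time: {factor:.0f}x")
print(f"Estimated time for 1,000 hours of audio: {backlog_hours:.1f} hours")
```

At that rate the model runs about 120 times faster than real time, so the hypothetical backlog would take a little over eight hours on a single stream.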

2. Higher Accuracy Enhanced Models

We also built a next-generation architecture that delivers our highest English-language accuracy on long-tail words, that is, words that are uncommon in everyday conversation. This enhanced architecture was rebuilt from our current one to optimize accuracy across a wider vocabulary. It is best suited to cases where you have keywords and terms that you must get right but that rarely come up in normal conversation, such as fiduciary, biodiversity, or formulae. Typical use cases include conversational AI for B2B, technical support contact centers, and technical meetings or seminars.

3. Models Tailored for Your Business

But what if we don't have a use case model specifically for your needs? Maybe your audio has a lot of background noise, accents, jargon, or product and company names. All of these can create problems for off-the-shelf models. If that's the case for you, here at Deepgram we can customize a model for your specific use case. These tailored models can be trained and deployed within weeks and are specifically targeted at the characteristics of your use case that an off-the-shelf model would find hard to handle. To make sure that the tailored model really does address your specific issues, it has to be trained on audio from your own business. The more "real world" audio from your business, the better the accuracy. Having an employee read off a script or list of terms produces poor training data compared with recording an actual conversation between your employee and a customer. Although we like to say that the more real-world audio you can provide, the better, we've seen good accuracy improvements with less than 10 hours of audio.
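
The guidance in the three sections above can be summarized as a rough decision rule. The Python sketch below is only an illustration: the `AudioProfile` fields, the `suggest_model` function, the returned labels, and the order in which the checks run are all assumptions made for this example, not a Deepgram API or an official selection procedure.

```python
from dataclasses import dataclass


@dataclass
class AudioProfile:
    """Hypothetical description of a customer's audio, for illustration only."""
    use_case_model_available: bool  # an existing use case model matches your scenario
    has_long_tail_terms: bool       # uncommon keywords that must be transcribed correctly
    hard_audio: bool                # heavy noise, accents, jargon, product or company names


def suggest_model(profile: AudioProfile) -> str:
    """Suggest a starting point; the ordering of the checks is an assumption."""
    # No matching use case model, or audio that off-the-shelf models struggle with.
    if not profile.use_case_model_available or profile.hard_audio:
        return "tailored"
    # Uncommon but critical vocabulary points toward the enhanced models.
    if profile.has_long_tail_terms:
        return "enhanced"
    # Otherwise the language-by-use case models offer the best speed and scale.
    return "use-case"


print(suggest_model(AudioProfile(True, False, False)))
print(suggest_model(AudioProfile(True, True, False)))
print(suggest_model(AudioProfile(False, False, True)))
```

A rule like this only gives a first guess. Testing candidate models on a sample of your own audio is still the more reliable way to choose.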

Deciding Which ASR Platform is Best for You

There are a lot of factors that go into deciding which ASR system will work best for you, beyond the ability to tailor models. If you'd like to read more about the factors that you should consider when shopping for an ASR platform, check out How to Evaluate an ASR Platform, or fill out our free Speech-to-Text Self Assessment. Still have questions? Contact us to talk through your use case and see which of our models is best for you.

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.
