Celebrating Black History Month with a Vision of More Inclusive Speech Recognition

February is Black History Month! To mark this time of celebration and reflection, we'd like to shine a spotlight on one of the major dialects of American English, African American Vernacular English (AAVE), and share our perspective on its status in speech recognition.

What Exactly is AAVE?

Historically also known as Ebonics, AAVE is one of several dozen prominent varieties of American English, along with regional varieties like Midland American English and Southern American English, and ethno-cultural varieties like the dialects of Latino American English. AAVE's origins reach back to the earliest communities of Black people in North America, enslaved and free, who drew influences from African languages and mixed languages called creoles while building language communities oriented by necessity around English. AAVE developed in Black communities through the centuries: through slavery and emancipation, Reconstruction and Jim Crow, the Great Migration and the Harlem Renaissance, the Civil Rights Era, and into the present day.

Yet throughout the course of its development, AAVE has never existed in isolation. It has both influenced and been influenced by so-called Standard American English (SAE), a term that describes a "neutral" or "newscaster" dialect of American English, as well as by other regional and community-based dialects. Today's AAVE is far from homogeneous. AAVE is the term that linguists and language researchers use to talk about a series of accents and dialects spoken predominantly by African Americans in the United States. The grouping is highly diverse, with many regional and community-based variations.

That said, not every Black person in the US speaks in a way that linguists would describe as AAVE. Rather, AAVE refers to a specific combination of semantic, syntactic, and phonological features that are widely used in many but not all Black communities. Language scientists recognize AAVE as a fully formed linguistic system that is distinct from SAE in its grammar and vocabulary. Speakers of AAVE are consistently able to identify whether a phrase is correct or incorrect AAVE. This ability of the speech community to identify "well-formedness" is a crucial linguistic litmus test that speaks to the stability and consistency of AAVE as a dialect. Among other things, AAVE grammar is linguistically noteworthy for its distinctive uses of verbal aspect and its tendency toward copula deletion (omitting forms of the verb "to be").

We must also confront a sad history of AAVE being dismissed as "slang," "uneducated," or otherwise deficient. Such characterizations were frequently used to promote racist perspectives and are inconsistent with our scientific understanding of AAVE as simply one dialect among many varieties of American English. AAVE in all its complexity is the subject of ongoing research by academics in linguistics, cultural studies, and other fields.

Code-Switching Poses Challenges for ASR

More importantly, though, AAVE is a part of daily life for its speakers. It's a way of talking with friends and family as well as a medium for music, poetry, and other art forms. AAVE's use by Black music artists in particular has brought it to a truly global stage and vaulted it into the American cultural mainstream. Many AAVE speakers are equally comfortable expressing themselves in AAVE and SAE, and may switch between the two from conversation to conversation or phrase to phrase, depending on the social context in which they are speaking.

The phenomenon of speakers switching between dialects is called code-switching. It is an interesting challenge for automatic speech recognition (ASR) because it requires the system to identify and process each dialect as seamlessly as speakers move between them. Code-switching takes place not only between dialects but also between languages. For example, Deepgram has done extensive work to improve ASR in situations where speakers code-switch between Spanish and English in the course of a conversation, as is common for many bilingual speakers.

ASR Needs to Evolve Beyond SAE

To date, speech recognition technology has focused on Standard American English. SAE occupies a privileged status that linguists refer to as a "prestige dialect," that is, a dialect that members of a language community perceive, on the whole, as the most prestigious. SAE functions as the primary dialect of business, government, media, and formal education in the United States. These circumstances have led speech recognition companies to focus on serving the needs of SAE speech communities first. As a result, today's speech recognition systems return higher error rates when transcribing AAVE and other dialects than when transcribing SAE, and are poorly equipped to handle situations where speakers code-switch between AAVE and SAE.
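One way to make an accuracy gap like this measurable is to compute an error rate separately for each group of speakers. The sketch below assumes word error rate (WER) as the metric, which this post does not name, and it is not a description of how Deepgram evaluates its models. The function names, the group labels, and the toy transcripts are all hypothetical placeholders.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples):
    """Aggregate WER per group from (group, reference, hypothesis) triples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[group] += e
        words[group] += n
    # Skip groups with no reference words to avoid dividing by zero.
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Made-up transcripts, for illustration only (not real ASR output).
toy_samples = [
    ("dialect_a", "the meeting starts at noon", "the meeting starts at noon"),
    ("dialect_b", "the meeting starts at noon", "the meeting start at new"),
]
print(wer_by_group(toy_samples))
```

Reporting the metric per group, rather than as a single average, is what makes a disparity visible: an overall number can look healthy while one group's transcripts are noticeably worse.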

The consequence is that speakers of AAVE and other dialects may need to code-switch into SAE or "sound white" to be better understood by speech recognition systems. This must change. First and foremost, we see it as ethically important to make speech recognition work for an ever-expanding circle of dialect speakers, including AAVE speakers. Furthermore, we believe that no company in the speech recognition space will be successful with an exclusive focus on prestige dialects like SAE. Makers of speech technologies must set their sights on learning to process the full, rich spectrum of human speech.

Deepgram is committed to improving accuracy for AAVE. We hope other speech companies will join us in that effort.

For more information on AAVE, check out some of our favorite voices on the subject:

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.