
How Alt Text Made the Web Legible to AI

After months of unsuccessful negotiations, the National Federation of the Blind (NFB) filed a class action lawsuit against Target Corporation on February 9, 2006, for making its website inaccessible to people with visual impairments. The Americans with Disabilities Act (ADA), signed into law in 1990, guarantees people with disabilities “full and equal enjoyment” of places of public accommodation. The lawsuit filed by the NFB, which would go on to set a legal precedent for web accessibility, claimed that “public accommodations” extended to the internet, and that Target should therefore be held responsible. While Target had captions for its videos and other accessibility features, the company did not have alt text for its images, so it was nearly impossible for shoppers who use screen readers to navigate the website. Target would go on to pay a $6 million settlement and, by 2016, become a Strategic Nonvisual Access Partner of the NFB.

The people who built the internet have always been conscious of accessibility. In a newsletter in 1996, Tim Berners-Lee, the inventor of the World Wide Web, wrote:

“The emergence of the World Wide Web has made it possible for individuals with appropriate computer and telecommunications equipment to interact as never before. It presents new challenges and new hopes to people with disabilities.” 

Later that year, the World Wide Web Consortium’s Web Accessibility Initiative (WAI) was created to establish standards and resources to help make the web more accessible. At the time, that meant making it easier for people with disabilities to navigate the internet, with an emphasis on offering equivalent alternatives to audio and visual content. Enter alt text.

Alt Text Makes the Web Visible to All

Alt text (also called an “alt tag” or “alt attribute”) is a word or phrase included in HTML code to tell web users the contents of an image. Alt text was originally used to describe images that were slow to load over 1990s dial-up connections, but today it is an essential accessibility feature of the internet. Alt text lets people who use screen readers browse the web fully, without missing the context that images provide. In the Target lawsuit, a visually impaired shopper described selecting an image of a Dyson vacuum and hearing “Link GP browse dot HTML reference zero six zero six one eight nine six three eight one eight zero seven two nine seven three five 12 million 957 thousand 121,” instead of a description of the product.
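To make that concrete, here is a minimal sketch of what alt text looks like in markup and how any consumer of a page, whether a screen reader or a web crawler, can pull it out. The HTML snippet and the extractor below are hypothetical illustrations written with Python’s standard library, not code from Target’s site or from any real screen reader.

```python
from html.parser import HTMLParser

# Hypothetical product page: the first image has alt text, the second does not.
PAGE = """
<img src="dyson-dc14.jpg" alt="Dyson DC14 Animal upright vacuum cleaner">
<img src="promo-banner-00342.jpg">
"""

class AltTextExtractor(HTMLParser):
    """Collects what an assistive tool could announce for each image."""

    def __init__(self):
        super().__init__()
        self.announcements = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # With alt text, there is a human-readable description to announce;
        # without it, nothing better than the raw file name or URL is available.
        self.announcements.append(attrs.get("alt") or attrs.get("src", "unlabeled image"))

parser = AltTextExtractor()
parser.feed(PAGE)
print(parser.announcements)
# ['Dyson DC14 Animal upright vacuum cleaner', 'promo-banner-00342.jpg']
```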

As web accessibility gains currency (and lawsuits over missing or improper alt attributes increase), many companies are now making accessibility a priority and building features that make the web easier to use for people with disabilities, including adding alt text to improve photo descriptions for visually impaired people. Social media companies are also enlisting users to help generate image descriptions for their sites. Late last year, Twitter rolled out an image description reminder to encourage people to add descriptions to their images, and Instagram now allows users to write their own image descriptions instead of relying on the platform’s automatically generated alt text.

Digital Grist for the AI Mill

A byproduct of the abundance of alt text used to describe images on the web is an abundance of labeled images that can be compiled into vast datasets and used to train AI models. Creating datasets, especially vision-language datasets, can be a very expensive process. Curating, filtering, and post-processing are costly, which restricts dataset size and makes it difficult to scale the trained model.

With pairs of images and their alt text, this cost drops drastically because dataset generation can be automated, which makes the process scalable. As a result, many large-scale datasets are curated from image and alt-text pairs harvested from the internet. For example, Google’s image dataset project, Conceptual Captions, scraped images from the web along with their corresponding alt text HTML attributes and then filtered them to create a dataset for training models that auto-generate image captions in a fully automated pipeline. Meta has done something similar with its object recognition technology, Automatic Alternative Text (AAT), which generates descriptions of pictures on Facebook. But perhaps one of the most impressive labeled image sets is LAION-5B, compiled by the Large-scale Artificial Intelligence Open Network (LAION). LAION-5B contains more than 5 billion CLIP-filtered image-text pairs. About half of the captions are written in English; the rest come from over 100 different languages. The dataset was created by working through WAT files from Common Crawl and extracting every HTML IMG tag that contains an alt text attribute.
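The “CLIP-filtered” step is worth unpacking: after harvesting candidate image and alt-text pairs, LAION scored each pair with a CLIP model and dropped pairs whose image and caption did not match well enough. Below is a rough sketch of that idea using the Hugging Face transformers CLIP implementation; the checkpoint name, threshold, and example files are illustrative assumptions, not LAION’s actual pipeline or parameters.

```python
# pip install torch transformers pillow
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Any public CLIP checkpoint works for the sketch; LAION's pipeline used its own setup.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(image: Image.Image, caption: str) -> float:
    """Cosine similarity between CLIP's image embedding and text embedding."""
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
        )
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return (image_emb @ text_emb.T).item()

# Hypothetical harvested pair: keep it only if the image and its alt text agree.
THRESHOLD = 0.3  # illustrative cutoff, not the value LAION used
image = Image.open("candidate.jpg")
alt_text = "A red upright vacuum cleaner on a wooden floor"
if clip_similarity(image, alt_text) >= THRESHOLD:
    print("keep pair")
else:
    print("discard pair")
```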

Alt text is not the only web accessibility feature that makes data harvesting easier. Simple and consistent page layouts, closed captions on videos, and links with descriptive names all help people with disabilities use websites, and they too can double as ways to make harvesting and parsing data easier. But alt text is one of the most valuable, especially for creating datasets for image-based generative models.

There is something to be said about the way accessibility features that are supposed to help people with disabilities navigate the web also function as a data source for companies to mine. It could be a demonstration of how making the internet more accessible benefits everyone, including non-disabled people and the companies building the web. Or maybe it just shows how every feature on the web, including those meant to serve specific communities, has to justify its existence by being useful for more than the reason it was created. Even reCAPTCHA, which was simply supposed to protect websites from bots, is not left out: when we identify images in a reCAPTCHA to prove that we’re not robots, we are also helping Google train AI models.

Whatever the case may be, I think this is a situation where a little transparency would go a long way. Alt text and other accessibility features do benefit the people who need them at the end of the day, but that doesn’t change the fact that they can also be used in myriad ways that benefit other parties and eventually help them make money. Making sure users are aware of those uses would help defuse an otherwise sticky situation.
