How AI Will Revolutionize Closed Captioning Generation

In short:

AI is transforming closed captioning by automating transcription with increasing accuracy, lowering time and cost barriers to captions for video content. That brings captions closer to real-time availability and broader reach. Human review is still necessary to ensure accuracy and alignment with WCAG requirements.

Artificial intelligence has lived rent-free in the minds of Wall Street traders, workers, and even parents of school-aged children over the past year, as new tools have emerged almost daily.

Many AI developments have sparked resistance and anxiety, dredging up concerns that they could replace well-paying jobs or erode the quality of education. However, some advancements promise to promote equity and accessibility through new technologies, including AI-powered voice-to-text transcription.

accessiBe analyzed data from the captioning service 3Play Media showing that error rates in automated closed captioning are declining. This is a promising development for a future in which closed captions could be produced more efficiently and made more widely available. The report draws on more than 100 hours of transcribed content spanning a range of speaking accents and locales, from industries including higher education, tech, consumer goods, cinema, and sports.

For many, closed captions are a helpful tool, and one that a growing number of Gen Z and millennial viewers prefer when watching videos, according to a 2023 YouGov survey in which respondents said captions improved their concentration or helped them understand thick accents.

Captions also make videos accessible to the estimated 15.5% of U.S. adults with hearing impairments, according to 2022 National Health Interview Survey data. Congress requires video programming distributors to caption TV programs, and producing those captions has traditionally been manual work: for decades, captions for live TV and other video content have been created by the more than 20,000 workers in the closed captioning and court reporting services industry.

AI has fueled the rapid advancement of audio transcription tools in consumer and corporate software. Apple reportedly plans to add AI transcription to its Voice Memos and Notes apps in the next update to its iPhone operating system, iOS 18. The update could bring instantaneous transcription to phone apps used by more than 2 billion people. Other companies, such as Zoom, have also added AI-powered features, including AI transcription of video calls.

Tools making the most significant improvements in word errors

According to a recent 3Play Media study, several of the most prominent AI engines behind these services became more accurate over the last year, while others lost accuracy.

Chart: Word error rates of AI transcription tools, 2022 and 2023 (3Play Media research).

As the chart shows, Google's and IBM's audio transcription AI performed worse in 2023 than in 2022. Google Standard had a 28% error rate and Google Video a 14% error rate, increases of 2 percentage points and 0.7 percentage points since 2022, respectively. IBM Watson had a 25% error rate, up 1.5 percentage points from 2022.

Conversely, other platforms lowered their error rates over the same period: Rev AI (-3.4 percentage points), Microsoft (-0.91 percentage points), and Speechmatics (-1.11 percentage points). All three had error rates of 10% or less in 2023.

The study also tracked AssemblyAI (8% error rate), the multilingual OpenAI Whisper: Large (8% error rate), and the English-only OpenAI Whisper: Tiny (15% error rate). None of these had 2022 data for a year-over-year comparison.

Word error rate (WER) measures how often a transcription engine gets words wrong: the number of words it substitutes, omits, or inserts, divided by the number of words actually spoken. Between 2022 and 2023, several of the models studied reduced their word error rates, but not all of them did.
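To make the metric concrete, the usual definition is WER = (S + D + I) / N, where S, D, and I are the substituted, deleted, and inserted words and N is the number of words in the reference (human) transcript. The sketch below computes it with a word-level edit distance. It is a minimal illustration assuming simple lowercasing and whitespace tokenization; the function name `word_error_rate` is hypothetical, and this is not 3Play Media's scoring method, which the study description does not detail. Production scorers typically also normalize punctuation, numbers, and contractions before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase and split on whitespace; real scorers normalize more.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words (word-level Levenshtein distance).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion: engine dropped a spoken word
                dist[i][j - 1] + 1,  # insertion: engine added a word
                dist[i - 1][j - 1] + substitution_cost,  # match or substitution
            )

    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "captions make video accessible to more viewers"
    hypothesis = "captions make the video accessible to four viewers"
    # One insertion ("the") and one substitution ("more" -> "four") over 7 words.
    print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # WER: 28.6%
```

Read this way, a 14% error rate means roughly 14 word errors for every 100 words in the reference transcript. Because insertions are counted against the reference length, WER can exceed 100% for a very noisy transcript.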

Punctuation errors, which hurt readability, also affect how reliable AI transcription is. Here, OpenAI's model performed best, but it was still only 85% accurate.

OpenAI offers multiple speech recognition models of varying size and capability. The "tiny" version transcribes only English, whereas the large model is multilingual. The company first released these models in 2022, and 3Play Media studied them only for 2023. AssemblyAI's speech recognition tool was also released more recently and has no comparable prior-year data. Its developers trained the 2023 version on 1 million hours of audio, and it also handles English-language transcription.

In a testament to how quickly the field is moving, AssemblyAI released a successor this year, trained on 12 times as much data, which the company advertises as multilingual and says "hallucinates" 30% less often than OpenAI's competing service.

AI hallucination refers to the tendency of generative AI models, including large language models, to produce output that sounds plausible but is false. In a chatbot, this might look like a confidently stated fact that isn't true; in a transcript, it can appear as words that were never spoken. These remaining shortcomings make it difficult to rely on AI alone for accessibility.

That's a major drawback for companies that take accessibility and the surrounding laws and requirements seriously, and it gives them pause before handing the reins of audio transcription entirely to AI.

Pure AI transcription still isn't fully compliant with the ADA

Despite these advances, AI transcription tools still don't match human accuracy. People in the production loop often must make substantial edits to meet the web content accessibility requirements that apply under the Americans with Disabilities Act.

It's generally accepted that, to achieve ADA compliance, websites should follow the Web Content Accessibility Guidelines (WCAG), which call for auto-generated captions to be manually reviewed for accuracy. AI-generated captions fall into this category and, for now, must be reviewed the same way.

Even as AI companies move toward more accurate models, the dream of instantaneous, on-demand captions for people with disabilities may still be some way off.
