OpenAI launches an API for ChatGPT and its Whisper speech-to-text tech


OpenAI has announced that it is now letting third-party developers integrate ChatGPT and its Whisper speech-to-text model into their apps and services through newly launched APIs. For developers, the ChatGPT API will be significantly cheaper than using the company's existing language models.

The Whisper API is a hosted version of the open-source Whisper speech-to-text model the company released in September 2022. Whisper is an automatic speech recognition system that, OpenAI claims, enables robust transcription in multiple languages. It is priced at $0.006 per minute and accepts audio in various file formats, including M4A, MP3, MP4, MPEG, MPGA, WAV and WEBM.
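As a rough sketch of what that looks like for a developer, the snippet below shows a call to the hosted Whisper API using OpenAI's Python client as it shipped at launch, plus a small helper that estimates cost at the quoted $0.006-per-minute rate. The file name and function names here are illustrative, and the transcription call assumes you have the `openai` package installed and an `OPENAI_API_KEY` environment variable set.

```python
def transcribe_file(path: str) -> str:
    """Send an audio file (e.g. M4A, MP3, WAV) to the hosted Whisper API.

    Sketch only: assumes the openai Python client (v0.x interface) and an
    OPENAI_API_KEY environment variable.
    """
    import openai  # imported here so the cost helper below works without the client

    with open(path, "rb") as audio_file:
        result = openai.Audio.transcribe("whisper-1", audio_file)
    return result["text"]


def whisper_cost_usd(duration_seconds: float) -> float:
    """Estimate transcription cost at the article's quoted $0.006 per minute."""
    return duration_seconds / 60 * 0.006
```

At that rate, a one-hour recording would cost roughly $0.36 to transcribe, which is the kind of pricing that makes bulk transcription workloads plausible.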

While rival tech companies like Google, Amazon and Meta have developed highly capable speech recognition systems of their own, Whisper stands out because it was trained on 680,000 hours of multilingual and “multitask” data collected from the web. According to Greg Brockman, OpenAI's president and chairman, this gives it improved handling of unique accents, background noise and technical jargon.

“We released a model, but that actually was not enough to cause the whole developer ecosystem to build around it,” Brockman said in a video call with TechCrunch yesterday afternoon. “The Whisper API is the same large model that you can get open source, but we’ve optimized to the extreme. It’s much, much faster and extremely convenient.”


Brockman's point speaks to real barriers to enterprises adopting voice transcription technology. A 2020 Statista survey backs this up, citing accuracy, accent- or dialect-related recognition issues and cost as the core obstacles to adopting speech tech.

Whisper has limitations of its own, particularly around “next-word” prediction, a consequence of the enormous amount of data the system was trained on. OpenAI cautions that Whisper might include words in its transcriptions that weren't actually spoken, possibly because it is simultaneously trying to predict the next word in the audio and to transcribe the recording itself.

Moreover, Whisper's performance varies by language, with speakers of languages that are less well represented in its training set experiencing higher error rates.

Even the best systems are affected by bias: a 2020 Stanford study found that speech recognition systems from Amazon, Apple, Google, IBM and Microsoft made far fewer errors, about 19% fewer, with users who are white than with users who are Black.

Read More: OpenAI to launch ChatGPT Professional, a premium paid version of the chatbot

How does OpenAI hope to maximize Whisper?

OpenAI anticipates Whisper's transcription capabilities being used to enhance existing apps, services, tools and solutions. The Whisper API is already being used by the AI-powered language learning app Speak to power a brand-new in-app virtual speaking companion.


OpenAI's entry into the speech-to-text market could also prove quite profitable: one estimate places the potential market value at $5.4 billion by 2026, up from $2.2 billion in 2021.

“Our picture is that we really want to be this universal intelligence,” Brockman said. “We really want to, very flexibly, be able to take in whatever kind of data you have, whatever kind of task you want to accomplish, and be a force multiplier on that attention.”

Read More: OpenAI paid Kenyan workers less than $2 per hour


Technext Newsletter

Get the best of Africa’s daily tech to your inbox – first thing every morning.
Join the community now!

