It's been over a year since OpenAI announced a "small-scale preview" of its Voice Engine, a revolutionary AI service that can clone a person's voice with just 15 seconds of speech. However, the tool remains in limited preview, with no indication of when or if it will launch widely. This reluctance to roll out the service may point to fears of misuse, but it could also reflect an effort to avoid inviting regulatory scrutiny.
OpenAI has historically been accused of prioritizing "shiny products" at the expense of safety, and of rushing releases to beat rival firms to market. In a statement, an OpenAI spokesperson told TechCrunch that the company is continuing to test Voice Engine with a limited set of "trusted partners." The spokesperson emphasized that the company is "learning from how [our partners are] using the technology so we can improve the model's quality and safety."
Voice Engine, which powers the voices available in OpenAI's text-to-speech API as well as ChatGPT's Voice Mode, generates natural-sounding speech that closely resembles the original speaker. The tool converts written characters to speech, limited only by certain guardrails on content. However, it was subject to delays and shifting release windows from the start.
According to a draft blog post seen by TechCrunch, OpenAI had initially intended to bring Voice Engine, originally called Custom Voices, to its API on March 7, 2024. The plan was to give a group of up to 100 "trusted developers" access ahead of a wider debut, with priority given to devs building apps that provided a "social benefit" or showed "innovative and responsible" uses of the technology. OpenAI had even trademarked and priced it: $15 per million characters for "standard" voices and $30 per million characters for "HD quality" voices.
However, at the eleventh hour, the company postponed the announcement. OpenAI ended up unveiling Voice Engine a few weeks later without a sign-up option. Access to the tool would remain limited to a cohort of around 10 devs the company began working with in late 2023, OpenAI said. The company said it hoped to start a dialogue on the responsible deployment of synthetic voices and on how society can adapt to these new capabilities.
Voice Engine has been in the works since 2022, according to OpenAI. The company claims it demoed the tool to "global policymakers at the highest levels" in summer 2023 to showcase its potential — and risks. Several partners have access to Voice Engine today, including startup Livox, which is building devices that enable people with disabilities to communicate more naturally.
Livox CEO Carlos Pereira told TechCrunch that while the company ultimately couldn't build Voice Engine into a product due to the tool's online requirement (many of Livox's customers don't have internet), he found the technology to be "really impressive." Pereira hopes that OpenAI develops an offline version soon. He hasn't received guidance from OpenAI on a possible Voice Engine launch, nor has he seen any signs the company plans to begin charging for the service.
Voice Engine has several safety measures, informed by discussions with stakeholders, including watermarking to trace the provenance of generated audio. Developers must obtain "explicit consent" from the original speaker before using Voice Engine, according to OpenAI, and they must make "clear disclosures" to their audience that voices are AI-generated. However, the company hasn't said how it's enforcing these policies, and enforcement could prove immensely challenging, even for a company with OpenAI's resources.
Effective filtering and ID verification are fast becoming baseline requirements for responsible voice cloning tech releases. AI voice cloning was the third fastest-growing scam of 2024, according to one source. It has enabled fraud and the bypassing of bank security checks, while privacy and copyright laws struggle to keep up. Malicious actors have used voice cloning to create incendiary deepfakes of celebrities and politicians, and those deepfakes have spread like wildfire across social media.
OpenAI could release Voice Engine next week, or never. The company has repeatedly said that it's considering keeping the service small in scope. But one thing's clear: for optics reasons, safety reasons, or both, Voice Engine's limited preview has become one of the longest in OpenAI's history.
The prolonged delay raises questions about OpenAI's commitment to responsible innovation and its ability to balance the potential benefits of Voice Engine with the risks of misuse. As the company continues to test and refine the technology, it remains to be seen whether Voice Engine will eventually see the light of day or remain a promising but unrealized innovation.