ElevenLabs Unveils Scribe, a Breakthrough Speech-to-Text Model Supporting 99 Languages

Max Carter

February 26, 2025 · 3 min read

AI startup ElevenLabs, valued at $3.3 billion, has entered the speech-to-text market with the launch of Scribe, its first standalone speech-to-text model. The launch is a strategic expansion for a company known primarily for its audio generation capabilities.

Scribe supports 99 languages at launch. More than 25 of them, including English, French, German, Hindi, and Spanish, are categorized as "excellent accuracy," meaning a word error rate of less than 5%. In FLEURS and Common Voice benchmark tests, the model outperformed Google Gemini 2.0 Flash and Whisper Large V3.
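For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. The sketch below illustrates that standard definition only. It is not ElevenLabs' evaluation code, the function name is made up for this example, and real benchmarks usually normalize punctuation and casing first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Illustrative helper (hypothetical name); no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.1%}")
```

Under this definition, a 5% threshold corresponds to at most one word error per 20 reference words.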

ElevenLabs' CEO, Mati Staniszewski, has emphasized the company's goal of improving speech detection models, moving beyond generating content to understanding and transcribing speech. Staniszewski believes that ElevenLabs' in-house teams and data annotation capabilities give the company an edge in building better speech detection models, particularly for languages where speech-to-text capabilities are currently limited.

Scribe boasts several features that set it apart from competitors, including smart speaker diarization, timestamping at the word level for accurate subtitles, and auto-tagging of sound events like audience laughter. The model also allows customers to directly transcribe video content and add subtitles or captions in its studio. However, it's worth noting that Scribe currently only works with pre-recorded audio formats, with a low-latency real-time version of the model slated for release in the near future.
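To see why word-level timestamps matter for subtitles, here is a minimal sketch that groups timed words into SubRip (SRT) cues. The `Word` record and its fields are assumptions made for illustration and do not reflect Scribe's actual response format. The fixed words-per-cue grouping is also a simplification; production subtitle tools typically split on pauses and punctuation as well.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word-level record: text plus start/end times in seconds."""
    text: str
    start: float
    end: float


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered SRT cues of at most max_words."""
    cues = []
    for offset in range(0, len(words), max_words):
        chunk = words[offset:offset + max_words]
        index = offset // max_words + 1
        text = " ".join(w.text for w in chunk)
        # Each cue spans from its first word's start to its last word's end.
        cues.append(
            f"{index}\n{srt_time(chunk[0].start)} --> {srt_time(chunk[-1].end)}\n{text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    sample = [
        Word("Welcome", 0.00, 0.42),
        Word("to", 0.45, 0.55),
        Word("the", 0.58, 0.66),
        Word("show", 0.70, 1.10),
    ]
    print(words_to_srt(sample, max_words=2))
```

Because every cue boundary comes from a word's own start or end time, captions stay aligned with the audio even when speakers pause or talk quickly.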

In terms of pricing, ElevenLabs is offering Scribe at $0.40 per hour of transcribed audio, a competitive rate in the market. While some rivals may offer lower prices for audio transcriptions, ElevenLabs' feature set and language support may justify the cost for customers seeking high-quality speech-to-text capabilities.
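As a rough way to budget at the quoted rate, the sketch below converts audio duration into an estimated cost. It assumes the $0.40 per hour figure is prorated linearly by the second. The article does not state how usage is rounded or billed, so treat the result as an estimate rather than an invoice amount.

```python
PRICE_PER_HOUR_USD = 0.40  # rate quoted in the article


def transcription_cost_usd(audio_seconds: float) -> float:
    """Estimate cost at the quoted hourly rate, assuming linear proration."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 3600 * PRICE_PER_HOUR_USD


if __name__ == "__main__":
    # Example: a back catalogue of 250 hours of recordings.
    hours = 250
    print(f"{hours} h of audio: ${transcription_cost_usd(hours * 3600):.2f}")
```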

With its language support and feature set, Scribe positions ElevenLabs to compete with established speech-to-text providers such as Gladia, Speechmatics, AssemblyAI, and Deepgram, as well as OpenAI's Whisper models. As the company continues to develop and refine its speech detection models, it will be interesting to see how Scribe shapes the future of speech-to-text technology.

For more information on ElevenLabs and its Scribe model, visit their official website.

Copyright © 2024 Starfolk. All rights reserved.