Telegram Unveils Major Update with AI-Powered Sticker Search, Video Enhancements, and More
Telegram releases significant update, introducing AI-driven sticker search, video features, and improvements to bot discovery and video consumption.
Taylor Brooks
Sesame, the AI company behind the strikingly realistic voice assistant Maya, has open-sourced the base AI model behind Maya. The model, called CSM-1B, has 1 billion parameters and is available under the Apache 2.0 license, which allows commercial use with few restrictions.
CSM-1B generates "RVQ audio codes" from text and audio inputs. RVQ stands for residual vector quantization, a technique for encoding audio into discrete tokens called codes. It is also used in other recent AI audio technologies, such as Google's SoundStream and Meta's EnCodec. The model uses a backbone from Meta's Llama family paired with an audio "decoder" component, and a fine-tuned variant of CSM powers Maya.
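For readers unfamiliar with residual vector quantization, here is a toy sketch of the general idea in Python. It is not Sesame's implementation: the codebooks are random rather than learned, and the function names, sizes, and sample values are all hypothetical. Each stage picks the closest codeword to whatever the previous stages failed to capture, so a frame of audio features becomes a short list of integer codes.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec in stages; each stage encodes the residual left by the previous one."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword; the remainder goes to the next stage.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


def rvq_decode(codes, codebooks):
    """Reconstruct an approximation by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def make_codebooks(num_stages=3, size=8, dim=4, seed=0):
    """Hypothetical random codebooks; real codecs learn these from audio data."""
    rng = random.Random(seed)
    return [
        # Later stages get smaller codewords, since residuals tend to shrink.
        [[rng.uniform(-1, 1) / (2 ** stage) for _ in range(dim)] for _ in range(size)]
        for stage in range(num_stages)
    ]


if __name__ == "__main__":
    codebooks = make_codebooks()
    frame = [0.3, -0.7, 0.5, 0.1]  # stand-in for one frame of audio features
    codes, residual = rvq_encode(frame, codebooks)
    print("codes:", codes)
    print("reconstruction:", rvq_decode(codes, codebooks))
    print("leftover residual:", residual)
```

The list of codes is the compact representation. In a system like the one described above, a model would predict such codes and a separate decoder would turn them back into audio.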
According to Sesame, the open-sourced model is a base generation model that can produce a variety of voices but has not been fine-tuned on any specific voice. The model also has some capacity for non-English languages because of contamination in its training data, but it likely won't perform well in them. Sesame has not disclosed what data it used to train CSM-1B.
One notable aspect of the open-sourced model is the lack of real safeguards. Sesame is relying on an "honor system," urging developers and users not to use the model to mimic a person's voice without their consent, create misleading content like fake news, or engage in "harmful" or "malicious" activities. This raises concerns about the potential misuse of the technology, especially given its capabilities.
I tried the demo on Hugging Face, and cloning my voice took less than a minute. From there, it was easy to generate speech on various topics, including controversial ones like the election and Russian propaganda. This demonstrates the model's impressive capabilities, but also highlights the need for responsible use and development of this technology.
Sesame, co-founded by Oculus co-creator Brendan Iribe, gained widespread attention in late February for its assistant tech, which comes close to clearing uncanny valley territory. Maya and Sesame's other assistant, Miles, take breaths, speak with disfluencies, and can be interrupted while speaking, much like OpenAI's Voice Mode. The company has raised an undisclosed amount of capital from investors including Andreessen Horowitz, Spark Capital, and Matrix Partners.
Beyond its voice assistant tech, Sesame is also prototyping AI glasses "designed to be worn all day" that'll be equipped with its custom models. This move marks a significant expansion of the company's ambitions in the AI space, and the open-sourcing of CSM-1B could have far-reaching implications for the development of AI audio technologies.
The release of CSM-1B under the Apache 2.0 license is a significant step towards democratizing access to AI audio technology. While it raises concerns about the potential misuse of the technology, it also opens up opportunities for developers and researchers to build upon and improve the model. As the AI landscape continues to evolve, it will be important to monitor the development and use of CSM-1B and other AI audio technologies.
Copyright © 2024 Starfolk. All rights reserved.