Sesame Releases Open-Source AI Model Behind Realistic Voice Assistant Maya

Elliot Kim

March 13, 2025 · 4 min read

Sesame, the AI company behind the strikingly realistic voice assistant Maya, has followed through on its promise to release the base AI model that powers Maya. The model, called CSM-1B, has 1 billion parameters.

CSM-1B is licensed under Apache 2.0, which means it can be used commercially with minimal restrictions. This open-source approach is likely to spark innovation and accelerate progress in AI audio technologies. According to Sesame's description on the AI dev platform Hugging Face, CSM-1B generates "RVQ audio codes" from text and audio inputs. RVQ stands for residual vector quantization, a technique for encoding audio into discrete tokens called codes.
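To make the idea of residual vector quantization more concrete, here is a minimal sketch in Python. It is not Sesame's implementation: the codebooks are random rather than learned, the sizes (4 stages, 16 entries, 8 dimensions) are arbitrary, and the function names are made up for illustration. Real codecs such as SoundStream learn their codebooks and apply them to the output of a neural encoder. The sketch only shows the core loop: each stage picks the nearest codebook entry, subtracts it, and hands the leftover "residual" to the next stage, so one audio frame becomes a short list of integer codes.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Greedily quantize vector x; return one codebook index per stage."""
    residual = np.asarray(x, dtype=np.float64)
    codes = []
    for cb in codebooks:
        # Pick the codebook entry closest to what is still unexplained.
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        # The next stage only has to model the leftover error.
        residual = residual - cb[idx]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct an approximation by summing the selected entries."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))


def make_toy_codebooks(num_stages=4, entries=16, dim=8, seed=0):
    """Random stand-ins for learned codebooks (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    codebooks = []
    for stage in range(num_stages):
        # Later stages use smaller vectors because residuals shrink.
        cb = rng.normal(scale=0.5 ** stage, size=(entries, dim))
        # A zero entry guarantees a stage never increases the error.
        cb[0] = 0.0
        codebooks.append(cb)
    return codebooks


if __name__ == "__main__":
    codebooks = make_toy_codebooks()
    frame = np.random.default_rng(1).normal(size=8)  # stand-in for one audio frame
    codes = rvq_encode(frame, codebooks)
    print("codes:", codes)
    for k in range(1, len(codes) + 1):
        approx = rvq_decode(codes[:k], codebooks[:k])
        print(f"stages={k} error={np.linalg.norm(frame - approx):.3f}")
```

Running the script prints the integer codes for the toy frame along with the reconstruction error after each stage. Because every toy codebook contains a zero entry, the error can only stay the same or shrink as stages are added, which is the reason stacking several small codebooks can stand in for one very large one.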

RVQ is used in a number of recent AI audio technologies, including Google's SoundStream and Meta's EnCodec. CSM-1B's architecture is based on a model from Meta's Llama family, paired with an audio "decoder" component. A fine-tuned variant of CSM-1B powers Maya, Sesame's voice assistant, which has drawn significant attention for how close to human it sounds.

The open-sourced model is a base generation model: it can produce a variety of voices but has not been fine-tuned on any specific voice. Sesame also says the model has some capacity for non-English languages because of data contamination in its training set, but its performance in those languages is likely to be limited.

Sesame has not disclosed what data it used to train CSM-1B. That lack of transparency makes it harder to assess the model's performance limits and biases.

The model also lacks safeguards to prevent misuse. Sesame is relying on an "honor system," urging developers and users not to use the model to mimic a person's voice without their consent, create misleading content, or engage in harmful activities. This approach may not be sufficient to prevent abuse, and it remains to be seen how the AI community will respond to this open-source release.

In a demo on Hugging Face, it took less than a minute to clone a voice, and generating speech on various topics, including controversial ones, was surprisingly easy. This raises concerns about the potential misuse of the model, especially in the context of fake news and propaganda.

Sesame, co-founded by Oculus co-creator Brendan Iribe, has gained significant attention for its voice assistant technology, which comes close to clearing the uncanny valley. The company has raised an undisclosed amount of capital from prominent investors, including Andreessen Horowitz, Spark Capital, and Matrix Partners. In addition to building voice assistant tech, Sesame is also prototyping AI glasses designed to be worn all day, equipped with its custom models.

The release of CSM-1B is a significant development in the AI landscape, with potential implications for various industries, including customer service, entertainment, and education. As the AI community begins to explore and build upon this open-source model, it will be crucial to monitor its applications and ensure responsible use.

Copyright © 2024 Starfolk. All rights reserved.