The rising costs of developing and running AI models have become a significant concern, with OpenAI's AI operations costs expected to reach $7 billion this year and Anthropic's CEO predicting models costing over $10 billion in the near future. To address this issue, researchers and startups are exploring innovative techniques to optimize existing model architectures and develop new ones that can scale up affordably. One such startup is Cartesia, which is pioneering the development of state space models (SSMs), a highly efficient model architecture that can handle large amounts of data at once.
Karan Goel, co-founder of Cartesia, believes that new model architectures are necessary to build truly useful AI models, particularly in a competitive industry where building the best model is crucial to success. Goel's background in Stanford's AI lab, where he worked under the supervision of computer scientist Christopher Ré, laid the foundation for his work on SSMs. Alongside fellow researcher Albert Gu, Goel developed the concept of SSMs, which was later refined through several research papers.
In 2023, Goel, Gu, and two other former Stanford peers, Arjun Desai and Brandon Yang, founded Cartesia to commercialize their research. The founding team, which also includes Ré, has developed many derivatives of Mamba, a popular SSM. Cartesia builds on Mamba and trains its own SSMs, which give AI models a kind of working memory, making them faster and potentially more efficient in how they draw on computing power.
SSMs differ significantly from the transformer architecture used in most AI apps today. As a transformer processes data, it adds entries to a "hidden state" to "remember" what it has processed, so that state grows with the length of the input. SSMs instead compress every prior data point into a summary of everything they have seen before. This approach enables SSMs to handle large amounts of data while outperforming transformers on certain data generation tasks.
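The contrast between a memory that grows with every input and a fixed-size summary can be sketched with a toy example. This is a minimal illustration, not Cartesia's or Mamba's implementation: the class names `GrowingCache` and `FixedState`, the helper `run`, and the constants `a` and `b` are all hypothetical, and real SSMs use learned, vector-valued states rather than a single number.

```python
from dataclasses import dataclass, field


@dataclass
class GrowingCache:
    """Toy stand-in for a transformer-style memory: one entry per input."""
    entries: list = field(default_factory=list)

    def step(self, x: float) -> float:
        self.entries.append(x)  # memory grows with every input
        # Weigh everything stored so far equally (a deliberate simplification).
        return sum(self.entries) / len(self.entries)

    def memory_size(self) -> int:
        return len(self.entries)


@dataclass
class FixedState:
    """Toy linear state space recurrence: h_t = a * h_{t-1} + b * x_t."""
    a: float = 0.9  # hypothetical decay: how much of the old summary is kept
    b: float = 0.1  # hypothetical weight given to the new input
    h: float = 0.0  # compressed summary of everything seen so far

    def step(self, x: float) -> float:
        self.h = self.a * self.h + self.b * x  # overwrite, never append
        return self.h

    def memory_size(self) -> int:
        return 1  # a single number, regardless of sequence length


def run(inputs):
    """Feed the same inputs to both toys and report their memory footprints."""
    cache, state = GrowingCache(), FixedState()
    for x in inputs:
        cache.step(x)
        state.step(x)
    return cache.memory_size(), state.memory_size(), state.h
```

Calling `run([1.0] * 1000)` leaves the cache holding 1,000 entries while the recurrent state is still a single number. The trade-off in this sketch is that the fixed state is lossy: older inputs fade according to the decay `a` instead of being stored verbatim.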
Cartesia's latest project, Sonic, is an SSM that can clone a person's voice or generate a new voice and adjust the tone and cadence in the recording. Goel claims that Sonic is the fastest model in its class, and says it demonstrates how SSMs excel on long-context data like audio while maintaining a high bar for stability and accuracy. However, Cartesia has drawn ethical scrutiny: it has trained models on The Pile, an open data set known to contain unlicensed copyrighted books, and its Sonic-powered voice cloner has no apparent safeguards.
Goel acknowledges the need for better moderation and says Cartesia has implemented automated and manual review systems, as well as partnerships with external auditors to provide additional independent verification of its models' safety and reliability. Despite these concerns, Cartesia has attracted hundreds of customers, including automated calling app Goodcall, which uses the Sonic API for its AI "agent" service.
Cartesia trains its models on customer data, although users can opt out, and it offers custom retention policies for larger organizations. The startup's technical advantage has helped it secure a $22 million funding round led by Index Ventures, bringing its total raised to $27 million. Shardul Shah, partner at Index Ventures, believes that Cartesia's technology has the potential to drive apps for customer service, sales and marketing, robotics, security, and more.
As Cartesia continues to develop its SSMs, it faces competition from other startups like Zyphra, Mistral, and AI21 Labs, which are also experimenting with alternative architectures. However, Goel is confident that Cartesia's approach will position it for success in the long run. The startup's vision is to create models that can run on any device and understand and generate any modality of data almost instantly, with applications in gaming, voice dubbing, and more.
As a step toward this goal, Cartesia recently launched a beta of Sonic On-Device, a version of Sonic optimized to run on phones and other mobile devices for applications like real-time translation. The startup has also published Edge, a software library to optimize SSMs for different hardware configurations, and Rene, a compact language model. As the AI landscape continues to evolve, Cartesia is betting that its work on SSMs will play a significant role in shaping AI development.