A Chinese lab has released DeepSeek V3, an open-source AI model that, according to its developer's own benchmarks, outperforms models from industry giants such as Meta's Llama 3.1 405B and OpenAI's GPT-4o. If the results hold up, the release marks a milestone for openly available AI, with far-reaching implications for the industry.
DeepSeek V3, developed by the AI firm DeepSeek, is a text-based AI model that can handle a range of tasks, including coding, translating, and writing essays and emails from a descriptive prompt. According to internal benchmark testing, DeepSeek V3 surpasses both downloadable, openly available models and closed AI models that can only be accessed through an API.
In a subset of coding competitions hosted on Codeforces, a platform for programming contests, DeepSeek V3 outperforms models including Meta's Llama 3.1 405B, OpenAI's GPT-4o, and Alibaba's Qwen 2.5 72B. It also crushes the competition on Aider Polyglot, a test designed to measure, among other things, whether a model can successfully write new code that integrates into existing code.
DeepSeek V3's performance comes with massive size: the model has 671 billion parameters, around 1.6 times as many as Llama 3.1 405B. It was trained on a dataset of 14.8 trillion tokens; at a common rule of thumb of about 0.75 words per token, that is roughly 11 trillion words. Larger models tend to outperform smaller ones, but they also require more powerful hardware to run, which makes them less practical for widespread adoption.
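The figures in this paragraph can be checked with some quick arithmetic. The sketch below is illustrative only: the 0.75 words-per-token ratio is a common rule of thumb for English text rather than a DeepSeek figure, and the 2 bytes per parameter used for the memory estimate is an assumption (16-bit weights), not a statement about how the model is actually stored or served.

```python
# Back-of-the-envelope checks for the model-size figures quoted above.
# WORDS_PER_TOKEN and BYTES_PER_PARAM are assumptions, not DeepSeek figures.

WORDS_PER_TOKEN = 0.75   # rule of thumb: 1 token is about 0.75 English words
BYTES_PER_PARAM = 2      # assumes 16-bit weights


def tokens_to_words(tokens: float, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Estimate the number of words represented by a token count."""
    return tokens * words_per_token


def size_ratio(params_a: float, params_b: float) -> float:
    """Return how many times larger model A is than model B, by parameter count."""
    return params_a / params_b


def weight_memory_bytes(params: float, bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """Estimate the memory needed just to hold the weights."""
    return params * bytes_per_param


training_tokens = 14.8e12   # 14.8 trillion tokens
deepseek_params = 671e9     # DeepSeek V3
llama_params = 405e9        # Llama 3.1 405B

words = tokens_to_words(training_tokens)
ratio = size_ratio(deepseek_params, llama_params)
memory = weight_memory_bytes(deepseek_params)

print(f"Training data: about {words / 1e12:.1f} trillion words")
print(f"Size ratio:    about {ratio:.2f}x Llama 3.1 405B")
print(f"Weights alone: about {memory / 1e12:.2f} TB at 16-bit precision")
```

Under these assumptions, the weights alone come to well over a terabyte, far more than a single consumer GPU can hold, which is the practical meaning of "more powerful hardware."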
Despite its impressive capabilities, DeepSeek V3 raises concerns over political bias. As a Chinese company, DeepSeek is subject to benchmarking by China's internet regulator to ensure its models' responses "embody core socialist values." This means that the model may decline to respond to topics that might raise the ire of regulators, such as speculation about the Xi Jinping regime.
DeepSeek, backed by High-Flyer Capital Management, a Chinese quantitative hedge fund, has been making waves in the AI industry. The company recently unveiled DeepSeek-R1, an answer to OpenAI's o1 "reasoning" model, and has forced competitors like ByteDance, Baidu, and Alibaba to cut the usage prices for some of their models and make others completely free.
High-Flyer, founded by Liang Wenfeng, a computer science graduate, aims to achieve "superintelligent" AI through its DeepSeek organization. In an interview earlier this year, Liang described open sourcing as a "cultural act," and characterized closed-source AI like OpenAI's as a "temporary" moat. His vision for open-source AI models like DeepSeek V3 could potentially disrupt the industry and pave the way for more collaborative innovation.
As the AI landscape continues to evolve, the release of DeepSeek V3 signals a shift towards open-source models that can compete with closed ones. Concerns over political bias and practicality remain, and the benchmark results so far are DeepSeek's own, but the model's implications will be closely watched by industry experts and enthusiasts alike.