Chinese AI lab DeepSeek has taken the tech world by storm: its chatbot app surged to the top of the Apple App Store charts, raising concerns about the US's lead in the AI race and about future demand for AI chips. The company's AI models, trained using compute-efficient techniques, have impressed Wall Street analysts and technologists alike, leading many to question whether the US can maintain its dominance in the field.
But where did DeepSeek come from, and how did it rise to international fame so quickly? The company has its roots in High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions. Founded by AI enthusiast Liang Wenfeng in 2015, High-Flyer launched DeepSeek as a lab dedicated to researching AI tools in 2023, with the goal of developing and deploying AI algorithms.
DeepSeek's technical team is notable for its youth, and the company aggressively recruits AI researchers with doctorates from top Chinese universities. It also hires people without computer science backgrounds to help its technology better understand a wide range of subjects. Although US export bans on hardware have constrained it, DeepSeek has managed to build its own data center clusters for model training, albeit using less-powerful Nvidia H800 chips.
The company's AI models have been the key to its success. DeepSeek unveiled its first set of models, including DeepSeek Coder, DeepSeek LLM, and DeepSeek Chat, in November 2023. However, it was the release of its next-gen DeepSeek-V2 family of models in spring 2024 that really caught the attention of the AI industry. DeepSeek-V2, a general-purpose text- and image-analyzing system, performed well on various AI benchmarks and was significantly cheaper to run than comparable models at the time.
The release of DeepSeek-V3 in December 2024 only added to the company's prominence. According to DeepSeek's internal benchmark testing, DeepSeek-V3 outperforms both downloadable, openly available models like Meta's Llama and "closed" models that can only be accessed through an API, like OpenAI's GPT-4o. The company's R1 "reasoning" model, released in January 2025, has also impressed, performing as well as OpenAI's o1 model on key benchmarks.
The R1 model is notable for its ability to effectively fact-check itself, avoiding pitfalls that normally trip up models. While it takes a little longer to arrive at solutions compared to typical non-reasoning models, the upside is that it tends to be more reliable in domains such as physics, science, and math. However, as a Chinese-developed AI, the model is subject to benchmarking by China's internet regulator to ensure that its responses "embody core socialist values."
DeepSeek's business model is unclear, but the company's pricing strategy has been described as "disruptive." It prices its products and services well below market rates and gives some away for free. The company claims that efficiency breakthroughs have enabled it to maintain extreme cost competitiveness, although some experts have disputed the figures it has supplied.
Developers have flocked to DeepSeek's models, which are available under permissive licenses that allow for commercial use. According to Clem Delangue, the CEO of Hugging Face, one of the platforms hosting DeepSeek's models, developers on Hugging Face have created over 500 "derivative" models of R1 that have racked up 2.5 million downloads combined. The company's success has been described as "upending AI" and ushering in "a new era of AI brinkmanship."
The implications of DeepSeek's success are far-reaching. The company's rise has already been linked to an 18% drop in Nvidia's stock price and has elicited a public response from OpenAI CEO Sam Altman. As the US government grows increasingly wary of what it perceives as harmful foreign influence, the future of DeepSeek remains uncertain. One thing is clear, however: the company's breakthroughs have sent shockwaves through the tech industry, and its impact will be felt for years to come.