Google has unveiled Gemini 2.5, a family of AI reasoning models that pause to "think" before answering a question. The tech giant has launched Gemini 2.5 Pro Experimental, a multimodal reasoning AI model that it claims is its most intelligent model yet.
This new model will be available to developers on Google's AI Studio platform and to subscribers of the company's $20-a-month AI plan, Gemini Advanced. What's more, Google has announced that all its future AI models will have reasoning capabilities baked in, marking a significant shift in the company's AI development strategy.
The launch of Gemini 2.5 Pro comes at a time when the tech industry is racing to develop AI reasoning models that can match or exceed the capabilities of OpenAI's o1 model, launched in September 2024. Today, companies like Anthropic, DeepSeek, Google, and xAI all have AI reasoning models that use extra computing power and time to fact-check and reason through problems before delivering an answer.
These reasoning techniques have already shown impressive results in math and coding tasks, and many experts believe that they will be a key component of AI agents, autonomous systems that can perform tasks largely without human intervention. However, these models are also more expensive to develop and maintain.
According to Google, Gemini 2.5 Pro outperforms its previous frontier AI models, as well as some leading competitor models, on several benchmarks. Google says it designed Gemini 2.5 to excel at creating visually compelling web apps and agentic coding applications.
On Aider Polyglot, an evaluation that measures code editing, Google claims that Gemini 2.5 Pro scores 68.6%, outperforming top AI models from OpenAI, Anthropic, and Chinese AI lab DeepSeek. On SWE-bench Verified, a test of software development abilities, Gemini 2.5 Pro scores 63.8%, outperforming OpenAI's o3-mini and DeepSeek's R1 but underperforming Anthropic's Claude 3.7 Sonnet, which scored 70.3%.
On Humanity's Last Exam, a multimodal test made up of thousands of crowdsourced questions on math, the humanities, and the natural sciences, Google says Gemini 2.5 Pro scores 18.8%, outperforming leading AI models from OpenAI, Anthropic, and DeepSeek.
One notable feature of Gemini 2.5 Pro is its ability to take in large amounts of data. Google says the model is shipping with a 1 million token context window, which means it can process roughly 750,000 words in a single prompt. That's longer than the entire Lord of the Rings book series. The company has also announced that a 2 million token context window is coming soon.
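As a back-of-the-envelope check, Google's figures imply roughly 0.75 words per token. The Python sketch below applies that ratio to both the current 1 million token window and the announced 2 million token window. The ratio is an assumption taken from the article's own numbers, not an official Google conversion; real token-to-word ratios depend on the tokenizer, the language, and the text itself.

```python
# Assumed rule of thumb implied by the article: 1,000,000 tokens ~ 750,000 words,
# i.e. about 0.75 English words per token. Actual ratios vary by tokenizer and text.
WORDS_PER_TOKEN = 0.75  # assumption, not an official Google figure


def approx_words(context_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many words fit in a context window of the given token size."""
    return round(context_tokens * words_per_token)


# Current window vs. the announced larger window.
for label, tokens in [("current", 1_000_000), ("announced", 2_000_000)]:
    print(f"{label}: {tokens:,} tokens ~ {approx_words(tokens):,} words")
```

Under this assumption, doubling the window to 2 million tokens would mean roughly 1.5 million words per prompt.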
While Google has experimented with AI reasoning models before, Gemini 2.5 represents the company's most serious challenge yet to OpenAI's o series of models. The stakes are high: AI reasoning models could transform fields such as software development, scientific research, and education.
However, Google has not yet shared pricing for the Gemini 2.5 Pro API. It remains to be seen how widely developers will adopt Gemini 2.5 Pro and what impact it will have on the industry.