Alibaba Unveils Qwen2.5-VL AI Models, Challenging OpenAI's Operator

Alexis Rowe

Alexis Rowe

January 27, 2025 · 3 min read
Alibaba Unveils Qwen2.5-VL AI Models, Challenging OpenAI's Operator

While Chinese AI lab DeepSeek has been making headlines, Alibaba's Qwen team has quietly released a new family of AI models, Qwen2.5-VL, which can perform a range of tasks, including text and image analysis, video understanding, and even control a PC. This move is seen as a direct challenge to OpenAI's recently launched Operator model.

The Qwen2.5-VL models have been benchmarked to outperform OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 2.0 Flash in various evaluations, including video understanding, math, document analysis, and question-answering. The models can parse files, understand videos, and count objects in images, making them a significant advancement in AI capabilities.

The Qwen2.5-VL family consists of three models: Qwen2.5-VL-3B, Qwen2.5-VL-7B, and the flagship Qwen2.5-VL-72B. The two smaller models are available under a permissive license, while the flagship model is under Alibaba's custom license, which requires companies and developers with more than 100 million monthly active users to request permission from Qwen/Alibaba before deploying the model commercially.

One of the most interesting features of Qwen2.5-VL is its ability to interact with software on PCs and mobile devices. A video demonstration shows the model launching the Booking.com app for Android and booking a flight from Chongqing to Beijing. Another video demonstrates the model controlling apps on a Linux desktop, although it doesn't seem to accomplish much beyond switching tabs.

However, it's worth noting that Qwen2.5-VL, being an AI developed by a Chinese company, has certain restrictions on the topics it will discuss. When asked to discuss "Xi Jinping's mistakes," the model threw an error message. This is likely due to China's internet regulator benchmarking many models developed in the country to ensure their responses "embody core socialist values."

Despite these limitations, the release of Qwen2.5-VL marks a significant milestone in the development of AI capabilities. As the tech industry continues to push the boundaries of what is possible with AI, it will be interesting to see how models like Qwen2.5-VL are used and developed in the future.

In the broader context, the release of Qwen2.5-VL highlights the growing competition in the AI space, with companies like Alibaba, OpenAI, and Google vying for dominance. As AI capabilities continue to advance, it's likely that we'll see even more innovative applications of this technology in the years to come.

Similiar Posts

Copyright © 2024 Starfolk. All rights reserved.