Cohere Unveils Aya Vision, a Multimodal 'Open' AI Model for Global Accessibility

Riley King

March 04, 2025 · 3 min read

Cohere, a prominent AI startup, has released Aya Vision, a multimodal "open" AI model that can write image captions, answer questions about photos, translate text, and generate summaries in 23 major languages.

Aya Vision is designed to narrow the gap in language performance, which becomes more pronounced in multimodal tasks that involve both text and images. The model comes in two sizes, Aya Vision 32B and Aya Vision 8B, with the more capable Aya Vision 32B outperforming models more than twice its size, including Meta's Llama-3.2 90B Vision, on certain visual understanding benchmarks.

Both models are available for free through WhatsApp and can be downloaded from the AI dev platform Hugging Face under a Creative Commons 4.0 license with Cohere's acceptable use addendum, which rules out commercial applications. Cohere positions the release as a significant step toward making technical breakthroughs accessible to researchers worldwide, particularly those with limited access to compute resources.
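
For readers who want to experiment, the sketch below shows what loading one of the models from Hugging Face might look like using the transformers image-text-to-text pipeline. The repo ID CohereForAI/aya-vision-8b and the image URL are assumptions based on Cohere's usual Hugging Face naming, so verify the exact names on the hub:

```python
from transformers import pipeline

# Repo ID is an assumption based on Cohere's Hugging Face org naming;
# verify the exact model name (and license terms) on the hub.
pipe = pipeline(
    "image-text-to-text",
    model="CohereForAI/aya-vision-8b",
    device_map="auto",
)

# Chat-style input: one user turn containing an image plus a text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder image
            {"type": "text", "text": "Caption this photo in Portuguese."},
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=128, return_full_text=False)
print(outputs[0]["generated_text"])
```

Because the prompt itself sets the output language, the same call covers captioning, question answering, and translation across the supported languages.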

The training of Aya Vision is notable for its use of synthetic annotations, meaning annotations generated by AI rather than by human labelers. The approach is gaining traction: rivals like OpenAI also leverage synthetic data to train models, and according to research firm Gartner, 60% of the data used for AI and analytics projects last year was synthetically created. For Cohere, synthetic annotations allowed the lab to use fewer resources while achieving competitive performance.
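
To make the idea concrete, here is a minimal, purely illustrative sketch of synthetic annotation, not Cohere's actual pipeline: a teacher model captions unlabeled images, and the resulting pairs become supervised training data for a student. The teacher choice, Salesforce/blip-image-captioning-base, is an arbitrary stand-in:

```python
from transformers import pipeline

# Hypothetical teacher model; any capable captioning model could stand in here.
teacher = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def annotate(image_paths):
    """Generate synthetic captions for a batch of unlabeled images."""
    dataset = []
    for path in image_paths:
        caption = teacher(path)[0]["generated_text"]
        dataset.append({"image": path, "caption": caption})
    return dataset

# The synthetic pairs would then be filtered, translated into target
# languages, and used as training data for the student model.
synthetic_data = annotate(["photo1.jpg", "photo2.jpg"])
```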

Alongside Aya Vision, Cohere has released a new benchmark suite, AyaVisionBench, designed to probe a model's skills in "vision-language" tasks such as identifying differences between two images and converting screenshots to code. The suite is framed as a step toward rectifying the "evaluation crisis" in the AI industry, in which popular benchmarks give aggregate scores that correlate poorly with proficiency on the tasks most AI users actually care about.

AyaVisionBench provides a "broad and challenging" framework for assessing a model's cross-lingual and multimodal understanding. By making this evaluation set available to the research community, Cohere aims to push forward multilingual multimodal evaluations and drive innovation in the field of AI.
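
For researchers who want to evaluate against the suite, a rough sketch of pulling the benchmark from Hugging Face is shown below. The dataset ID CohereForAI/AyaVisionBench and the image/prompt column names are assumptions based on Cohere's naming conventions; check the hub listing for the exact repo:

```python
from datasets import load_dataset

# Dataset ID is an assumption based on Cohere's Hugging Face org naming;
# inspect the hub listing for the exact repo, splits, and columns.
bench = load_dataset("CohereForAI/AyaVisionBench")
print(bench)  # shows available splits and column names

def evaluate(model_fn, examples):
    """Collect a model's responses over benchmark examples.

    `model_fn` is a placeholder for whatever inference call is used
    (e.g. the pipeline sketched earlier); the "image" and "prompt"
    column names are assumptions, not the confirmed schema.
    """
    return [model_fn(ex["image"], ex["prompt"]) for ex in examples]
```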

The release of Aya Vision and AyaVisionBench is a significant development, with implications for researchers, developers, and users worldwide. As the AI industry continues to evolve, it will be worth watching how these tools are used and what impact they have on the development of more capable and accessible AI models.

