Google DeepMind has announced Gemma 3, a significant update to its family of generative AI models. The new models are multimodal: they can analyze images, answer questions about them, identify objects, and handle other tasks that involve understanding visual data.
Announced on March 12, the new models can be tried out in Google AI Studio. According to Google DeepMind, Gemma 3 significantly improves math, coding, and instruction-following capabilities. It also supports vision-language inputs and text outputs, handles context windows of up to 128k tokens, and understands more than 140 languages.
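For developers who would rather try the hosted models from code than from the AI Studio UI, a minimal sketch using the google-genai Python SDK might look like the following. The model identifier "gemma-3-27b-it" and the availability of Gemma 3 through this API for a given account are assumptions to verify in AI Studio.

```python
# A minimal sketch: calling a hosted Gemma 3 model with the google-genai SDK.
# Assumptions: the "gemma-3-27b-it" model identifier and an API key created in
# Google AI Studio; check AI Studio for the model names actually available to you.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # key created in Google AI Studio

response = client.models.generate_content(
    model="gemma-3-27b-it",  # assumed id for the 27B instruction-tuned model
    contents="In Spanish, explain in one sentence why a 128k-token context window matters.",
)
print(response.text)
```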
Among Gemma 3's key features are enhanced math, reasoning, and chat capabilities, including structured outputs and function calling. The model comes in four "developer-friendly" sizes, 1B, 4B, 12B, and 27B parameters, each available as a pre-trained checkpoint and as a general-purpose instruction-tuned version. The 128k-token context window lets Gemma 3 take in large amounts of information in a single prompt, such as long documents or extended conversations.
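Structured output with the instruction-tuned checkpoints can be driven purely through prompting, as in the rough sketch below. The model id "google/gemma-3-1b-it" and the prompt-only approach are assumptions based on how instruction-tuned Gemma checkpoints are commonly used, not an official recipe.

```python
# A rough sketch of prompt-driven structured (JSON) output with a Gemma 3
# instruction-tuned checkpoint via Hugging Face transformers. The model id
# "google/gemma-3-1b-it" is an assumption; swap in whichever size fits your hardware.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-1b-it")

messages = [
    {
        "role": "user",
        "content": (
            "Extract the city and date from this sentence and reply with JSON only, "
            "using the keys 'city' and 'date': 'The launch event is in Paris on March 12.'"
        ),
    },
]

result = generator(messages, max_new_tokens=64)
# With chat-style input, the pipeline returns the conversation with the model's
# reply appended as the final message.
print(result[0]["generated_text"][-1]["content"])
```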
Developers have multiple deployment options for Gemma 3, including Cloud Run and the Google GenAI API. The model features a revamped code base, with recipes for inference and fine-tuning, and its weights can be downloaded from Kaggle and Hugging Face. Nvidia provides direct support for Gemma 3 models, targeting maximum performance on GPUs of every size, from the Jetson Nano to the latest Blackwell chips. The model is also optimized for Google Cloud TPUs and integrates with AMD GPUs.
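As a concrete illustration of the Hugging Face route, the sketch below downloads an instruction-tuned multimodal checkpoint and asks a question about an image. The model id "google/gemma-3-4b-it", the "image-text-to-text" pipeline task, and the image URL are assumptions to check against the model card; device_map="auto" simply lets transformers place the weights on an available GPU.

```python
# A minimal sketch: pulling Gemma 3 weights from Hugging Face and asking a question
# about an image. Assumptions: the "google/gemma-3-4b-it" model id, the
# "image-text-to-text" pipeline task, and the placeholder image URL.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-4b-it",
    device_map="auto",  # place weights on a GPU if one is available
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/street-scene.jpg"},  # hypothetical URL
            {"type": "text", "text": "What objects are visible in this image?"},
        ],
    },
]

output = pipe(text=messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])
```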
For CPU execution, users can turn to Gemma.cpp, a lightweight C++ inference engine for Gemma models. This level of flexibility and compatibility is expected to make Gemma 3 a popular choice among developers and researchers working on AI projects.
In addition to Gemma 3, Google DeepMind also announced ShieldGemma 2, a 4B-parameter model built on Gemma 3 that checks the safety of synthetic and natural images against key categories. ShieldGemma 2 is designed to help build robust data sets and models, and can be used as an input filter to vision-language models or as an output filter for image generation systems.
ShieldGemma 2 helps developers minimize the risk of harmful content, such as sexually explicit, dangerous, or violent imagery, making it an important tool for the responsible development and deployment of AI models.
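The filtering role described above amounts to wrapping an image pipeline with a safety check, as in the sketch below. The ShieldGemma 2 call itself is left as a hypothetical helper (violates_policy), since the exact inference API depends on the checkpoint and framework; consult the ShieldGemma 2 model card on Kaggle or Hugging Face for the real invocation. Only the output-filter pattern is illustrated.

```python
# A sketch of the "output filter" pattern: every generated image is screened against
# the key safety policies before it is returned. violates_policy is a hypothetical
# stand-in for an actual ShieldGemma 2 inference call (see the model card for the
# real API); only the wrapping pattern is shown here.
from typing import Callable, Optional

from PIL import Image

POLICIES = ("sexually_explicit", "dangerous_content", "violence")


def violates_policy(image: Image.Image, policy: str) -> bool:
    """Hypothetical stand-in: run ShieldGemma 2 on one image for one policy."""
    raise NotImplementedError("Wire this up to a ShieldGemma 2 checkpoint.")


def safe_generate(
    generate_image: Callable[[str], Image.Image], prompt: str
) -> Optional[Image.Image]:
    """Output filter: return the generated image only if it passes every policy check."""
    image = generate_image(prompt)
    if any(violates_policy(image, policy) for policy in POLICIES):
        return None  # block the unsafe image
    return image
```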
The release of Gemma 3 and ShieldGemma 2 marks a significant milestone in the development of AI models and is expected to have far-reaching implications for industries such as healthcare, finance, and education. As AI plays a growing role in everyday life, the ability to analyze and understand visual data will become increasingly critical, and Gemma 3 is well positioned to be at the forefront of that trend.