Introducing AMD Support for New Gemma 3 Models from Google
Mar 12, 2025
The rapid advancement of open-source language models is a key driver behind the growing significance of AI. AMD is proud to support Google's newly announced Gemma 3 models, a family of lightweight, open language models built with Gemini-level technology. The new Gemma 3 series includes models ranging in size from 1B to 27B parameters, suitable for both on-device inference and data center deployment. This versatile model family is optimized for a range of AMD devices, including NPU-equipped AMD Ryzen™ AI processors, AMD Radeon™ GPUs, and AMD Instinct™ GPUs.
GPU Support
AMD hardware supports the new capabilities introduced by Gemma 3, such as 5-to-1 interleaved local/global attention, long contexts of up to 128K tokens, and a large 256K-token vocabulary. Additionally, multimodal inputs are supported using an intelligent pan-and-scan algorithm.
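To make the interleaving pattern concrete, here is a minimal Python sketch of a 5-to-1 schedule, in which every sixth layer uses global attention and the remaining layers use local sliding-window attention. The function name and string labels are illustrative only, not part of any AMD or Google API.

```python
# Minimal sketch of a 5-to-1 interleaved attention schedule: five local
# sliding-window layers for every one global-attention layer.
def attention_schedule(num_layers: int, ratio: int = 5) -> list[str]:
    # Every (ratio + 1)-th layer is global; all others are local.
    return [
        "global" if (i + 1) % (ratio + 1) == 0 else "local_sliding"
        for i in range(num_layers)
    ]

print(attention_schedule(12))
# 5 x 'local_sliding', 'global', 5 x 'local_sliding', 'global'
```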
The latest release of the Hugging Face transformers package, which already includes support for AMD GPUs, adds support for the Gemma 3 family, allowing Gemma 3 to be easily incorporated into existing inference and fine-tuning workflows. The vLLM project has also added support for Gemma 3, and AMD ROCm™ software supports builds of vLLM from the upstream repository, allowing users to deploy Gemma 3 for inference with the significant set of optimizations that vLLM provides.
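As a quick illustration, the sketch below runs text generation with a Gemma 3 checkpoint through the transformers pipeline API. It assumes a transformers release with Gemma 3 support, a ROCm-enabled PyTorch build, and access to the gated checkpoint on Hugging Face; `google/gemma-3-1b-it` is the model ID assumed here, so substitute another size as needed.

```python
# Minimal sketch: text generation with Gemma 3 via Hugging Face transformers.
# Assumes Gemma 3 support in transformers, a ROCm-enabled PyTorch build, and
# that you have accepted the model license on Hugging Face.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",  # swap in a 4B/12B/27B variant as desired
    device_map="auto",             # place the model on the available AMD GPU
)

result = generator("Why do long context windows matter?", max_new_tokens=128)
print(result[0]["generated_text"])
```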
Getting Started with Gemma 3 for Inference on AMD GPUs
AMD will include support for the Gemma 3 family in upcoming releases of its inference container image. This container-based deployment flow provides the simplest and most optimized path for running vLLM on AMD GPUs. Users who want to work with Gemma 3 immediately can do so by building a vLLM container from the upstream vLLM repository, as described in the vLLM documentation or the ROCm documentation.
The steps involved will be familiar to users who have used vLLM with other models:
- Build a vLLM container as noted above
- Download the desired Gemma 3 model from Hugging Face
- Use one of the supported vLLM mechanisms for serving the model, such as the OpenAI-compatible server (a minimal client sketch follows this list)
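As one example of the last step, the sketch below queries a Gemma 3 model through vLLM's OpenAI-compatible server using the official openai Python client. It assumes the server is already running inside the container, e.g. via `vllm serve google/gemma-3-4b-it`; the model ID and the default port 8000 are assumptions for illustration, so adjust them to your setup.

```python
# Minimal sketch: query Gemma 3 through vLLM's OpenAI-compatible server.
# Assumes the server was started inside the ROCm vLLM container, e.g.:
#   vllm serve google/gemma-3-4b-it
# (model ID and default port 8000 are assumptions for illustration).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="google/gemma-3-4b-it",
    messages=[{"role": "user", "content": "Summarize the Gemma 3 model family."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```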
Note that support for Gemma 3 in vLLM with AMD GPUs is initially limited to text inputs.
NPU Support
The smaller Gemma 3 models (1B, 4B, and 12B) have been successfully deployed on AMD Ryzen™ AI 300 Series processors using the Day-0 deployment flow. For the vision-capable models (4B and 12B), bidirectional attention runs on the CPU while compute-heavy operations are automatically offloaded to the NPU for performance and efficiency. AMD will continue to work to enable Gemma 3 models with the hybrid NPU + iGPU flow and is excited for what is to come.
Summary
AMD is committed to Day-0 support for important new AI models on AMD hardware. With launch-day support for Gemma 3, we are excited to see how the community will make use of these innovative new models on AMD APUs and GPUs. Get started today by following the steps outlined here: LLM inference frameworks — ROCm Documentation