Introducing AMD Support for New Gemma 3 Models from Google
Mar 12, 2025
The rapid advancement of open-source language models is a key driver behind the growing significance of AI. AMD is proud to support Google's newly announced Gemma 3 models, a family of lightweight, open language models built with Gemini-level technology. The new Gemma 3 series includes models ranging in size from 1B to 27B parameters, suitable for both on-device inference and data center deployment. This versatile model family is optimized for a range of AMD devices, including NPU-equipped AMD Ryzen™ AI processors, AMD Radeon™ GPUs, and AMD Instinct™ GPUs.
GPU Support
AMD hardware supports the new capabilities introduced by Gemma 3, such as 5-to-1 interleaved local/global attention, long contexts of up to 128K tokens, and a large 256K-token vocabulary. Additionally, multimodal inputs are supported using an intelligent pan-and-scan algorithm.
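To make the interleaving pattern concrete, here is a minimal Python sketch of a 5-to-1 schedule, in which every sixth layer uses global attention and the remaining layers use local sliding-window attention. The function name and string labels are illustrative only, not part of any AMD or Google API.

```python
# Minimal sketch of a 5-to-1 interleaved attention schedule: five local
# sliding-window layers for every one global-attention layer.
def attention_schedule(num_layers: int, ratio: int = 5) -> list[str]:
    # Every (ratio + 1)-th layer is global; all others are local.
    return [
        "global" if (i + 1) % (ratio + 1) == 0 else "local_sliding"
        for i in range(num_layers)
    ]

print(attention_schedule(12))
# 5 x 'local_sliding', 'global', 5 x 'local_sliding', 'global'
```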
The latest release of the Hugging Face transformers package, which already includes support for AMD GPUs, adds support for the Gemma 3 family, allowing Gemma 3 to be easily incorporated into existing inference and fine-tuning workflows. The vLLM project has also added support for Gemma 3, and AMD ROCm™ software supports builds of vLLM from the upstream repository, allowing users to deploy Gemma 3 for inference with the significant set of optimizations that vLLM provides.
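As a quick illustration, the sketch below runs text generation with a Gemma 3 checkpoint through the transformers pipeline API. It assumes a transformers release with Gemma 3 support, a ROCm-enabled PyTorch build, and access to the gated checkpoint on Hugging Face; `google/gemma-3-1b-it` is the model ID assumed here, so substitute another size as needed.

```python
# Minimal sketch: text generation with Gemma 3 via Hugging Face transformers.
# Assumes Gemma 3 support in transformers, a ROCm-enabled PyTorch build, and
# that you have accepted the model license on Hugging Face.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",  # swap in a 4B/12B/27B variant as desired
    device_map="auto",             # place the model on the available AMD GPU
)

result = generator("Why do long context windows matter?", max_new_tokens=128)
print(result[0]["generated_text"])
```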
Getting Started with Gemma 3 for Inference on AMD GPUs
AMD will include support for the Gemma 3 family in upcoming releases of its inference container image. This container-based deployment flow provides the simplest and most optimized path for running vLLM on AMD GPUs. Users who want to work with Gemma 3 immediately can do so by building a vLLM container from the upstream vLLM repository, as described in the vLLM documentation or the ROCm documentation.
The steps involved will be familiar to users who have used vLLM with other models:
- Build a vLLM container as noted above
- Download the desired Gemma 3 model from Hugging Face
- Use one of the supported vLLM mechanisms for serving the model, such as the OpenAI-compatible server (a minimal client sketch follows this list)
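As one example of the last step, the sketch below queries a Gemma 3 model through vLLM's OpenAI-compatible server using the official openai Python client. It assumes the server is already running inside the container, e.g. via `vllm serve google/gemma-3-4b-it`; the model ID and the default port 8000 are assumptions for illustration, so adjust them to your setup.

```python
# Minimal sketch: query Gemma 3 through vLLM's OpenAI-compatible server.
# Assumes the server was started inside the ROCm vLLM container, e.g.:
#   vllm serve google/gemma-3-4b-it
# (model ID and default port 8000 are assumptions for illustration).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="google/gemma-3-4b-it",
    messages=[{"role": "user", "content": "Summarize the Gemma 3 model family."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```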
Note that support for Gemma 3 in vLLM with AMD GPUs is initially limited to text inputs.
NPU Support
The smaller Gemma 3 models (1B, 4B, and 12B) have been successfully deployed on AMD Ryzen™ AI 300 Series processors using the Day-0 deployment flow. For the vision-capable models (4B and 12B), bidirectional attention runs on the CPU while compute-heavy operations are automatically offloaded to the NPU for performance and efficiency. AMD will continue to work to enable Gemma 3 models with the hybrid NPU + iGPU flow and is excited for what is to come.
Summary
AMD is committed to Day-0 support for important new AI models on AMD hardware. With launch-day support for Gemma 3, we are excited to see how the community will make use of these innovative new models on AMD APUs and GPUs. Get started today by following the steps outlined here: LLM inference frameworks — ROCm Documentation