GAIA: An Open-Source Project from AMD for Running Local LLMs on Ryzen™ AI
Mar 20, 2025
AMD has launched a new open-source project called, GAIA (pronounced /ˈɡaɪ.ə/), an awesome application that leverages the power of Ryzen AI Neural Processing Unit (NPU) to run private and local large language models (LLMs). In this blog, we’ll dive into the features and benefits of GAIA, while introducing how you can take advantage of GAIA’s open-source project to adopt into your own applications.
Introduction to GAIA
GAIA is a generative AI application designed to run local, private LLMs on Windows PCs and is optimized for AMD Ryzen AI hardware (AMD Ryzen AI 300 Series Processors). This integration allows for faster, more efficient processing – i.e. lower power– while keeping your data local and secure. On Ryzen AI PCs, GAIA interacts with the NPU and iGPU to run models seamlessly by using the open-source Lemonade (LLM-Aid) SDK from ONNX TurnkeyML for LLM inference. GAIA supports a variety of local LLMs optimized to run on Ryzen AI PCs. Popular models like Llama and Phi derivatives can be tailored for different use cases, such as Q&A, summarization, and complex reasoning tasks.
 
		Getting Started with GAIA
To get started with GAIA in under 10 minutes. Follow the instructions to download and install GAIA on your Ryzen AI PC. Once installed, you can launch GAIA and begin exploring its various agents and capabilities. There are 2 versions of GAIA:
- GAIA Installer – this will run on any Windows PC; however, performance may be slower.
- GAIA Hybrid Installer – this package is optimized to run on Ryzen AI PCs and uses the NPU and iGPU for better performance.
The Agent RAG Pipeline
One of the standout features of GAIA is its agent Retrieval-Augmented Generation (RAG) pipeline. This pipeline combines an LLM with a knowledge base, enabling the agent to retrieve relevant information, reason, plan, and use external tools within an interactive chat environment. This results in more accurate and contextually aware responses.
The current GAIA agents enable the following capabilities:
- Simple Prompt Completion: No agent for direct model interaction for testing and evaluation.
- Chaty: an LLM chatbot with history that engages in conversation with the user.
- Clip: an Agentic RAG for YouTube search and Q&A agent.
- Joker: a simple joke generator using RAG to bring humor to the user.
Additional agents are currently in development, and developers are encouraged to create and contribute their own agent to GAIA.
How does GAIA Work?
 
		The left side of Figure 2: GAIA Overview Diagram illustrates the functionality of Lemonade SDK from TurnkeyML. Lemonade SDK provides tools for LLM-specific tasks such as prompting, accuracy measurement, and serving across multiple runtimes (e.g., Hugging Face, ONNX Runtime GenAI API) and hardware (CPU, iGPU, and NPU).
Lemonade exposes an LLM web service that communicates with the GAIA application (on the right) via an OpenAI compatible REST API. GAIA consists of three key components:
- LLM Connector – Bridges the NPU service's Web API with the LlamaIndex-based RAG pipeline.
- LlamaIndex RAG Pipeline – Includes a query engine and vector memory, which processes and stores relevant external information.
- Agent Web Server – Connects to the GAIA UI via WebSocket, enabling user interaction.
On the right side of the figure, GAIA acts as an AI-powered agent that retrieves and processes data. It vectorizes external content (e.g., GitHub, YouTube, text files) and stores it in a local vector index. When a user submits a query, the following process occurs:
- The query is sent to GAIA, where it is transformed into an embedding vector.
- The vectorized query is used to retrieve relevant context from the indexed data.
- The retrieved context is passed to the web service, where it is embedded into the LLM’s prompt.
- The LLM generates a response, which is streamed back through the GAIA web service and displayed in the UI.
This process ensures that user queries are enhanced with relevant context before being processed by the LLM, improving response accuracy and relevance. The final answer is delivered to the user in real-time through the UI.
Benefits of Running LLMs Locally
Running LLMs locally on the NPU offers several benefits:
- Enhanced privacy, as no data needs to leave your machine. This eliminates the need to send sensitive information to the cloud, greatly enhancing data privacy and security while still delivering high-performance AI capabilities.
- Reduced latency, since there's no need to communicate with the cloud.
- Optimized performance with the NPU, leading to faster response times and lower power consumption.
Comparing NPU and iGPU
Running GAIA on the NPU results in improved performance for AI-specific tasks, as it is designed for inference workloads. Beginning with Ryzen AI Software Release 1.3, there is hybrid support for deploying quantized LLMs that utilize both the NPU and the iGPU. By using both components, each can be applied to the tasks and operations they are optimized for.
Applications and Industries
This setup could benefit industries that require high performance and privacy, such as healthcare, finance, and enterprise applications where data privacy is critical. It can also be applied in fields like content creation and customer service automation, where generative AI models are becoming essential. Lastly, it helps industries without Wi-Fi to send data to the cloud and receive responses, as all the processing is done locally.
Conclusion
In conclusion, GAIA, an open-source AMD application, uses the power of the Ryzen AI NPU to deliver efficient, private, and high-performance LLMs. By running LLMs locally, GAIA ensures enhanced privacy, reduced latency, and optimized performance, making it ideal for industries that prioritize data security and rapid response times.
Ready to try GAIA yourself? This video provides a brief overview and installation demo of GAIA.
Check out and contribute to the GAIA repo at github.com/amd/gaia. For feedback or questions, please reach out to us at GAIA@amd.com.
