Unlocking a Wave of LLM Apps on Ryzen™ AI Through Lemonade Server

Apr 17, 2025

Lemonade Server is a powerful tool that enables local large language models (LLMs) to run with neural processing unit (NPU) acceleration on AMD Ryzen™ AI 300 series PCs. It does this by supporting the OpenAI API standard, which gives developers and enthusiasts an easy way to integrate with a wide range of existing applications, such as chatbots and coding assistants.

For many people, the best part of Lemonade Server will be that it enables NPU-accelerated LLMs in popular applications without requiring any coding knowledge or application code changes whatsoever. Lemonade Server works on any Windows PC but offers its best performance on Ryzen AI 300-series PCs running Windows 11, where it uses the NPU and integrated GPU (iGPU) together in a hybrid execution mode to optimize performance and power efficiency.

Figure 1: Lemonade Server in action

What is Lemonade Server?

Lemonade Server is an integral component of the Lemonade SDK open-source project. While the Lemonade SDK provides a variety of tools for deploying, testing, and benchmarking LLMs, Lemonade Server is specifically built for quick integration with existing applications, using the OpenAI standard to enable local LLM deployment. The one-click installer sets up all of the dependencies and software packages required to run and serve accelerated local LLMs on Ryzen AI PCs. After installation, the user simply points their application at Lemonade Server's OpenAI-compatible API to leverage the Ryzen AI NPU and iGPU for local LLM performance.
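To make "OpenAI-compatible" concrete, here is a minimal sketch of the request body such a server accepts. This is a generic OpenAI chat-completions payload, not Lemonade-specific code; the model name below is a placeholder, standing in for whichever models you selected at install time:

```python
import json

def chat_payload(prompt: str, model: str = "placeholder-model") -> str:
    """Build an OpenAI-style chat-completions request body as JSON.

    Any OpenAI-compatible server, Lemonade Server included, accepts
    this shape. The model name here is a placeholder, not an actual
    Lemonade model identifier.
    """
    return json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    })

print(chat_payload("Hello from my Ryzen AI PC!"))
```

Because the payload follows the OpenAI standard, any client library or app that already speaks this format can talk to Lemonade Server unmodified.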

Accelerating the Developer Workflow

One of the key benefits of Lemonade Server is its out-of-the-box compatibility with a vast array of existing AI applications. Figure 2 shows some of the popular applications we’ve validated already, and you can visit the Lemonade Server applications page for step-by-step instructions for each.

Figure 2: Applications validated with Lemonade Server

Instead of modifying code or integrating language-specific APIs, developers only need to point their application to the Lemonade Server endpoint. This simple switch unlocks LLM capabilities on Ryzen-powered PCs without requiring deep knowledge of the underlying hardware or software stack.
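The "simple switch" can be illustrated with a hypothetical client configuration: everything about the application stays the same, and only the base URL changes. The URLs and key below are illustrative placeholders, not Lemonade Server's actual defaults (check your installation's documentation for the real endpoint):

```python
def client_config(base_url: str, api_key: str = "not-needed-locally") -> dict:
    """Minimal OpenAI-style client settings.

    Only base_url distinguishes a cloud endpoint from a local
    Lemonade Server endpoint; the rest of the app is untouched.
    """
    return {"base_url": base_url, "api_key": api_key}

# Hypothetical endpoints -- substitute your real ones.
cloud = client_config("https://api.openai.com/v1", api_key="sk-...")
local = client_config("http://localhost:8000/api/v1")

# Same settings schema, different endpoint; no cloud key required locally.
print(cloud["base_url"], "->", local["base_url"])
```

This is why no application code changes are needed: the OpenAI wire format is identical, so redirecting the endpoint is the entire integration.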

At the time of this writing, Lemonade Server is the only open-source OpenAI-compatible server to offer Ryzen AI NPU acceleration of LLMs.

Getting Started with Lemonade Server 

If you’re looking to deploy LLMs locally, setting up Lemonade Server is straightforward:

  1. Download and Install – Navigate to the releases page and download the Lemonade_Server_Installer.exe GUI installer.
    • You may need to grant your browser or Windows permission to download and run the installer.
    • Run the installer and select the models you wish to use.
  2. Launch the Server – After installation, you can start the server directly from the desktop shortcut.
  3. Point Your Favorite App to the Server – The details depend on the app you want to integrate; step-by-step examples are available on the Lemonade Server applications page.
  4. Run Inference Locally – With your app connected, you can now send prompts to your locally hosted LLM, leveraging Ryzen AI’s compute power on the CPU, or on the iGPU and NPU combined (hybrid mode).
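Putting the steps together, here is a stdlib-only sketch of a round trip against an OpenAI-compatible local server. The base URL and model name are assumptions for illustration (substitute the values from your own installation), and the response-parsing helper reflects the standard OpenAI response shape:

```python
import json
import urllib.request

# Hypothetical local endpoint -- substitute your installation's actual one.
BASE_URL = "http://localhost:8000/api/v1"

def build_request(prompt: str, model: str = "placeholder-model") -> urllib.request.Request:
    """Build a POST to the server's OpenAI-style chat-completions route."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_reply(response_json: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style response."""
    return response_json["choices"][0]["message"]["content"]

def chat(prompt: str) -> str:
    """Round trip: send the prompt, return the reply (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return extract_reply(json.load(resp))

# Shape of the response an OpenAI-compatible server sends back:
sample = {"choices": [{"message": {"role": "assistant", "content": "Hi!"}}]}
print(extract_reply(sample))
```

With the server running, `chat("Write a haiku about local LLMs.")` would return the model's reply, generated entirely on your own machine.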

A video tutorial for the Open WebUI integration is available here.

Native Integration

In addition to the easy setup and out-of-the-box compatibility, we’re working with a number of popular applications to ensure that Lemonade Server integrates natively. This means that, in the near future, certain applications will automatically be able to use Lemonade Server without requiring any manual configuration. This will further streamline the user experience and open the door for even more seamless integrations of Ryzen AI-powered LLMs into everyday apps.

More information on native integration is available in the server specification and server integration documents.

The Broader Impact

Lemonade Server isn’t just about running models locally: it’s about democratizing LLM development. By open-sourcing Lemonade Server and making it effortless to integrate LLM applications with Ryzen AI, AMD is enabling a broader community of developers and enthusiasts to experiment, innovate, and deploy AI-powered solutions without cloud dependency or complex setups.

For more detailed instructions and to explore additional applications that work with Lemonade Server, check out the official videos, documentation, and examples. Whether you’re an AI researcher, an application developer, or an enthusiast looking to harness local AI processing, Lemonade Server makes it easier than ever to bring powerful LLMs to your own device.

Stay tuned for more updates on AI innovations powered by Ryzen AI. If you’ve experimented with Lemonade Server in your applications, we’d love to hear about it. We also welcome contributions to our GitHub repository, so if you’re eager to get involved, now is a great time!

For feedback or questions, please reach out to us at turnkeyml@amd.com.

Click here to watch the Ryzen AI Developer Video Series.


Article By


AI Developer Enablement Manager