AI: From Endpoints, to Edge, to Cloud, Thanks to AMD
Generative AI is transforming the way enterprise customers operate. In fact, AI is quickly becoming part of nearly every business process, from customer service to data analytics, and that integration is only going to deepen. However, AI is a relatively new workload, layered on top of existing infrastructure and putting strain on current hardware configurations.
If customers want to enjoy seamless AI experiences and productivity gains now and over the long term, they need help evolving their IT infrastructure. That’s where AMD technologies come in, offering enterprises the performance and efficiency to run existing workflows alongside the new possibilities that AI brings to the table.
Opening the World of AI with AMD EPYC™ Processors
AMD EPYC™ processors are trusted to power a third of the world’s servers, and for good reason.1 Offering the world’s best data center CPU to enterprise customers, general-purpose AMD EPYC processors are available in configurations with up to 96 cores and deliver up to 1.75x the performance per CPU watt and 1.8x the performance in SPECrate® 2017_int_base compared to competitor products.2
AMD high-performance CPUs provide a strong option for companies deploying AI workloads such as recommendation systems, machine-learning solutions, and other generative AI uses.
Leveraging proven, standard infrastructure while upgrading to powerful AMD EPYC processors helps customers keep costs low across server footprint, power, and initial expenditure, and can increase server performance and density, putting more use cases within reach and improving ROI.
Learn more about AMD EPYC™ processors.
Accelerating AI with AMD Instinct™ Accelerators
Many AI workloads and use cases require more than what AMD EPYC CPUs can do alone. Large language models continue to grow into the hundreds of billions - even trillions - of parameters.
Fortunately, AMD offers a range of workload engines to handle even the most demanding AI tasks. AMD Instinct™ accelerators extend the set of AI workloads that AMD EPYC processors manage effectively by adding the power of GPU acceleration. Where AMD server CPUs handle small to medium models and mixed-workload inference deployments, AMD accelerators enable high-volume, real-time AI training, dedicated AI deployments, medium to large models, and large-scale real-time inference, accelerating AI results for enterprises looking to make the most of new technologies.
AMD offers a range of GPU solutions for various performance levels and form factors. The flagship AMD Instinct™ MI300X accelerator, powered by the AMD ROCm™ software stack, delivers a ~2.1x improvement in latency compared to the Nvidia H100 running the Llama2-70b chat model, and an ~8x improvement in overall Llama2-70b latency compared to previous-generation products.3,4
With the enterprise-ready, open-source AMD ROCm™ software stack underpinning AMD acceleration, companies can quickly be up and running on AI workloads, with support for ~400,000 Hugging Face models and deep engagements with other AI leaders, including PyTorch and OpenAI.
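One reason getting up and running is fast: ROCm builds of PyTorch expose AMD Instinct GPUs through the familiar torch.cuda API, so existing model code typically runs unchanged. A minimal sketch (standard PyTorch calls only; the helper name is ours, and it falls back to CPU when no accelerator is visible):

```python
# Sketch: device-agnostic PyTorch code that runs the same on CUDA and
# ROCm builds. On ROCm, AMD GPUs appear under the torch.cuda namespace.
import torch

def pick_device() -> torch.device:
    """Select a visible accelerator, or fall back to CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

device = pick_device()

# A tiny forward pass; real workloads would load a Hugging Face model
# onto the same device with .to(device) in exactly the same way.
layer = torch.nn.Linear(3, 4).to(device)
x = torch.randn(2, 3, device=device)
y = layer(x)
print(y.shape)  # torch.Size([2, 4])
```

Because the device string is the same on both stacks, the common pattern of writing hardware-agnostic code and selecting the device at runtime carries over to AMD hardware without modification.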
Learn more about AMD Instinct™ Accelerators.
Offering Versatility with AMD Alveo™ Accelerators
Adaptable by design, AMD Alveo™ accelerators deliver real-time performance in the data center for a range of use cases. Customers can optimize the platform for the workload required, adapting to evolving algorithms and application requirements as needed.
With low latency for real-time applications and high throughput and efficiency, AMD Alveo accelerators are ideal for customers who want to ensure they have what they need for data analytics, HPC, media and infrastructure acceleration, and more.
Learn more about AMD Alveo™ Accelerators.
Bringing AI to Local Machines with AMD Ryzen™ Processors
AI doesn’t just operate in servers; it’s now on end-user devices, enhancing the way people work, elevating traditional processes, and making work faster and easier, leaving teams free to focus on the bigger picture.
AMD Ryzen™ PRO processors are the world’s most advanced, ultra power-efficient processors for business desktops5 and deliver the first integrated AI engine in an x86 processor.6 This level of AI enablement from servers to client devices brings incredible capabilities that simply weren’t possible previously.
Learn more about AMD Ryzen™ PRO processors.
Completing the Picture with AMD Versal™ Adaptive SoCs for Edge AI
AI isn’t just for PCs and servers, either. In many applications, local AI processing on edge devices can have a huge impact on performance and safety.
In automotive, AI at the edge can enhance safety by allowing sensor data to be processed locally so decisions can be made in real time. You don’t want your autonomous vehicle to wait for data to be processed in the cloud to decide if it should apply the brakes to avoid an accident.
In healthcare, AI at the edge can enhance imaging equipment to accelerate diagnoses or provide real-time visualization to assist with surgeries. It can also help protect patient privacy by not having to send data through the cloud.
And in the industrial space, AI at the edge can help factory equipment run more safely and efficiently. AMD FPGAs and adaptive SoCs efficiently manage data pre-processing, inference, and post-processing for AI-driven and classic embedded systems, and the newest offering, the AMD Versal™ AI Edge Series Gen 2 adaptive SoC, handles all of these functions on a single chip.
With AMD Versal products, customers can bring AI into every aspect of their business, making existing consumer and industrial environments smarter and enabled with AI.
Learn more about AMD Versal™ Adaptive SoCs.
The benefits of AI are pervasive, and it’s becoming part of the fabric of modern computing. Businesses need to adapt and adopt innovative technologies like those from AMD if they want to capture those benefits.
If you’d like to learn more about AMD products and their support for the growing AI ecosystem, please contact your local representative, or visit AMD AI Solutions.
AMD Arena
Enhance your AMD product knowledge with training on AMD Ryzen™ PRO, AMD EPYC™, AMD Instinct™, and more.
Subscribe
Get monthly updates on AMD’s latest products, training resources, and Meet the Experts webinars.

Footnotes
1. Source: Mercury Research Sell-in Revenue Shipment Estimates, 2023 Q4.
2. SP5-013D: SPECrate®2017_int_base comparison based on published scores from www.spec.org as of 06/2/2023. Comparison of published 2P AMD EPYC 9654 (1800 SPECrate®2017_int_base, 720 Total TDP W, $23,610 total 1Ku, 192 Total Cores, 2.500 Perf/W, 0.076 Perf/CPU$, http://spec.org/cpu2017/results/res2023q2/cpu2017-20230424-36017.html) is 1.80x the performance of published 2P Intel Xeon Platinum 8490H (1000 SPECrate®2017_int_base, 700 Total TDP W, $34,000 total 1Ku, 120 Total Cores, 1.429 Perf/W, 0.029 Perf/CPU$, http://spec.org/cpu2017/results/res2023q1/cpu2017-20230310-34562.html) [at 1.75x the performance/W] [at 2.59x the performance/CPU$]. Published 2P AMD EPYC 7763 (861 SPECrate®2017_int_base, 560 Total TDP W, $15,780 total 1Ku, 128 Total Cores, 1.538 Perf/W, 0.055 Perf/CPU$, http://spec.org/cpu2017/results/res2021q4/cpu2017-20211121-30148.html) is shown for reference at 0.86x the performance [at 1.08x the performance/W] [at 1.86x the performance/CPU$]. AMD 1Ku pricing and Intel ARK.intel.com specifications and pricing as of 6/1/23. SPEC®, SPEC CPU®, and SPECrate® are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.
3. MI300-38A: Overall latency for text generation using the Llama2-70b chat model with vLLM comparison using custom docker container for each system based on AMD internal testing as of 12/14/2023. Sequence length of 2048 input tokens and 128 output tokens. vLLM tests used an enhanced version of the benchmark_latency.py script from the benchmarks directory of https://github.com/vllm-project/vllm. Enhancements were added to allow the use of input prompts with specific lengths. The vLLM version used for MI300X contains modifications that are not yet generally available outside of AMD. Configurations: 2P Intel Xeon Platinum 8480C CPU server with 8x AMD Instinct™ MI300X (192GB, 750W) GPUs, ROCm® 6.1.0 pre-release, PyTorch 2.2.0, vLLM for ROCm, Ubuntu® 22.04.2, vs. an Nvidia DGX H100 with 2x Intel Xeon Platinum 8480CL Processors, 8x Nvidia H100 (80GB, 700W) GPUs, CUDA 12.1, PyTorch 2.1.0, vLLM v0.2.2 (most recent), Ubuntu 22.04. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
4. MI300-33: Text generated with Llama2-70b chat using input sequence length of 4096 and 32 output token comparison using custom docker container for each system based on AMD internal testing as of 11/17/2023. Configurations: 2P Intel Xeon Platinum CPU server using 4x AMD Instinct™ MI300X (192GB, 750W) GPUs, ROCm® 6.0 pre-release, PyTorch 2.2.0, vLLM for ROCm, Ubuntu® 22.04.2, vs. 2P AMD EPYC 7763 CPU server using 4x AMD Instinct™ MI250 (128 GB HBM2e, 560W) GPUs, ROCm® 5.4.3, PyTorch 2.0.0, HuggingFace Transformers 4.35.0, Ubuntu 22.04.6. 4 GPUs on each system were used in this test. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
5. Based on a smaller node size of the AMD processor for an x86 platform, as of September 2023. GD-203.
6. As of January 2024, AMD has the first available dedicated AI engine on a desktop PC processor, where 'dedicated AI engine' is defined as an AI engine that has no function other than to process AI inference models and is part of the x86 processor die. For detailed information, please check: https://www.amd.com/en/products/processors/consumer/ryzen-ai.html. PXD-03