KTH Royal Institute of Technology
KTH enhanced its HPC services with an HPE Cray EX cluster powered by AMD EPYC processors and AMD Instinct™ GPUs. Research areas include climate prediction, sustainable sea transport, and biomolecular modeling.
Modern data centers continually strive for greater efficiency and scalability while delivering more performance and security. With the rise of artificial intelligence (AI) and generative AI workloads, global electricity consumption is on a trajectory to exceed what the market can supply within the next two decades.1
The need for innovative energy solutions is becoming increasingly important – perhaps nowhere more so than in the data center. At AMD, we recognize our important role in addressing these critical priorities. We are focused on accelerating server energy efficiency, supporting infrastructure consolidation, lowering data center total cost of ownership (TCO), and delivering high-performance computing (HPC) to help tackle some of the world’s toughest challenges.
Our goal is to deliver a 30x increase in energy efficiency for AMD processors and accelerators powering servers for AI training and HPC from 2020 to 2025.2
These important and growing computing segments have some of the most demanding workloads. This goal represents more than a 2.5x acceleration of the industry trends from 2015-2020 as measured by the worldwide energy consumption for these computing segments.3
Even with continued advances in process manufacturing, the slowdown in Moore’s Law is clear. Energy efficiency gains from process node advances are now smaller and less frequent. Therefore, a larger fraction of improvements needs to come from silicon architecture and packaging innovations in addition to expected gains from silicon process technology.
As of late 2024, we have achieved a 28.3x improvement in energy efficiency for AMD accelerated compute nodes from the 2020 baseline, using a configuration of four AMD Instinct™ MI300X GPUs and one 5th Gen AMD EPYC™ CPU.4 Our progress report uses a measurement methodology2 validated by renowned compute energy efficiency researcher and author Dr. Jonathan Koomey.
*For illustrative purposes. See data table in footnote.
Our 30x energy efficiency goal equates to a 97% reduction in energy use per computation from 2020-2025. If all AI and HPC server nodes globally were to make similar gains, billions of kilowatt-hours of electricity could be saved in 2025 relative to baseline trends.
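The link between an efficiency multiple and the percentage reduction in energy per computation is simple arithmetic: a gain of N times means each computation needs 1/N of the original energy. The sketch below illustrates that relationship only; the function name is our own and is not part of any AMD tool.

```python
def energy_reduction_fraction(efficiency_gain: float) -> float:
    """Fraction of energy saved per computation for an N-times efficiency gain."""
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    # An N-times gain means each computation uses 1/N of the baseline energy.
    return 1.0 - 1.0 / efficiency_gain


goal = energy_reduction_fraction(30.0)      # the 30x goal
progress = energy_reduction_fraction(28.3)  # the late-2024 status reported above

print(f"30x goal:   {goal:.1%} less energy per computation")      # 96.7%
print(f"28.3x mark: {progress:.1%} less energy per computation")  # 96.5%
```

Rounded to a whole percent, the 30x case gives the 97% figure quoted above.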
Using energy-efficient servers can mean fewer physical servers are needed to meet computing demands, which can have a cascading effect of avoided environmental impacts: fewer raw materials, less manufacturing and shipping, lower energy use, and less data center space.
AMD-powered servers can meet performance demands with fewer physical servers, which can result in a reduced data center footprint and lower associated energy use and GHG emissions. For example, achieving the same amount of compute (10,000 units of integer performance) is estimated to require 11 Intel servers (2P 60-core Xeon Platinum 8490H CPUs) or six AMD servers (2P 96-core EPYC 9654 CPUs).5 The difference of five servers amounts to estimated operational savings of up to 45% less power, which over a three-year period can avoid up to 107 metric tons of CO2e and up to $37,700 in energy costs.
Another assessment (Feb 2024) found that 15 servers (popular in the 2019-2021 timeframe) could be replaced with three 4th Gen AMD EPYC servers.
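The two consolidation examples above can be checked as simple server-count ratios. This is only a sketch of the server-count arithmetic: the 45% power, CO2e, and cost figures come from the footnoted estimate, which accounts for per-server power and other inputs not reproduced here. The function name is hypothetical.

```python
def server_count_reduction(old_servers: int, new_servers: int) -> float:
    """Fraction of physical servers eliminated by a consolidation."""
    if old_servers <= 0 or not 0 <= new_servers <= old_servers:
        raise ValueError("expected 0 <= new_servers <= old_servers and old_servers > 0")
    return (old_servers - new_servers) / old_servers


# 11 servers replaced by 6 for the same integer performance target
print(f"11 -> 6 servers: {server_count_reduction(11, 6):.0%} fewer servers")  # 45%
# 15 older servers replaced by three 4th Gen AMD EPYC servers
print(f"15 -> 3 servers: {server_count_reduction(15, 3):.0%} fewer servers")  # 80%
```

The first ratio happens to line up with the "up to 45% less power" estimate, but a server-count ratio only approximates power savings when the old and new servers draw similar power.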
The AMD EPYC Bare Metal and Greenhouse Gas Emissions TCO Estimation Tool illustrates how AMD-powered servers can help reduce GHG emissions. You can find this tool on the EPYC Tools home page.
To make the goal particularly relevant to worldwide energy use, AMD worked with Koomey Analytics to assess available research and data that includes segment-specific data center power usage effectiveness (PUE), including GPU HPC and machine learning (ML) installations. The AMD CPU socket and GPU node power consumptions incorporate segment-specific utilization (active vs. idle) percentages and are multiplied by PUE to determine actual total energy use for calculation of the performance per watt.
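As a rough illustration of the calculation described above, the sketch below weights active and idle node power by utilization, multiplies by PUE to get facility-level power, and divides performance by the result. Every number and name in it is hypothetical, and it assumes performance is measured at full load; the validated methodology in the footnote may differ in its details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeProfile:
    """Hypothetical description of one compute node (not an AMD data format)."""
    performance: float   # work per second, in any consistent unit
    active_watts: float  # node power draw while busy
    idle_watts: float    # node power draw while idle
    utilization: float   # fraction of time the node is active, 0..1
    pue: float           # facility power usage effectiveness, >= 1.0

    def average_it_watts(self) -> float:
        """Utilization-weighted node power, before facility overhead."""
        return (self.utilization * self.active_watts
                + (1.0 - self.utilization) * self.idle_watts)

    def facility_watts(self) -> float:
        """Total power including cooling and distribution overhead (IT power x PUE)."""
        return self.average_it_watts() * self.pue

    def perf_per_watt(self) -> float:
        return self.performance / self.facility_watts()


# Illustrative values only; none of these come from AMD data.
node = NodeProfile(performance=100.0, active_watts=2000.0,
                   idle_watts=500.0, utilization=0.8, pue=1.25)
print(f"average IT power: {node.average_it_watts():.0f} W")
print(f"facility power:   {node.facility_watts():.0f} W")
print(f"perf per watt:    {node.perf_per_watt():.4f}")
```

Comparing `perf_per_watt()` for a baseline node and a newer node gives an efficiency ratio of the kind the goal tracks.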
The energy consumption baseline uses the same industry energy per operation improvement rates as were observed from 2015-2020, with this rate of change extrapolated to 2025. The AMD goal trend line (Table 1) shows the exponential improvements needed to hit the goal of 30-fold efficiency improvements by 2025. The actual AMD products released (Table 2) are the source of the efficiency improvements shown for AMD goal status in Table 1.
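An exponential path from 1x in 2020 to 30x in 2025 implies a constant yearly multiplier of 30^(1/5), or roughly 1.97x per year. The sketch below assumes that pure exponential form (the function and its parameter names are ours) and prints it next to the reported AMD Goal Status values from Table 1. It closely tracks the Goal Trend Line row; small differences from the tabulated values may reflect rounding or a slightly different curve.

```python
def goal_trend(year: int, base_year: int = 2020,
               target_year: int = 2025, target_gain: float = 30.0) -> float:
    """Efficiency multiple on a constant-growth path from 1x to target_gain."""
    elapsed = (year - base_year) / (target_year - base_year)
    return target_gain ** elapsed


# AMD Goal Status values as reported in Table 1 (no 2025 value yet).
reported_status = {2020: 1.00, 2021: 3.90, 2022: 6.79, 2023: 13.49, 2024: 28.29}

print(f"implied yearly multiplier: {goal_trend(2021):.3f}x")
for year in range(2020, 2026):
    status = reported_status.get(year)
    status_text = f"{status:6.2f}" if status is not None else "   n/a"
    print(f"{year}: trend {goal_trend(year):6.2f}   reported {status_text}")
```

On this curve, the reported status has been ahead of the trend line in every year from 2021 through 2024.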
The measure of energy per operation improvement in each segment from 2020-2025 is weighted by the projected worldwide volumes (as per IDC Q1 2021 Tracker; Hyperion Q4 2020 Tracker; Hyperion HPC Market Analysis, April ’21). Translating these volumes to the ML training and HPC markets results in node volumes as per Table 3 below. These volumes are then multiplied by the Typical Energy Consumption (TEC) of the respective computing segment in 2025 (Table 4) to arrive at a meaningful aggregate metric of actual energy usage improvement worldwide.
Table 1: Summary efficiency data projected to 2025
| | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 |
|---|---|---|---|---|---|---|
| Goal Trend Line | 1.00 | 1.97 | 3.98 | 7.70 | 15.20 | 30.00 |
| AMD Goal Status (energy-weighted performance / watt) | 1.00 | 3.90 | 6.79 | 13.49 | 28.29 | |
Table 2: AMD Products
| 2020 | 2021 | 2022 | 2023 | 2024 | 2025 |
|---|---|---|---|---|---|
| EPYC Gen 1 CPU + MI50 GPU | EPYC Gen 2 CPU + MI100 GPU | EPYC Gen 3 CPU + MI250 GPU | MI300A APU (4th Gen AMD EPYC™ CPU with AMD CDNA™ 3 Compute Units) | EPYC Gen 5 CPU + MI300X GPU | |
*AMD Products are supported by the latest software, including AMD ROCm.
Table 3: Volume Projections (millions/yr)
| | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 |
|---|---|---|---|---|---|---|
| HPC GPU nodes sold | 0.05 | 0.06 | 0.07 | 0.09 | 0.10 | 0.12 |
| ML GPU nodes sold | 0.09 | 0.10 | 0.12 | 0.14 | 0.17 | 0.20 |
Table 4: Base case 2025 electricity consumption of products sold in that year, for weighting efficiency indices (TWh/year)
| | 2025 |
|---|---|
| Base HPC | 4.49 |
| Base ML | 29.79 |
| Total Base | 34.28 |
* Estimated 2025 values for worldwide energy use are updated annually as HPC and ML compute node capabilities evolve from our original outlook, including the growth of AI, which increases the weighting for ML performance.
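To show how the Table 4 base-case energy figures can act as weights, the sketch below combines per-segment efficiency gains into one aggregate figure. The aggregation rule (total base energy divided by total energy after each segment's gain) is our assumption about what "energy-weighted" could mean, and the per-segment gains are hypothetical placeholders; the footnoted methodology is the authoritative description.

```python
# Base-case 2025 electricity consumption from Table 4 (TWh/year).
base_twh_2025 = {"HPC": 4.49, "ML": 29.79}


def energy_weights(base_twh: dict[str, float]) -> dict[str, float]:
    """Share of total base-case energy attributed to each segment."""
    total = sum(base_twh.values())
    return {segment: twh / total for segment, twh in base_twh.items()}


def aggregate_gain(base_twh: dict[str, float],
                   segment_gain: dict[str, float]) -> float:
    """Aggregate efficiency gain: base energy over energy after per-segment gains."""
    total_base = sum(base_twh.values())
    total_after = sum(base_twh[s] / segment_gain[s] for s in base_twh)
    return total_base / total_after


weights = energy_weights(base_twh_2025)
print({segment: round(w, 3) for segment, w in weights.items()})

# Hypothetical per-segment gains, for illustration only.
example_gain = {"HPC": 20.0, "ML": 30.0}
print(f"aggregate gain: {aggregate_gain(base_twh_2025, example_gain):.1f}x")
```

Because ML accounts for roughly 87% of the base-case energy in Table 4, the aggregate figure is dominated by the ML segment's gain under this weighting.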