Version: AMD_ZEN_HPCG_2024-10-07

Description:

The High-Performance Conjugate Gradients (HPCG) Benchmark project was created to provide a new metric for ranking HPC systems. It is based on a preconditioned conjugate gradient method that solves Ax = b, where A is a sparse square matrix. The preconditioner is a multigrid V-cycle, smoothed by a forward and a backward Gauss-Seidel sweep.

This version is derived from The High-Performance Conjugate Gradients (HPCG) Benchmark (Revision: 3.1 - Date: March 28, 2019) and has been optimized to run on AMD EPYC CPUs.
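
The method named in the description can be illustrated with a minimal Python sketch. It is not the benchmark's implementation: a single symmetric Gauss-Seidel sweep (forward, then backward) stands in for the full multigrid V-cycle, the matrix is stored densely, and the function names (sym_gauss_seidel, pcg) are hypothetical. It assumes A is symmetric positive definite and b is nonzero.

```python
import math

def sym_gauss_seidel(A, r):
    """Approximate A z = r with one forward and one backward
    Gauss-Seidel sweep, starting from z = 0."""
    n = len(r)
    z = [0.0] * n
    for order in (range(n), range(n - 1, -1, -1)):
        for i in order:
            s = sum(A[i][j] * z[j] for j in range(n) if j != i)
            z[i] = (r[i] - s) / A[i][i]
    return z

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def pcg(A, b, tol=1e-10, max_iter=100):
    """Preconditioned conjugate gradients for A x = b.
    Returns the solution estimate and the number of iterations used."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the initial guess x = 0
    z = sym_gauss_seidel(A, r)       # preconditioned residual
    p = list(z)                      # first search direction
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, it
        z = sym_gauss_seidel(A, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Illustrative stand-in system: a small 1D Laplacian (tridiagonal 2, -1).
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [1.0] * n
x, iters = pcg(A, b)
```

The 1D Laplacian is used here only because it is small and symmetric positive definite; it is not the matrix the benchmark assembles.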

Dependencies:

  • OpenMPI 4/5: The binaries were built against OpenMPI-5.0.3 and should run without issue if OpenMPI 5 or 4 is in the environment.
  • The binaries were built on Red Hat® Enterprise Linux® 8.9 and tested on Red Hat® Enterprise Linux® 9 and Ubuntu Linux 22.04.

Recommended Settings:

  • Boost: ON
  • Transparent Hugepages: always
  • SMT: OFF
  • NPS: 4
  • Determinism: Power

How to Run:

  • Ensure OpenMPI is installed and loaded in your environment.
  • Place the supplied hpcg.dat file in the same directory as the AMD Zen HPCG binaries. Modify hpcg.dat to suit your requirements.
    • By default, hpcg.dat defines a very small problem. Its 2nd line gives the values of nx, ny, and nz, respectively, and its 3rd line gives the runtime in seconds. For a valid benchmark run, choose a problem size that uses at least 1/4 of the total available main memory, and set the runtime to at least 1800 seconds.
    • Alternatively, you may pass these values as arguments on the command line.
      Example:
      --nx=<value> --ny=<value> --nz=<value> --rt=<value>
      Note: These parameters will override the values set in hpcg.dat.
  • Example Run Commands for a Single Node
    • For a short run on AMD 3rd Generation EPYC™ CPU, Dual Socket with 64 Cores/socket and 512 GB RAM
      mpirun -np 32 --bind-to core --map-by ppr:2:l3cache:pe=4 -x OMP_NUM_THREADS=4 -x OMP_PROC_BIND=true -x OMP_PLACES=cores ./amd_hpcg --nx=192 --ny=192 --nz=192 --rt=60
    • For a short run on AMD 4th Generation EPYC™ CPU, Dual Socket with 96 Cores/socket and 1.5 TB RAM
      mpirun -np 96 --bind-to core --map-by ppr:4:l3cache:pe=2 -x OMP_NUM_THREADS=2 -x OMP_PROC_BIND=true -x OMP_PLACES=cores ./amd_hpcg --nx=192 --ny=192 --nz=192 --rt=60
    • For a short run on AMD 5th Generation EPYC™ CPU, Dual Socket with 128 Cores/socket and 1.5 TB RAM
      mpirun -np 128 --bind-to core --map-by ppr:4:l3cache:pe=2 -x OMP_NUM_THREADS=2 -x OMP_PROC_BIND=true -x OMP_PLACES=cores ./amd_hpcg --nx=192 --ny=192 --nz=192 --rt=60
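
The three example commands appear to follow one pattern: the number of MPI ranks times the number of OpenMP threads per rank equals the total core count, and the ranks per L3 cache times the cores per rank equals 8. The Python sketch below reproduces that arithmetic for other core counts. The function name hpcg_launch and its parameters are hypothetical, and the default of 8 cores per L3 cache is an assumption inferred from the examples; check the actual topology of your system (for example with lscpu or hwloc) before relying on it.

```python
def hpcg_launch(sockets, cores_per_socket, threads_per_rank,
                cores_per_l3=8, nx=192, ny=192, nz=192, rt=60,
                binary="./amd_hpcg"):
    """Return (rank count, mpirun command) following the pattern of the
    example commands. cores_per_l3=8 is an assumption, not a documented value."""
    total_cores = sockets * cores_per_socket
    if cores_per_l3 % threads_per_rank != 0:
        raise ValueError("threads_per_rank must divide cores_per_l3")
    if total_cores % cores_per_l3 != 0:
        raise ValueError("cores_per_l3 must divide the total core count")
    ranks = total_cores // threads_per_rank          # one core per thread
    ranks_per_l3 = cores_per_l3 // threads_per_rank  # ranks sharing one L3
    cmd = (
        f"mpirun -np {ranks} --bind-to core "
        f"--map-by ppr:{ranks_per_l3}:l3cache:pe={threads_per_rank} "
        f"-x OMP_NUM_THREADS={threads_per_rank} "
        f"-x OMP_PROC_BIND=true -x OMP_PLACES=cores "
        f"{binary} --nx={nx} --ny={ny} --nz={nz} --rt={rt}"
    )
    return ranks, cmd

# Dual socket, 64 cores per socket, 4 threads per rank (first example above).
ranks, cmd = hpcg_launch(sockets=2, cores_per_socket=64, threads_per_rank=4)
print(cmd)
```

The defaults mirror the short 60-second runs shown above. For a valid benchmark run, pass a problem size and runtime that meet the memory and 1800-second requirements stated earlier.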