VERSION: AMD Zen HPL AVX2 2023-01

DEPENDENCIES:

  • This binary executable was built with AVX2 support and will only run properly on systems that support AVX2 instructions.
    • Specifically: AMD “Zen3”-based processors such as the AMD 3rd Generation EPYC™ CPUs.
    • The binary will NOT run on AMD “Zen2”-based or prior processors.
  • The binary was built on Red Hat® Enterprise Linux® 8.6 and runs without issue on Red Hat® Enterprise Linux® 9 and Ubuntu® 22.04.
  • OpenMPI 4: This binary was built against OpenMPI 4.1.4 and should run without issue as long as OpenMPI 4 is in the PATH.
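
As a quick pre-flight check for the AVX2 dependency above, the following is a minimal sketch that looks for the avx2 flag in /proc/cpuinfo-style text. The function names are illustrative and are not part of this package. The check is necessary but not sufficient: it confirms AVX2 support only and does not identify the processor generation.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux the feature list is on the line whose key is "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx2(cpuinfo_text):
    """True if the exact token 'avx2' appears among the CPU flags."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX2 supported:", supports_avx2(f.read()))
    except OSError as exc:
        print("could not read /proc/cpuinfo:", exc)
```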

RECOMMENDED SETTINGS:

  • Boost: ON
  • Transparent Hugepages: always
  • SMT: OFF
  • NPS: 4
  • Determinism: Power
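
The Transparent Hugepages setting above can be verified from the operating system; most of the other settings are typically configured in firmware. The sketch below assumes the standard Linux sysfs file /sys/kernel/mm/transparent_hugepage/enabled, which lists the available modes and marks the active one in square brackets. The function name is illustrative.

```python
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(enabled_text):
    """Return the bracketed (active) mode, e.g. 'always' for '[always] madvise never'."""
    for token in enabled_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # no bracketed token found


if __name__ == "__main__":
    try:
        with open(THP_PATH) as f:
            mode = active_thp_mode(f.read())
        print("Transparent Hugepages mode:", mode)
        if mode != "always":
            print("recommended setting is 'always'")
    except OSError as exc:
        print("could not read", THP_PATH, ":", exc)
```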

HOW TO RUN:

  1. Modify the supplied HPL.dat file according to the community tuning guide.
    • By default, the supplied file runs a very small problem. For peak performance, choose a larger ‘N’ so that the N × N double-precision matrix (8 bytes per element) occupies close to 90% of system memory. Ideally, N is a multiple of NB.
    • Other than selection of ‘N’, the supplied file is a reasonable starting place for most AMD “Zen3”-based systems.
    • Peak single-node performance is typically found with 1 MPI rank per socket and as many threads per rank as there are physical cores in the socket. On a dual-socket node, this corresponds to P = 1, Q = 2.
    • AMD Zen HPL introduces a new hybrid panel broadcast mechanism, which can be enabled by setting BCAST = 7.
  2. Check the run.sh script.
    • By default, it sets the number of threads per rank to be the number of cores per socket and the number of MPI ranks to 2.
    • If P and Q are changed in HPL.dat, the rank and thread counts in run.sh must be adjusted to match, so that the number of MPI ranks equals P × Q.
    • NBMIN should be equal to 30 for AMD “Zen3” EPYC™ processors.
  3. (Optional) Clean up the system and set various system parameters.
    As root, invoke reset-system.sh, which will:
    • Clean up system memory
    • Tune for Transparent Hugepages
    • Disable NUMA balancing
    • Set the CPU governor
    • Enable CPU boost
  4. Invoke ./run.sh.
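
The sizing guidance in step 1 and the rank check in step 2 can be sketched as follows. This is an illustration under stated assumptions, not part of the supplied scripts: the function names are hypothetical, the matrix is assumed to dominate memory use at 8 bytes per element, and the NB and memory values in the example are placeholders (take NB from the supplied HPL.dat).

```python
import math


def suggest_n(mem_bytes, nb, fraction=0.90):
    """Largest multiple of nb whose N x N matrix of 8-byte doubles
    fits within fraction * mem_bytes."""
    if mem_bytes <= 0 or nb <= 0 or not 0 < fraction <= 1:
        raise ValueError("mem_bytes and nb must be positive, fraction in (0, 1]")
    max_elements = int(mem_bytes * fraction) // 8  # doubles that fit in the budget
    n = math.isqrt(max_elements)                   # largest N with N*N <= max_elements
    return (n // nb) * nb                          # round down to a multiple of NB


def ranks_match_grid(p, q, mpi_ranks):
    """True if the MPI rank count equals the P x Q process grid size."""
    return p * q == mpi_ranks


if __name__ == "__main__":
    mem_bytes = 512 * 2**30  # placeholder: a node with 512 GiB of memory
    nb = 384                 # placeholder: use the NB value from HPL.dat
    print("suggested N:", suggest_n(mem_bytes, nb))
    print("P=1, Q=2 with 2 ranks OK:", ranks_match_grid(1, 2, 2))
```

Rounding N down rather than up keeps memory use at or below the chosen fraction, which leaves headroom for the operating system and MPI buffers.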