VERSION: AMD Zen HPL AVX2 2023-01
DEPENDENCIES:
- This binary executable was built with AVX2 (256-bit vector) support and will only run properly on systems that support AVX2 instructions.
- Specifically: AMD “Zen3”-based processors such as the AMD 3rd Generation EPYC™ CPUs.
- The binary will NOT run on AMD “Zen4”-based or prior processors.
- The binary was built on Red Hat® Enterprise Linux® 8.6 and runs without issue on Red Hat® Enterprise Linux® 9 and Ubuntu® 22.04.
- OpenMPI 4: This binary was built against OpenMPI 4.1.4 and should run without issue as long as OpenMPI 4 is in the PATH.
RECOMMENDED SETTINGS:
- Boost: ON
- Transparent Hugepages: always
- SMT: OFF
- NPS: 4
- Determinism: Power
HOW TO RUN:
- Modify the supplied HPL.dat file according to the community tuning guide.
- By default, this will run a very small problem. For peak performance, a larger value for ‘N’ should be chosen such that the memory use will be close to 90% of system memory. Ideally, the N value will be a multiple of the NB value.
- Other than selection of ‘N’, the supplied file is a reasonable starting place for most AMD “Zen3”-based systems.
- Peak single-node performance is typically found with 1 MPI rank per socket and as many threads per rank as there are physical cores in the socket. On a dual-socket node, this corresponds to P = 1, Q = 2.
- AMD Zen HPL introduces a new hybrid panel broadcast mechanism, which can be enabled by setting BCAST = 7.
- Check the run.sh script.
- By default, it sets the number of threads per rank to be the number of cores per socket and the number of MPI ranks to 2.
- If the HPL.dat file is changed to a different P and Q, the rank and thread counts in run.sh will need to be adjusted so that the number of MPI ranks equals P × Q.
- NBMIN should be equal to 30 for AMD “Zen3” EPYC™ processors.
- (Optional) Clean up the system and set various system parameters.
As root, invoke reset-system.sh, which will:
- Clean up system memory
- Tune for Transparent Hugepages
- Disable NUMA balancing
- Set the CPU governor
- Enable CPU boost
- Invoke ./run.sh.
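
The guidance on choosing ‘N’ (memory use close to 90% of system memory, N a multiple of NB) can be turned into a quick calculation. The sketch below is not part of the AMD package. It assumes the matrix is the dominant memory consumer and holds N × N double-precision (8-byte) values, and it reads physical memory through the Linux sysconf interface. The function name suggest_n and the NB value of 384 are placeholders for illustration only; use the NB from your own HPL.dat.

```python
import math
import os


def suggest_n(total_mem_bytes: int, nb: int, mem_fraction: float = 0.9) -> int:
    """Return the largest multiple of nb such that an N x N matrix of
    8-byte doubles fits within mem_fraction of total_mem_bytes.

    Assumption: the HPL matrix dominates memory use, so 8 * N * N bytes
    is a reasonable estimate of the footprint.
    """
    if total_mem_bytes <= 0 or nb <= 0 or not (0.0 < mem_fraction <= 1.0):
        raise ValueError("memory and nb must be positive; fraction in (0, 1]")
    budget_elements = int(total_mem_bytes * mem_fraction) // 8
    n_max = math.isqrt(budget_elements)  # largest N with N*N <= budget
    return (n_max // nb) * nb            # round down to a multiple of NB


if __name__ == "__main__":
    # Physical memory on Linux: page count times page size.
    mem = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    nb = 384  # placeholder; substitute the NB value from your HPL.dat
    print(f"Suggested N for NB={nb}: {suggest_n(mem, nb)}")
```

The result is only a starting point. If other processes hold significant memory, or the run spans several nodes, lower mem_fraction or pass the combined memory of all nodes.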