Peculiar Performance of H200 in HPL-MxP Benchmark

Hi,

We are encountering very peculiar under-performance on an H200nv_8 node in the HPL-MxP benchmark.

Since the results of other HPC benchmarks (e.g. HPL, HPCG, GROMACS, LAMMPS) are in line with published data, we strongly believe that there might be a bug in the latest version of the NGC HPC-Benchmarks container.

  1. Spec:

    • CPU: 2x Intel Xeon (8558)
    • GPU: 8x H200-SXM5
    • OS: CentOS 7
    • Kernel: 3.10.0-1160.el7.x86_64
  2. Topology:

  3. Steps to reproduce:

    singularity \
        run --nv \
        ./hpc-benchmarks_24.09.sif \
        mpirun \
            --np 1 \
            /workspace/hpl-mxp.sh \
               --gpu-affinity 0-23 \
               --cpu-affinity 0 \
               --mem-affinity 0 \
               --nprow 1 \
               --npcol 1 \
               --nporder 0 \
               --n 120000 \
               --nb 2048
    
    • Output:
     ****** HPL MxP Result    ****** 
    
    EPS           . . . . . . . . . . . . . . . . .          =    2.000000E-16
    Threshold     . . . . . . . . . . . . . . . . .          =    1.600000E+01
    ||Ax-b||_oo     . . . . . . . . . . . . . . . . .          =    6.577394E-14
    ||A   ||_oo     . . . . . . . . . . . . . . . . .          =    1.208605E+05
    ||x   ||_oo     . . . . . . . . . . . . . . . . .          =    1.674189E-05
    ||b   ||_oo     . . . . . . . . . . . . . . . . .          =    9.999956E-01
    ||Ax-b||_oo / (EPS * (||A||_oo * ||x||_oo + ||b||_oo) * N) =    9.064482E-04 ...... PASSED
    
    N = 120000, NB = 2048, NPROW = 1, NPCOL = 1, SLOPPY-TYPE = 2
       GFLOPS = 5.0041e+04, per GPU =   50041.11
    LU GFLOPS = 4.5437e+05, per GPU =  454368.84
    
    ****** HPL MxP Result    ****** 
    
  4. Other information:

    • The same v24.09 container gave 50 TFLOPS with the HPL benchmark, so we can rule out a hardware issue with our H200s.
    • We did achieve ~350 TFLOPS on a GH200 node using similar parameters, and the H200 is expected to perform comparably.
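
For reference, the figures quoted in this post can be put on the same scale with a quick conversion. This is only a rough sketch: the variable names are ours, and the values are copied verbatim from the HPL-MxP output and the GH200 note above.

```shell
# Values copied from this post; variable names are illustrative only.
mxp_gflops=5.0041e+04   # "GFLOPS" line of the HPL-MxP result
lu_gflops=4.5437e+05    # "LU GFLOPS" line of the HPL-MxP result
gh200_tflops=350        # approximate GH200 figure quoted above

# Convert GFLOPS to TFLOPS (1 TFLOPS = 1000 GFLOPS).
mxp_tflops=$(awk -v g="$mxp_gflops" 'BEGIN { printf "%.2f", g / 1000 }')
lu_tflops=$(awk -v g="$lu_gflops" 'BEGIN { printf "%.2f", g / 1000 }')

# How many times higher the GH200 figure is than the H200 HPL-MxP result.
ratio=$(awk -v t="$mxp_tflops" -v r="$gh200_tflops" 'BEGIN { printf "%.1f", r / t }')

echo "HPL-MxP overall: ${mxp_tflops} TFLOPS"
echo "HPL-MxP LU only: ${lu_tflops} TFLOPS"
echo "GH200 figure is ${ratio}x the H200 HPL-MxP result"
```

In other words, the overall HPL-MxP score is about 50 TFLOPS, roughly a factor of seven below the GH200 figure, while the LU-only number reported in the same run is about 454 TFLOPS.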

I am not aware of any dedicated forum for NGC. Please kindly move this post to a more appropriate one if you deem it necessary.

Regards.