HPL check displays nan in Container 23.10

Hi all,

I’m using the image from nvcr.io/nvidia/hpc-benchmarks:23.10 to run HPL benchmarks on A100 GPUs. With previous versions of the container, I have not encountered any problems. With this one, however, the residual check at the end of the run fails and all norms are reported as zero:

||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=             -nan ...... FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 0.0000000000000000
||A||_oo . . . . . . . . . . . . . . . . . . . = 0.0000000000000000
||x||_oo . . . . . . . . . . . . . . . . . . . = 0.0000000000000000
||b||_oo . . . . . . . . . . . . . . . . . . . = 0.0000000000000000

For comparison, I ran the same HPL.dat file with version 23.5 of the container. It yields:

||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0004388 ...... PASSED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 0.0000000000874274
||A||_oo . . . . . . . . . . . . . . . . . . . = 15009.3686127988985390
||x||_oo . . . . . . . . . . . . . . . . . . . = 2.0132378472643913
||b||_oo . . . . . . . . . . . . . . . . . . . = 0.4999926335987157
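
To make the arithmetic behind the two outputs explicit, here is a minimal Python sketch of the formula printed in the first line of each check. It assumes that eps is the double-precision unit roundoff 2^-53; with that assumption the 23.5 numbers reproduce the printed 0.0004388 only to within about 1%, so the container may compute the value slightly differently. The function name scaled_residual is made up for illustration.

```python
import math

EPS = 2.0 ** -53  # assumed: double-precision unit roundoff


def scaled_residual(res, norm_a, norm_x, norm_b, n, eps=EPS):
    """||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N)"""
    denom = eps * (norm_a * norm_x + norm_b) * n
    if denom == 0.0:
        # Python raises ZeroDivisionError on float division by zero.
        # IEEE 754 gives nan for 0/0 and +/-inf for nonzero/0, so mimic that.
        return math.nan if res == 0.0 else math.copysign(math.inf, res)
    return res / denom


N = 60000  # Ns from HPL.dat

# Values printed by the 23.10 container: every norm is zero.
r_2310 = scaled_residual(0.0, 0.0, 0.0, 0.0, N)

# Values printed by the 23.5 container.
r_235 = scaled_residual(0.0000000000874274, 15009.3686127988985390,
                        2.0132378472643913, 0.4999926335987157, N)

print(r_2310)  # nan
print(r_235)   # small positive value, well below the threshold of 16.0
```

With all four norms at zero the expression is 0/0, so a nan in the first line is what the formula gives for those inputs. That leaves open why the norms themselves come out as zero.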

The HPL results themselves are okay. My first guess is that there might be something wrong with the display of the residuals.
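
To check several runs quickly, the norm lines can be pulled out of the HPL output and tested for the all-zero pattern. The following is only a sketch: the regular expression is written against the output format quoted above, and parse_norms and all_norms_zero are hypothetical helper names, not part of the container.

```python
import re

# Matches lines such as "||A||_oo . . . . = 15009.3686127988985390".
# The summary line with "/(eps*(...)*N)=" is deliberately not matched.
NORM_LINE = re.compile(r"^(\|\|.+?\|\|_oo)[ .]*=\s*(\S+)\s*$")


def parse_norms(output):
    """Return {label: value} for every norm line found in the HPL output."""
    norms = {}
    for line in output.splitlines():
        m = NORM_LINE.match(line.strip())
        if m:
            norms[m.group(1)] = float(m.group(2))
    return norms


def all_norms_zero(norms):
    """True if at least one norm was found and every one of them is 0."""
    return bool(norms) and all(v == 0.0 for v in norms.values())


sample = """\
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=             -nan ...... FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 0.0000000000000000
||A||_oo . . . . . . . . . . . . . . . . . . . = 0.0000000000000000
||x||_oo . . . . . . . . . . . . . . . . . . . = 0.0000000000000000
||b||_oo . . . . . . . . . . . . . . . . . . . = 0.0000000000000000
"""

norms = parse_norms(sample)
print(all_norms_zero(norms))  # True
```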

More information about the workflow:

  • I use apptainer / singularity to launch the container in interactive mode:
apptainer shell --nv --writable hpc-benchmarks\:23.5.sif

I have also used apptainer to pull the image.

  • Inside the container, I switch to /workspace and run the benchmark as
mpirun -np 1 ./hpl.sh --dat HPL.dat 
  • The content of HPL.dat (basically HPL-1GPU.dat from /workspace/hpl-linux-x86_64/sample-dat with N = 60000):
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
60000  Ns
1            # of NBs
1024         NBs
1            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold
1            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2 8          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
3 2          BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1 0          DEPTHs (>=0)
1            SWAP (0=bin-exch,1=long,2=mix)
192          swapping threshold
1            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
0            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
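
For reference, a rough estimate of the matrix size implied by N = 60000. This counts only the N x N matrix in double precision and ignores any workspace or communication buffers the benchmark allocates on top of it, so the real footprint is higher.

```python
N = 60000             # Ns from HPL.dat
BYTES_PER_DOUBLE = 8  # HPL works in double precision

matrix_bytes = N * N * BYTES_PER_DOUBLE

print(f"{matrix_bytes / 1e9:.1f} GB")      # 28.8 GB
print(f"{matrix_bytes / 2**30:.1f} GiB")   # 26.8 GiB
```

The A100 is available with 40 GB or 80 GB of memory, so the matrix alone fits on a single GPU of either variant.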