Running HPL from hpc-benchmarks 23.10 on a V100 GPU

I am running the 23.10 HPL benchmark on a V100.

This is the nvidia-smi output inside the Docker container.
I ran the benchmark with the command below:

mpirun -np 1 -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1 ./ --dat HPL-1GPU.dat --no-multinode --cuda-compat

But I get the following error:

HPL-NVIDIA 23.10.0  -- NVIDIA accelerated HPL benchmark -- NVIDIA
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   92800 
NB     :    1024 
PMAP   : Column-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :    Left 
BCAST  :  2ringM 
DEPTH  :       1 
SWAP   : Spread-roll (long)
L1     : no-transposed form
U      : transposed form
EQUIL  : no
ALIGN  : 8 double precision words


- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

HPL-NVIDIA ignores the following parameters from input file:
	* Broadcast parameters
	* Panel factorization parameters
	* Look-ahead value
	* L1 layout
	* U layout
	* Equilibration parameter
	* Memory alignment parameter

HPL-NVIDIA settings from environment variables:
	monitor_gpu from environment variable 0 
	warmup_end_prog from environment variable 5.0 
	test_loops from environment variable 1 
	hpl_cfg_cuda_vmm from environment variable 0 

Device info:
	Peak clock frequency 1380 MHz
	SM 70
	Number of SMs 80
	Total memory available 31.74 GB
	canUseHostPointerForRegisteredMem 1
	canMapHostMemory 1
[HPL TRACE] cuda_nvshmem_init: max=0.4351 (0) min=0.4351 (0)
[WARNING] Change Input N 92800 to 92160
[HPL TRACE] ncclCommInitRank: max=0.1208 (0) min=0.1208 (0)
[cfe5c217c9f7:133  :0:133] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:    133) ====
 0 0x0000000000042520 __sigaction()  ???:0
[cfe5c217c9f7:00133] *** Process received signal ***
[cfe5c217c9f7:00133] Signal: Segmentation fault (11)
[cfe5c217c9f7:00133] Signal code:  (-6)
[cfe5c217c9f7:00133] Failing at address: 0x85
[cfe5c217c9f7:00133] [ 0] /usr/lib/x86_64-linux-gnu/[0x7f4da9956520]
[cfe5c217c9f7:00133] *** End of error message ***
./ line 254:   133 Segmentation fault      (core dumped) ${NUMCMD} ${CPUBIND} ${MEMBIND} ${XHPL} ${DAT}
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[44010,1],0]
  Exit code:    139
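
A few numbers in the log above can be cross-checked with simple arithmetic. The sketch below is only a rough sanity check, not a diagnosis. It assumes that the adjusted N is the input N rounded down to a multiple of NB (inferred from the single warning line, not from HPL-NVIDIA documentation), that the matrix is stored as dense 8-byte doubles, and that "31.74 GB" means 10^9-byte gigabytes. The helper names are hypothetical. Whether HPL-NVIDIA needs the whole matrix in device memory, and whether the size is related to the segmentation fault, is not established here.

```python
import signal

# Values copied from the log above.
N_INPUT = 92800      # "N" in the parameter list
NB = 1024            # "NB" in the parameter list
GPU_MEM_GB = 31.74   # "Total memory available" (assumed to be 10^9-byte GB)
EXIT_CODE = 139      # "Exit code" reported by mpirun


def round_down_to_block(n: int, nb: int) -> int:
    """Largest multiple of nb that does not exceed n (assumed rounding rule)."""
    return (n // nb) * nb


def matrix_bytes(n: int) -> int:
    """Bytes needed for a dense n x n matrix of 8-byte doubles."""
    return n * n * 8


n_used = round_down_to_block(N_INPUT, NB)
footprint_gb = matrix_bytes(n_used) / 1e9

# A shell reports 128 + signal number when a child process is killed by a signal.
sig = signal.Signals(EXIT_CODE - 128)

print(f"N after rounding down to a multiple of NB: {n_used}")
print(f"Dense FP64 matrix: {footprint_gb:.1f} GB (device reports {GPU_MEM_GB} GB)")
print(f"Exit code {EXIT_CODE} corresponds to {sig.name}")
```

The rounded value matches the 92160 in the warning line, and exit code 139 decodes to SIGSEGV, which agrees with the signal 11 in the backtrace. Under the assumptions above, the dense matrix alone would be larger than the memory the device reports, so trying a smaller N in the .dat file may be a reasonable first experiment.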

It seems the error happens in


Can someone help us?

I have also encountered this issue. If anyone could assist us with an answer, we would be immensely grateful.

Hello, I would like to ask which is the highest version of the HPL benchmark that supports the NVIDIA V100. I tried the latest versions, 23.10 and 23.05, but ran into various issues that prevented them from running successfully. I look forward to your response and guidance.

Hi, I would also like to know the highest version that supports the NVIDIA V100.
Thanks so much!