Running hpc-benchmarks 23.10 HPL on a V100 GPU

luoyuelight · November 25, 2023, 12:03pm

I am running the 23.10 HPL benchmark on a V100.

This is the nvidia-smi output inside Docker.
I run it with the command below:

mpirun -np 1 -mca pml ucx --mca btl ^vader,tcp,openib,uct -x UCX_NET_DEVICES=mlx5_0:1 ./hpl.sh --dat HPL-1GPU.dat --no-multinode --cuda-compat

But I get the following error:

================================================================================
HPL-NVIDIA 23.10.0  -- NVIDIA accelerated HPL benchmark -- NVIDIA
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   92800 
NB     :    1024 
PMAP   : Column-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :    Left 
BCAST  :  2ringM 
DEPTH  :       1 
SWAP   : Spread-roll (long)
L1     : no-transposed form
U      : transposed form
EQUIL  : no
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0


HPL-NVIDIA ignores the following parameters from input file:
	* Broadcast parameters
	* Panel factorization parameters
	* Look-ahead value
	* L1 layout
	* U layout
	* Equilibration parameter
	* Memory alignment parameter

HPL-NVIDIA settings from environment variables:
	monitor_gpu from environment variable 0 
	warmup_end_prog from environment variable 5.0 
	test_loops from environment variable 1 
	hpl_cfg_cuda_vmm from environment variable 0 

Device info:
	Peak clock frequency 1380 MHz
	SM 70
	Number of SMs 80
	Total memory available 31.74 GB
	canUseHostPointerForRegisteredMem 1
	canMapHostMemory 1
[HPL TRACE] cuda_nvshmem_init: max=0.4351 (0) min=0.4351 (0)
[WARNING] Change Input N 92800 to 92160
[HPL TRACE] ncclCommInitRank: max=0.1208 (0) min=0.1208 (0)
[cfe5c217c9f7:133  :0:133] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:    133) ====
 0 0x0000000000042520 __sigaction()  ???:0
=================================
[cfe5c217c9f7:00133] *** Process received signal ***
[cfe5c217c9f7:00133] Signal: Segmentation fault (11)
[cfe5c217c9f7:00133] Signal code:  (-6)
[cfe5c217c9f7:00133] Failing at address: 0x85
[cfe5c217c9f7:00133] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f4da9956520]
[cfe5c217c9f7:00133] *** End of error message ***
./hpl.sh: line 254:   133 Segmentation fault      (core dumped) ${NUMCMD} ${CPUBIND} ${MEMBIND} ${XHPL} ${DAT}
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[44010,1],0]
  Exit code:    139
--------------------------------------------------------------------------

The error seems to occur in

cugetrfs_mp_init

Can someone help us?

user126785 · November 26, 2023, 1:49am

I have also encountered this issue. If anyone could assist us with an answer, we would be immensely grateful.

user126785 · November 26, 2023, 1:52am

Hello, I would like to ask which is the highest version of the HPL benchmark that supports the NVIDIA V100. I tried the latest versions, 23.10 and 23.05, but ran into various issues that prevented them from running. I look forward to your response and guidance.

mendax1234 · January 25, 2024, 2:03am

Hi, I would also like to know the highest version that supports the NVIDIA V100.
Thanks so much!

nakiri5500 · November 24, 2025, 10:14am

Hi, a bit late to this, but here’s what I found:

For a V100, you’ll need to use the 21.4-hpl image.

Newer versions don’t include the pre-compiled binary for the sm_70 architecture, so they won’t run.
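
If you want to check this on your own system, one option is to compare the GPU's compute capability (the log above reports SM 70 for the V100) with the architectures embedded in the benchmark binary. This is only a rough sketch: it assumes a driver recent enough for nvidia-smi to support the compute_cap query field and that cuobjdump from the CUDA toolkit is installed. The cap_to_sm helper and the XHPL variable are hypothetical names used for illustration; set XHPL to wherever the xhpl binary lives in your container.

```shell
# Hypothetical helper: turn a compute capability such as "7.0" into "sm_70".
cap_to_sm() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Report the first GPU's architecture, if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
    echo "GPU architecture: $(cap_to_sm "$cap")"
fi

# List the architectures compiled into the binary, if cuobjdump is available.
# XHPL is an assumed variable, not something the container sets for you.
if command -v cuobjdump >/dev/null 2>&1 && [ -n "${XHPL:-}" ]; then
    cuobjdump --list-elf "$XHPL" | grep -o 'sm_[0-9]*' | sort -u
fi
```

If the GPU's sm_XX value is missing from the list, the binary probably has no native code for that card. Note that this only lists embedded cubins, so it would not show any PTX the binary might carry.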

I’ve documented the full compatibility list for various benchmarks (HPL, HPCG, etc.) and my methodology here:
https://github.com/nakiridaisuki/NVBMinfo
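
Separately from the version question, it may be worth checking the problem size against the card's memory. The log in the first post shows N adjusted to 92160 and 31.74 GB of device memory. An N x N double-precision matrix takes 8 * N^2 bytes, so a quick estimate looks like this (matrix_gib is a hypothetical helper name, and this is an arithmetic sanity check rather than a diagnosis of the segfault):

```shell
# Approximate size in GiB of an N x N matrix of 8-byte doubles.
matrix_gib() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", n * n * 8 / (1024 ^ 3) }'
}

# N after the adjustment reported in the log.
matrix_gib 92160
```

For N = 92160 this comes out at about 63.3 GiB, roughly twice what the V100 reports, so a smaller N in the .dat file is likely needed on a 32 GB card whichever image you use.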
