I start the container with the following command:
docker run -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/hpc-benchmarks:23.10
Inside the container, I change to the HPL directory:
root@931eb37487ed:/workspace# cd /hpl-linux-aarch64-gpu
Then I run HPL with the following command:
mpirun -n 2 ./hpl-aarch64-gpu.sh --cpu-affinity 0-39:40-79 --gpu-affinity 0:1 --dat ./sample-dat/HPL-2GPUs.dat
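To make the intent of the affinity flags explicit, here is a minimal sketch of how I understand them. It assumes each colon-separated entry belongs to one local MPI rank, so rank 0 gets cores 0-39 and GPU 0, and rank 1 gets cores 40-79 and GPU 1. The helper names `expand_cpu_range` and `rank_bindings` are hypothetical and only illustrate the mapping; they are not part of the container or its launch script.

```python
def expand_cpu_range(spec):
    """Expand a spec such as "0-39" or "0-3,8" into a list of core indices."""
    cores = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cores.extend(range(int(lo), int(hi) + 1))
        else:
            cores.append(int(part))
    return cores


def rank_bindings(cpu_affinity, gpu_affinity):
    """Return one (local_rank, gpu, cores) tuple per colon-separated entry.

    Assumption: entry i of each list applies to local rank i.
    """
    cpu_specs = cpu_affinity.split(":")
    gpu_specs = gpu_affinity.split(":")
    if len(cpu_specs) != len(gpu_specs):
        raise ValueError("cpu and gpu affinity lists differ in length")
    return [
        (rank, int(gpu), expand_cpu_range(cpus))
        for rank, (cpus, gpu) in enumerate(zip(cpu_specs, gpu_specs))
    ]


if __name__ == "__main__":
    # Values taken from the mpirun command above.
    for rank, gpu, cores in rank_bindings("0-39:40-79", "0:1"):
        print(f"local rank {rank}: GPU {gpu}, "
              f"cores {cores[0]}-{cores[-1]} ({len(cores)} cores)")
```

Under that assumption, each of the two ranks is pinned to its own GPU and its own block of 40 cores, which matches `-n 2` and the P=2, Q=1 process grid in the output.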
However, the run fails with the following error:
================================================================================
HPL-NVIDIA 23.10.0 – NVIDIA accelerated HPL benchmark – NVIDIA
HPLinpack 2.1 – High-Performance Linpack benchmark – October 26, 2012
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 136608
NB : 1024
PMAP : Column-major process mapping
P : 2
Q : 1
PFACT : Left
NBMIN : 2
NDIV : 2
RFACT : Left
BCAST : 2ringM
DEPTH : 1
SWAP : Spread-roll (long)
L1 : no-transposed form
U : transposed form
EQUIL : no
ALIGN : 8 double precision words
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
HPL-NVIDIA ignores the following parameters from input file:
* Broadcast parameters
* Panel factorization parameters
* Look-ahead value
* L1 layout
* U layout
* Equilibration parameter
* Memory alignment parameter
HPL-NVIDIA settings from environment variables:
monitor_gpu from environment variable 0
warmup_end_prog from environment variable 5.0
test_loops from environment variable 1
hpl_cfg_cuda_vmm from environment variable 0
Device info:
Peak clock frequency 1410 MHz
SM 80
Number of SMs 108
Total memory available 39.39 GB
canUseHostPointerForRegisteredMem 1
canMapHostMemory 1
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/transport/transport.cpp:nvshmemi_transport_init:215: init failed for remote transport: ibrc
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/topo/topo.cpp:420: [GPU 1] Peer GPU 0 is not accessible, exiting …
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/init/init.cu:843: non-zero status: 3 building transport map failed
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/transport/transport.cpp:nvshmemi_transport_init:215: init failed for remote transport: ibrc
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/topo/topo.cpp:420: [GPU 0] Peer GPU 1 is not accessible, exiting …
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/init/init.cu:843: non-zero status: 3 building transport map failed
[HPL TRACE] cuda_nvshmem_init: max=0.0665 (0) min=0.0648 (1)
[WARNING] Change Input N 136608 to 136192
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/transport/transport.cpp:nvshmemi_transport_init:215: init failed for remote transport: ibrc
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/topo/topo.cpp:420: [GPU 1] Peer GPU 0 is not accessible, exiting …
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/init/init.cu:843: non-zero status: 3 building transport map failed
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/init/init.cu:nvshmemi_check_state_and_init:933: nvshmem initialization failed, exiting
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/util/cs.cpp:23: non-zero status: 16: No such file or directory, exiting… mutex destroy failed
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/transport/transport.cpp:nvshmemi_transport_init:215: init failed for remote transport: ibrc
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/topo/topo.cpp:420: [GPU 0] Peer GPU 1 is not accessible, exiting …
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/init/init.cu:843: non-zero status: 3 building transport map failed
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/host/init/init.cu:nvshmemi_check_state_and_init:933: nvshmem initialization failed, exiting
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.2/main_nvshmem/src/util/cs.cpp:23: non-zero status: 16: No such file or directory, exiting… mutex destroy failed
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was: