NVSHMEM error when running the HPL benchmark in Docker

Hi,

I’m trying to run the NVIDIA HPL benchmark as described on the NVIDIA HPC-Benchmarks page on NVIDIA NGC.
I’m running it inside a VM with a vGPU attached (MIG mode).
When I run hpl.sh, I get the following errors:

HPL-NVIDIA settings from environment variables:
--- DEVICE INFO ---
  Peak clock frequency: 1410 MHz
  SM version          : 80
  Number of SMs       : 42
-------------------
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/host/mem/mem.cpp:298: non-zero status: 801 cuMemGetAllocationGranularity failed 

/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/host/init/init.cu:966: non-zero status: 7 nvshmem setup local heap failed 

[HPL TRACE] cuda_nvshmem_init: max=0.0414 (0) min=0.0414 (0)
[WARNING] Change Input N 92800 to 92160
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/host/mem/mem.cpp:298: non-zero status: 801 cuMemGetAllocationGranularity failed 

/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/host/init/init.cu:966: non-zero status: 7 nvshmem setup local heap failed 

/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/host/init/init.cu:nvshmemi_check_state_and_init:1062: nvshmem initialization failed, exiting 

/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/util/cs.cpp:23: non-zero status: 16: Resource temporarily unavailable, exiting... mutex destroy failed

This is how I launch the container:
sudo docker run --rm --runtime=nvidia --gpus all --shm-size=1g --privileged -i -t nvcr.io/nvidia/hpc-benchmarks:24.09 /bin/bash

This is how I launch the HPL benchmark:
./hpl.sh --dat hpl-linux-x86_64/sample-dat/HPL-1GPU.dat

Can you please help? What is going wrong with NVSHMEM?

Kind regards,

I’m getting a similar error, and I’m not sure what is wrong:

HPL-NVIDIA settings from environment variables:
--- DEVICE INFO ---
  Peak clock frequency: 1733 MHz
  SM version          : 61
  Number of SMs       : 15
-------------------
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/device/init/init_device.cu:nvshmemi_get_mem_handle:79: Unable to access device state. 500

/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/device/init/init_device.cu:nvshmemi_get_mem_handle:85: Unable to access ibgda device state. 500

/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/host/init/init.cu:952: NULL value Unable to query pointer information.

[HPL TRACE] cuda_nvshmem_init: max=0.0018 (0) min=0.0018 (0)
[WARNING] Change Input N 92800 to 92160
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/device/init/init_device.cu:nvshmemi_get_mem_handle:79: Unable to access device state. 500

/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/device/init/init_device.cu:nvshmemi_get_mem_handle:85: Unable to access ibgda device state. 500

/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/host/init/init.cu:952: NULL value Unable to query pointer information.

/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/host/init/init.cu:nvshmemi_check_state_and_init:1062: nvshmem initialization failed, exiting 

/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/util/cs.cpp:23: non-zero status: 16: Cannot allocate memory, exiting... mutex destroy failed
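The numeric codes in the two logs above are easier to read as CUDA error names. In the CUDA driver API, 801 (returned here by cuMemGetAllocationGranularity) is CUDA_ERROR_NOT_SUPPORTED, and 500 is CUDA_ERROR_NOT_FOUND. The runtime API also uses 500, for cudaErrorSymbolNotFound, and the log does not show which API NVSHMEM called at that point. The sketch below is a hypothetical helper and is not part of NVSHMEM or the HPC-Benchmarks container. The function names decode_cuda_status and scan_hpl_log are made up for illustration. It maps only the codes that appear above and skips the NVSHMEM-internal status 7 and the status 16 from cs.cpp, because those are not CUDA codes.

```shell
# Hypothetical helper, not part of NVSHMEM or HPL-NVIDIA.
# Maps only the CUDA status codes seen in the logs above; anything else is "unknown".
decode_cuda_status() {
  case "$1" in
    801) echo "CUDA_ERROR_NOT_SUPPORTED (driver API)" ;;
    500) echo "CUDA_ERROR_NOT_FOUND (driver API) / cudaErrorSymbolNotFound (runtime API)" ;;
    *)   echo "unknown (not mapped here)" ;;
  esac
}

# Scan saved log files (or stdin if no file is given) for the two patterns in the
# logs above: "non-zero status: N cu<DriverCall>" and "device state. N".
# NVSHMEM-internal codes such as "status: 7 nvshmem ..." do not match.
# Each distinct code is printed once, with its decoded name.
scan_hpl_log() {
  grep -hoE '(non-zero status: [0-9]+ cu[A-Za-z]+|device state\. [0-9]+)' "$@" |
    grep -oE '[0-9]+' |
    sort -un |
    while read -r code; do
      printf '%s -> %s\n' "$code" "$(decode_cuda_status "$code")"
    done
}
```

As an example, the output could be saved with `./hpl.sh --dat hpl-linux-x86_64/sample-dat/HPL-1GPU.dat 2>&1 | tee hpl-output.log` and then checked with `scan_hpl_log hpl-output.log`. The file name hpl-output.log is only an illustration.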