Hi all,
I am getting a runtime error when using the Open MPI library included in the NVIDIA HPC SDK 26.3.
My system reports the following versions for the compiler and the MPI library:
$ nvc --version
nvc 26.3-0 64-bit target on x86-64 Linux -tp haswell
NVIDIA Compilers and Tools
Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
$ mpicc --version
nvc 26.3-0 64-bit target on x86-64 Linux -tp haswell
NVIDIA Compilers and Tools
Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
$ mpiexec --version
mpiexec (OpenRTE) 4.1.9a1
Below is a copy of the error I get when I launch the program with 4 processes:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
mpiexec -n 4 parallelSpmv
parallelSpmv: symbol lookup error: /opt/nvidia.26_3/hpc_sdk/Linux_x86_64/26.3/comm_libs/13.1/hpcx/hpcx-2.25.1/ucx/mt/lib/ucx/libuct_cuda.so.0: undefined symbol: cuGetProcAddress_v2
parallelSpmv: symbol lookup error: /opt/nvidia.26_3/hpc_sdk/Linux_x86_64/26.3/comm_libs/13.1/hpcx/hpcx-2.25.1/ucx/mt/lib/ucx/libuct_cuda.so.0: undefined symbol: cuGetProcAddress_v2
parallelSpmv: symbol lookup error: /opt/nvidia.26_3/hpc_sdk/Linux_x86_64/26.3/comm_libs/13.1/hpcx/hpcx-2.25.1/ucx/mt/lib/ucx/libuct_cuda.so.0: undefined symbol: cuGetProcAddress_v2
parallelSpmv: symbol lookup error: /opt/nvidia.26_3/hpc_sdk/Linux_x86_64/26.3/comm_libs/13.1/hpcx/hpcx-2.25.1/ucx/mt/lib/ucx/libuct_cuda.so.0: undefined symbol: cuGetProcAddress_v2
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[59678,1],1]
Exit code: 127
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Before installing the NVIDIA HPC SDK 26.3, I was using the NVIDIA HPC SDK 25.7 without any problems.
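In case it helps with the diagnosis: cuGetProcAddress_v2 is a CUDA driver API entry point, so I would expect it to come from the driver library (libcuda) rather than from the SDK itself. Below is a minimal sketch of how one could check whether the libcuda that the dynamic loader finds actually exports that symbol. The soname "libcuda.so.1" is an assumption about this system, and the helper name exports_symbol is only for illustration.

```python
import ctypes


def exports_symbol(library, symbol):
    """Report whether a shared library exports a given symbol.

    Returns True or False if the library loads, and None if the
    dynamic loader cannot load the library at all.
    """
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        return None
    # ctypes looks the name up with dlsym and raises AttributeError
    # when the symbol is missing, so hasattr gives a simple yes/no.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    # Assumption: the driver library is installed under its usual soname.
    result = exports_symbol("libcuda.so.1", "cuGetProcAddress_v2")
    if result is None:
        print("libcuda.so.1 could not be loaded")
    elif result:
        print("libcuda.so.1 exports cuGetProcAddress_v2")
    else:
        print("libcuda.so.1 does NOT export cuGetProcAddress_v2")
```

If this reports that the symbol is missing, the installed driver library would seem to be older than what libuct_cuda.so.0 in the 26.3 SDK expects, but that is only my guess.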
Any ideas about what might be causing this?
I really appreciate any help you can provide.
Thanks.