NVPROF - Error: incompatible CUDA driver version

Hi Team,

I'm getting the error described below. When I run the application as

mpirun ./main.exe
with LSF settings to run just a single MPI process, the output is
Rank=0 Size=1 - ciao!
Rank=0 Size=1 - Pointer(A)=0x146adf20
Rank=0 Size=1 - going to GPU
Rank=0 Size=1 - back from GPU
Rank=0 Size=1 - adieu!
Next I try to run under the profiler, that is
mpirun nvprof ./main.exe
and I get
FATAL ERROR: dlsym PAMI_CUDA_RegisterPAMIContexts: ./main.exe: undefined symbol: PAMI_CUDA_RegisterPAMIContexts
[p10a11:134542] Error: common_pami.c:1019 - ompi_common_pami_init() Unable to create PAMI client (rc=1)

No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

Host: p10a11
Framework: pml

[p10a11:134542] PML pami cannot be selected
======== Error: Application returned non-zero code 1
So I figured maybe it was because I left out the -gpu option, that is, I should run with
mpirun -gpu nvprof ./main.exe
but this gives what looks like the same error
FATAL ERROR: dlsym PAMI_CUDA_RegisterPAMIContexts: ./main.exe: undefined symbol: PAMI_CUDA_RegisterPAMIContexts
[p10a07:159447] Error: common_pami.c:1019 - ompi_common_pami_init() Unable to create PAMI client (rc=1)

No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

Host: p10a07
Framework: pml

[p10a07:159447] PML pami cannot be selected
So maybe I can try with MXM instead,
mpirun -mxm nvprof ./main.exe
That gives us
[1546632219.717627] [p10a05:108539:0] mxm.c:196 MXM WARN The ‘ulimit -s’ on the system is set to ‘unlimited’. This may have negative performance implications. Please set the stack size to the default value (10240)
FATAL ERROR: dlsym PAMI_CUDA_RegisterPAMIContexts: ./main.exe: undefined symbol: PAMI_CUDA_RegisterPAMIContexts
Rank=0 Size=1 - ciao!
Rank=0 Size=1 - Pointer(A)=0x32895420
Rank=0 Size=1 - going to GPU
==108539== NVPROF is profiling process 108539, command: ./main.exe
======== Profiling result:
No kernels were profiled.
No API activities were profiled.
======== Error: incompatible CUDA driver version.

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[33186,1],0]
Exit code: 19

OS - redhat7.5-alternate, architecture - ppc64le, nvidia driver 410.79 and cuda 10 installed on all machines.
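
Since the last error is a driver-version complaint, a quick sanity check is to compare the installed driver with the minimum the toolkit expects. The following is a minimal sketch, not an official tool: version_at_least is a hypothetical helper, and the 410.48 minimum for CUDA 10.0 on Linux is an assumption that should be confirmed against the CUDA release notes for this platform.

```shell
#!/bin/bash
# version_at_least (hypothetical helper): succeeds when version $1 >= version $2,
# using the version-aware ordering of GNU "sort -V".
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum driver for CUDA 10.0 on Linux; verify in the release notes.
min_driver="410.48"

# 410.79 is the driver version reported in this thread. On a live node it
# could instead be read with:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
driver="${1:-410.79}"

if version_at_least "$driver" "$min_driver"; then
    echo "driver $driver satisfies minimum $min_driver"
else
    echo "driver $driver is older than $min_driver"
fi
```

If this check passes, the version numbers alone do not explain the "incompatible CUDA driver version" message, and the cause is more likely in how the profiled process is launched.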

One of our application engineers has looked at another aspect of this problem: the failure of Spectrum MPI’s PAMI libraries when profiling with nvprof. Below is his analysis.

The problem is that ‘nvprof’ is overriding the LD_PRELOAD that we set for the PAMI cudahooks library, which PAMI needs for correctness when handling CUDA buffers.

[b6p056zc@p10a11 nvprof-issue] mpirun -gpu env | grep LD_PRELOAD
OMPI_LD_PRELOAD_POSTPEND_DISTRO=/gpfs/gpfs_gl4_16mb/smpi/10.2.0.9/lib/libpami_cudahook.so
OMPI_MCA_mca_base_env_list_distro=MPI_ROOT,OPAL_PREFIX,OPAL_LIBDIR,PMIX_INSTALL_PREFIX,HWLOC_PLUGINS_PATH,PAMI_DISABLE_IPC,PAMI_IBV_DISABLE_RRMW,HCOLL_ALLREDUCE_ZCOPY_TUNE,OMPI_LD_PRELOAD_POSTPEND_DISTRO,LD_LIBRARY_PATH
LD_PRELOAD=/gpfs/gpfs_gl4_16mb/smpi/10.2.0.9/lib/libpami_cudahook.so
[b6p056zc@p10a11 nvprof-issue] mpirun -gpu nvprof env | grep LD_PRELOAD
OMPI_LD_PRELOAD_POSTPEND_DISTRO=/gpfs/gpfs_gl4_16mb/smpi/10.2.0.9/lib/libpami_cudahook.so
OMPI_MCA_mca_base_env_list_distro=MPI_ROOT,OPAL_PREFIX,OPAL_LIBDIR,PMIX_INSTALL_PREFIX,HWLOC_PLUGINS_PATH,PAMI_DISABLE_IPC,PAMI_IBV_DISABLE_RRMW,HCOLL_ALLREDUCE_ZCOPY_TUNE,OMPI_LD_PRELOAD_POSTPEND_DISTRO,LD_LIBRARY_PATH
LD_PRELOAD=libaccinj64.so.10.0
======== Warning: No profile data collected.

Since PAMI detects that libpami_cudahook.so is not loaded, it fails with the message you are seeing.
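
The check done above with env and grep can also be scripted. This is only a sketch: preload_contains is a hypothetical helper (not part of Spectrum MPI or CUDA), and it assumes LD_PRELOAD entries are separated by spaces or colons and can be matched by file name.

```shell
#!/bin/bash
# preload_contains (hypothetical helper): succeeds when a library with the
# given file name appears in a preload list. Entries may be separated by
# spaces or colons, and may be bare names or full paths.
preload_contains() {
    local wanted="$1" list="$2" entry
    local IFS=' :'
    for entry in $list; do
        # Compare only the file name, ignoring any leading directory.
        if [ "${entry##*/}" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Report whether the PAMI CUDA hook is present in the current environment.
if preload_contains libpami_cudahook.so "${LD_PRELOAD:-}"; then
    echo "cudahook preloaded"
else
    echo "cudahook missing from LD_PRELOAD"
fi
```

Saved as an executable script and launched in place of env (for example under mpirun -gpu nvprof), it would show directly whether the hook survives the profiler's changes to LD_PRELOAD.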

I’ve been tinkering and cannot seem to get it to work without wrapping the application. Here is my wrapper script

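#=============================== cat wrap-me.sh
#!/bin/bash
# If both nvprof's LD_PRELOAD and the Spectrum MPI preload are set and differ,
# put the PAMI cudahook library in front of whatever nvprof preloaded.
if [[ "x" != "x$LD_PRELOAD" && "x" != "x$OMPI_LD_PRELOAD_POSTPEND_DISTRO" ]] ; then
    if [ "$LD_PRELOAD" != "$OMPI_LD_PRELOAD_POSTPEND_DISTRO" ] ; then
        export LD_PRELOAD="$OMPI_LD_PRELOAD_POSTPEND_DISTRO $LD_PRELOAD"
    fi
fi
# Show the final preload list, then replace this shell with the application.
echo "=====> $LD_PRELOAD"
exec "$@"
#===============================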

Using that I can get a bit further:
[b6p056zc@p10a11 nvprof-issue] mpirun -gpu nvprof $PWD/wrap-me.sh ./main.exe
=====> /gpfs/gpfs_gl4_16mb/smpi/10.2.0.9/lib/libpami_cudahook.so libaccinj64.so.10.0
Rank=0 Size=1 - ciao!
Rank=0 Size=1 - Pointer(A)=0x34900610
Rank=0 Size=1 - going to GPU
==117570== NVPROF is profiling process 117570, command: ./main.exe
======== Profiling result:
No kernels were profiled.
No API activities were profiled.
======== Error: incompatible CUDA driver version.

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[55998,1],0]
Exit code: 19

I suspect the CUDA driver version error may have something to do with the binary. You can also try reordering LD_PRELOAD in the script to put nvprof's library first:
export LD_PRELOAD="$LD_PRELOAD $OMPI_LD_PRELOAD_POSTPEND_DISTRO"
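
To make both orderings easy to try, the combination step can be pulled into a small function. This is a sketch under the same assumptions as the wrapper script: merge_preload is a hypothetical helper, and the library names are the ones shown in the output above.

```shell
#!/bin/bash
# merge_preload (hypothetical helper): joins two preload lists in the order
# given and drops entries that are already present. Input entries may be
# separated by spaces or colons; the output is space-separated.
merge_preload() {
    local out="" entry
    local IFS=' :'
    for entry in $1 $2; do
        case " $out " in
            *" $entry "*) ;;                      # duplicate, skip it
            *) out="${out:+$out }$entry" ;;       # append with a space
        esac
    done
    printf '%s\n' "$out"
}

hook="/gpfs/gpfs_gl4_16mb/smpi/10.2.0.9/lib/libpami_cudahook.so"
inj="libaccinj64.so.10.0"

# PAMI hook first (the order used in wrap-me.sh):
merge_preload "$hook" "$inj"
# nvprof's injection library first (the alternative order):
merge_preload "$inj" "$hook"
```

Inside the wrapper, the result would be exported as LD_PRELOAD before the final exec, so switching the order means swapping the two arguments.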

I also checked the thread linked below and tried running as root (with sudo privileges), but I still get the same error.

https://devtalk.nvidia.com/default/topic/1043228/?comment=5291984

/usr/local/cuda-10.0/bin/nvprof
[1546963843.411570] [p10a11:162153:0] mxm.c:196 MXM WARN The ‘ulimit -s’ on the system is set to ‘unlimited’. This may have negative performance implications. Please set the stack size to the default value (10240)
FATAL ERROR: dlsym PAMI_CUDA_RegisterPAMIContexts: ./main.exe: undefined symbol: PAMI_CUDA_RegisterPAMIContexts
Rank=0 Size=1 - ciao!
Rank=0 Size=1 - Pointer(A)=0x2b165420
Rank=0 Size=1 - going to GPU
==162153== NVPROF is profiling process 162153, command: ./main.exe
======== Error: incompatible CUDA driver version.

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[26814,1],0]
Exit code: 19

[root@p10a73 OpenACCMPIProfilingTest]#