cuFFTMp c2c sample isn't working with NVIDIA HPC SDK 24.3

Hi @MatColgrove

Following our discussion a few days back, I did manage to run the cuFFTMp samples successfully with NVIDIA HPC SDK 24.3 on a cluster, but today, when I ran the same samples with the same environment variables, I came across this error:

pushkar@node3:~/CUDALibrarySamples/cuFFTMp/samples/c2c$ make run
LD_LIBRARY_PATH="/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nvshmem/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/nvshmem/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/nccl/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/extras/qd/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/extras/CUPTI/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/hpcx/latest/ompi/lib:/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nccl/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nvshmem/lib:/lib64/stubs" mpirun -oversubscribe -n 2 cufftmp_c2c
mpirun-Error-MPI matching the current driver version (12.4) or a supported older version (11.0 or 12.3) was not installed with this HPC SDK. When using NVHPC_CUDA_HOME with MPI you should also set NVHPC_COMM_LIBS_HOME.
make: *** [Makefile:18: run] Error 1

Any help in this regard would be appreciated, as all of the codes we have developed on top of cuFFTMp have stopped working with multiple GPUs.

Many Thanks
Pushkar

Hi Pushkar,

I’m guessing the CUDA Driver was updated?

The way our mpirun works is that we have to build MPI for a particular CUDA version. Since it would be a bit of a pain for users to have to switch between these builds, our top-level mpirun is really a script that invokes the correct build based on the CUDA driver version. But since 24.3 didn't ship with CUDA 12.4, the "trampoline" can't find the correct build.
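Just to illustrate the idea (this is not the actual NVHPC wrapper script; the version parsing and paths below are assumptions based on your install prefix), the dispatch is conceptually something like:

#!/bin/bash
# Illustrative sketch only -- not the real NVHPC mpirun wrapper.
# Pick the MPI build matching the CUDA version reported by the driver.
NVHPC=/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3
CUDA_VER=$(nvidia-smi | sed -n 's/.*CUDA Version: \([0-9]*\.[0-9]*\).*/\1/p')
MPI_BIN=${NVHPC}/comm_libs/${CUDA_VER}/hpcx/latest/ompi/bin/mpirun
if [ ! -x "${MPI_BIN}" ]; then
  echo "No MPI build installed for driver CUDA ${CUDA_VER}" >&2
  exit 1
fi
exec "${MPI_BIN}" "$@"

With a 12.4 driver and no comm_libs/12.4 directory in 24.3, a script like this has nowhere to dispatch to, which is essentially the error you're seeing.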

You can try setting your PATH directly to the mpirun binary, which would be one of the following depending on which MPI you're using (see the verification example after the two options).

For OpenMPI4:
PATH=/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/bin:$PATH

For HPC-X (default):
PATH=/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/hpcx/hpcx-2.17.1/ompi/bin:$PATH
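For example, assuming the default HPC-X build and your install prefix, you could verify that the right mpirun is being picked up before rerunning the sample:

export PATH=/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/hpcx/hpcx-2.17.1/ompi/bin:$PATH
which mpirun        # should point at the hpcx-2.17.1/ompi/bin directory above
mpirun --version    # should report an Open MPI version rather than the wrapper error
make run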

Alternatively, we did just release 24.5 last week, which does include support for CUDA 12.4, so updating to 24.5 should work as well: NVIDIA HPC SDK Current Release Downloads | NVIDIA Developer

-Mat


Thanks a lot @MatColgrove for your quick reply. The cuFFTMp examples are now working.

Hi @MatColgrove

I was just trying to rerun the cuFFTMp examples on an A100 GPU with NVHPC 24.3, but this time I ended up with the following error:

rm -rf cufftmp_c2c
/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/bin/nvcc cufftmp_c2c.cu -o cufftmp_c2c -std=c++17 --generate-code arch=compute_70,code=sm_70 --generate-code arch=compute_80,code=sm_80 --generate-code arch=compute_90,code=sm_90 -I/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/include/cufftmp -I/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nvshmem/include -I/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/hpcx/latest/ompi/include -lcuda -L/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64 -L/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nvshmem/lib  -lcufftMp -lnvshmem_device -lnvshmem_host -L/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/hpcx/latest/ompi/lib -lmpi -L/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/12.0/lib64/stubs -lnvidia-ml
LD_LIBRARY_PATH="/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nvshmem/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/nvshmem/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/nccl/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/compilers/extras/qd/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/extras/CUPTI/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/hpcx/latest/ompi/lib:/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.0/nccl/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.0/nvshmem/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/12.0/lib64/stubs:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/hpcx/latest/ompi/lib:/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.0/nccl/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.0/nvshmem/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/12.0/lib64/stubs:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/hpcx/latest/ompi/lib:/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nccl/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nvshmem/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/12.3/lib64/stubs:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/hpcx/latest/ompi/lib:/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nccl/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nvshmem/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/12.3/lib64/stubs" mpirun --oversubscribe -n 1 -mca coll_hcoll_enable 0 cufftmp_c2c
Hello from rank 0/1 using GPU 0 transform of size 8 x 8 x 8, local size 8 x 8 x 8
/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/host/init/init.cu:nvshmemi_check_state_and_init:933: nvshmem initialization failed, exiting

/dvs/p4/build/sw/rel/gpgpu/toolkit/r12.0/main_nvshmem/src/util/cs.cpp:23: non-zero status: 16: Cannot allocate memory, exiting... mutex destroy failed

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[41286,1],0]
  Exit code:    255
--------------------------------------------------------------------------
make: *** [Makefile:19: run] Error 255

Any help in this regard would be much appreciated.

Many Thanks
Pushkar

Hi Pushkar,

I’ve not seen this error before, so I’m just guessing, but if you look at your LD_LIBRARY_PATH, the CUDA 12.0 library paths are being used with the MPI that was built against CUDA 12.3. Maybe this version mismatch is causing the problem?

Also, the paths include non-existent directories, directories without libraries, and “stubs” directories. You do have “lib64” included before the stubs so the real CUDA driver (libcuda.so) will get loaded, but I’d remove these. The stub libraries should only be used when linking on a system without a GPU and you’re using the driver API instead of the CUDA Runtime API.

Try reducing the LD_LIBRARY_PATH down to:

/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/hpcx/latest/ompi/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nccl/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nvshmem/lib
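For example, something along these lines (paths copied from the reduced list above) should let you confirm which libraries actually resolve before rerunning:

export LD_LIBRARY_PATH=/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/hpcx/latest/ompi/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nccl/lib:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/math_libs/lib64:/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/nvshmem/lib
ldd ./cufftmp_c2c | grep -E 'cufft|nvshmem|libcuda'   # libcuda.so should come from the system driver, not a stubs directory
make run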

-Mat

Thanks @MatColgrove for your quick response. I did manage to resolve the above error by linking against the proper cuFFT library provided in NVHPC.
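For reference, the link flags I mean are roughly along these lines (a sketch based on the compile command shown earlier in this thread; the key point is pulling libcufftMp from the HPC SDK's math_libs rather than from another CUDA toolkit):

# Sketch only; exact paths follow the 24.3 install prefix used above.
NVHPC=/export/apps/nvidia/hpc_sdk/Linux_x86_64/24.3
nvcc cufftmp_c2c.cu -o cufftmp_c2c -std=c++17 \
    -I${NVHPC}/math_libs/include/cufftmp \
    -I${NVHPC}/comm_libs/12.3/nvshmem/include \
    -I${NVHPC}/comm_libs/12.3/hpcx/latest/ompi/include \
    -L${NVHPC}/math_libs/lib64 -lcufftMp \
    -L${NVHPC}/comm_libs/12.3/nvshmem/lib -lnvshmem_host -lnvshmem_device \
    -L${NVHPC}/comm_libs/12.3/hpcx/latest/ompi/lib -lmpi -lcuda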