NVSHMEM Compiling

I’m compiling my code using NVSHMEM.
I’m getting this error:
“proxy.cpp:(.text+0xfa1): undefined reference to `cudaDeviceFlushGPUDirectRDMAWrites’”
Can someone please tell me exactly which library to link for this?
I am already linking like this:
“nvcc -DUSE_NVSHMEM -DParallel -std=c++14 -arch=sm_80 show_para.o main_hydro.o main_scalar.o main_mhd.o main_emhd.o linspace.o meshgrid.o normalize.o compute_p.o pressure.o ektk.o
spectral_setup.o time_advance1.o time_advance2.o time_advance3.o time_advance4.o time_advance5.o set_anisotropy.o Destruct.o reality.o kernel_ektk.o
time_advance6.o time_advance7.o time_advance8.o glob.o helicity.o univ.o scalar_field.o comm_s.o e_div.o modes.o force1.o force2.o pre_compute_config.o FFT.o test_fft.o
force3.o force4.o nlin1.o nlin2.o nlin3.o nlin4.o print_results.o vector_field.o dealias.o io_1.o io_2.o io_3.o io_4.o io_5.o main.o -Xptxas -O3
-L /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.0//lib64/,/opt/cray/pe/hdf5-parallel/1.12.2.3/NVIDIA/20.7/lib/,/usr/lib/python3.6m/config-3.6m-x86_64-linux-gnu/,/opt/cray/pe/mpich/8.1.25/ofi/nvidia/20.7//lib/,/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/nvshmem/lib,/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/nccl/lib,/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.0//lib64,/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/lib/,/opt/cray/pe/mpich/8.1.25/gtl/lib -lcufft -lhdf5 -rdc=true -lpython3.6m -lmpi -lcuda -lcudart -lnvshmem -lnvidia-ml -lmpi_nvidia -lmpi_gtl_cuda -o TARANG_NVSHMEM”

From the path “/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.0” it looks like the version of CUDA you’re compiling with doesn’t support the cudaDeviceFlushGPUDirectRDMAWrites API. HPC SDK 22.11 should also include CUDA 11.8 [1]. Can you try building with that version of CUDA?

[1] Release Notes Version 22.11
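
For example, a quick way to check which CUDA toolkits the SDK ships and to point the build at 11.8 (the paths below assume the standard /opt/nvidia/hpc_sdk layout from your link line; adjust if your install differs):

ls /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/
# should list the bundled toolkits, e.g. 11.0 and 11.8

# Use the 11.8 toolkit's nvcc and libraries instead of 11.0
export CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.8
$CUDA_HOME/bin/nvcc --version    # should report release 11.8
# ...then replace .../cuda/11.0/lib64 with $CUDA_HOME/lib64 in your -L list.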

Thanks man!
The issue is resolved.
But another issue related to it has come up.
The NVSHMEM that comes with the HPC SDK is built with OpenMPI.
I was trying to build my code on the ALCF Polaris cluster.
There is only MPICH (Intel MPI) or MVAPICH2 available there. So can you tell me how I can use NVSHMEM with that MPICH?

The source code for NVSHMEM’s MPI bootstrap plugin is included with the HPC SDK for scenarios like this. You can compile just this bootstrap plugin as a .so file and set NVSHMEM_BOOTSTRAP_PLUGIN=/path/to/my_mpi_plugin.so to direct NVSHMEM to load your bootstrap plugin.
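
Roughly something like the sketch below, not exact commands: the bootstrap source location and file name are assumptions, so check where your HPC SDK copy actually puts them. The NVSHMEM_BOOTSTRAP / NVSHMEM_BOOTSTRAP_PLUGIN environment variables are the documented way to load a plugin at run time.

NVSHMEM_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/nvshmem

# Build the MPI bootstrap against the system MPICH instead of the bundled OpenMPI.
# (bootstrap_mpi.c path is an assumption; locate it under the SDK's nvshmem tree.)
mpicc -shared -fPIC -I$NVSHMEM_HOME/include \
    $NVSHMEM_HOME/share/nvshmem/src/bootstrap-plugins/bootstrap_mpi.c \
    -o nvshmem_bootstrap_mpi.so

# Point NVSHMEM at the plugin at run time.
export NVSHMEM_BOOTSTRAP=plugin
export NVSHMEM_BOOTSTRAP_PLUGIN=$PWD/nvshmem_bootstrap_mpi.so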

I built the bootstrap from the provided source file. But when I run the code, the error is:
“src/topo/topo.cpp:68: [GPU 3] Peer GPU 0 is not accessible, exiting …
src/init/init.cu:714: non-zero status: 3 building transport map failed
src/topo/topo.cpp:68: [GPU 2] Peer GPU 0 is not accessible, exiting …
src/init/init.cu:714: non-zero status: 3 building transport map failed
src/topo/topo.cpp:68: [GPU 1] Peer GPU 0 is not accessible, exiting …
src/init/init.cu:714: non-zero status: 3 building transport map failed
MPICH ERROR [Rank 0] [job id 856f0066-5f09-464e-abb4-f43c0a029cdb] [Tue Nov 28 10:54:12 2023] [x3005c0s19b1n0] - Abort(139008270) (rank 0 in comm 0): Fatal error in PMPI_Alltoall: Message truncated, error stack:
PMPI_Alltoall(427)…: MPI_Alltoall(sbuf=0xa5a2180, scount=16, MPI_BYTE, rbuf=0xa5a2130, rcount=16, datatype=MPI_BYTE, comm=comm=0x84000002) failed
MPIR_Alltoall_impl(259)…:
MPIR_Alltoall_intra_auto(170)…: Failure during collective
MPIR_Alltoall_intra_auto(166)…:
MPIR_Alltoall_intra_pairwise(95):
progress_recv(174)…: Message from rank 3 and tag 9 truncated; 16 bytes received but buffer size is 40
MPIR_Alltoall_intra_pairwise(95):
MPIDIG_handle_unexp_mrecv(79)…: Message from rank 2 and tag 9 truncated; 16 bytes received but buffer size is 40
MPIR_Alltoall_intra_pairwise(95):
MPIDIG_handle_unexp_mrecv(79)…: Message from rank 3 and tag 9 truncated; 16 bytes received but buffer size is 40”

Meanwhile, if I run the same code with the same MPI but without NVSHMEM, it is able to do direct GPU-to-GPU data transfers over NVLink.

Can anyone please provide a solution?

This looks like a possible incompatibility between the MPI library used to build the bootstrap and the MPI library that’s being used at runtime. Can you please confirm that the same MPI library is being used/linked both for building and running the bootstrap?
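
For example, something like this (the plugin path below is whichever .so you set NVSHMEM_BOOTSTRAP_PLUGIN to):

# Which MPI shared library is the bootstrap plugin linked against?
ldd /path/to/nvshmem_bootstrap_mpi.so | grep -i mpi
# Which MPI is the application itself linked against?
ldd ./TARANG_NVSHMEM | grep -i mpi
# What does the MPI compiler wrapper used for the build actually link? (MPICH)
mpicc -show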