NVSHMEM Compiling

I’m compiling my code using NVSHMEM.
I’m getting this error:
“proxy.cpp:(.text+0xfa1): undefined reference to `cudaDeviceFlushGPUDirectRDMAWrites’”
Can someone please tell me exactly which library to link for this?
I am already linking like this:
“nvcc -DUSE_NVSHMEM -DParallel -std=c++14 -arch=sm_80 show_para.o main_hydro.o main_scalar.o main_mhd.o main_emhd.o linspace.o meshgrid.o normalize.o compute_p.o pressure.o ektk.o
spectral_setup.o time_advance1.o time_advance2.o time_advance3.o time_advance4.o time_advance5.o set_anisotropy.o Destruct.o reality.o kernel_ektk.o
time_advance6.o time_advance7.o time_advance8.o glob.o helicity.o univ.o scalar_field.o comm_s.o e_div.o modes.o force1.o force2.o pre_compute_config.o FFT.o test_fft.o
force3.o force4.o nlin1.o nlin2.o nlin3.o nlin4.o print_results.o vector_field.o dealias.o io_1.o io_2.o io_3.o io_4.o io_5.o main.o -Xptxas -O3
-L /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.0//lib64/,/opt/cray/pe/hdf5-parallel/1.12.2.3/NVIDIA/20.7/lib/,/usr/lib/python3.6m/config-3.6m-x86_64-linux-gnu/,/opt/cray/pe/mpich/8.1.25/ofi/nvidia/20.7//lib/,/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/nvshmem/lib,/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/nccl/lib,/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.0//lib64,/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/lib/,/opt/cray/pe/mpich/8.1.25/gtl/lib -lcufft -lhdf5 -rdc=true -lpython3.6m -lmpi -lcuda -lcudart -lnvshmem -lnvidia-ml -lmpi_nvidia -lmpi_gtl_cuda -o TARANG_NVSHMEM”

From the path “/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.0” it looks like the version of CUDA you’re compiling with doesn’t support the cudaDeviceFlushGPUDirectRDMAWrites API. HPC SDK 22.11 should also include CUDA 11.8 [1]. Can you try building with that version of CUDA?

[1] Release Notes Version 22.11
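
For example, a quick way to check which CUDA toolkits the SDK ships and to point the build at 11.8 (the paths below assume the standard /opt/nvidia/hpc_sdk layout from your link line; adjust if your install differs):

ls /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/
# should list the bundled toolkits, e.g. 11.0 and 11.8

# Use the 11.8 toolkit's nvcc and libraries instead of 11.0
export CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/cuda/11.8
$CUDA_HOME/bin/nvcc --version    # should report release 11.8
# ...then replace .../cuda/11.0/lib64 with $CUDA_HOME/lib64 in your -L list.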

Thanks man!
The issue is resolved.
But another issue related to it has come up.
The NVSHMEM that comes with the HPC SDK is built with OpenMPI.
I was trying to build my code on the ALCF Polaris cluster.
There is only MPICH (Intel MPI) or MVAPICH2 available there. So can you tell me how I can use NVSHMEM with that MPICH?

The source code for NVSHMEM’s MPI bootstrap plugin is included with the HPC SDK for scenarios like this. You can compile just this bootstrap plugin as a .so file and set NVSHMEM_BOOTSTRAP_PLUGIN=/path/to/my_mpi_plugin.so to direct NVSHMEM to load your bootstrap plugin.
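
Roughly something like the sketch below, not exact commands: the bootstrap source location and file name are assumptions, so check where your HPC SDK copy actually puts them. The NVSHMEM_BOOTSTRAP / NVSHMEM_BOOTSTRAP_PLUGIN environment variables are the documented way to load a plugin at run time.

NVSHMEM_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/nvshmem

# Build the MPI bootstrap against the system MPICH instead of the bundled OpenMPI.
# (bootstrap_mpi.c path is an assumption; locate it under the SDK's nvshmem tree.)
mpicc -shared -fPIC -I$NVSHMEM_HOME/include \
    $NVSHMEM_HOME/share/nvshmem/src/bootstrap-plugins/bootstrap_mpi.c \
    -o nvshmem_bootstrap_mpi.so

# Point NVSHMEM at the plugin at run time.
export NVSHMEM_BOOTSTRAP=plugin
export NVSHMEM_BOOTSTRAP_PLUGIN=$PWD/nvshmem_bootstrap_mpi.so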

I built the bootstrap from the provided source file. But when I run the code, the error is:
“src/topo/topo.cpp:68: [GPU 3] Peer GPU 0 is not accessible, exiting …
src/init/init.cu:714: non-zero status: 3 building transport map failed
src/topo/topo.cpp:68: [GPU 2] Peer GPU 0 is not accessible, exiting …
src/init/init.cu:714: non-zero status: 3 building transport map failed
src/topo/topo.cpp:68: [GPU 1] Peer GPU 0 is not accessible, exiting …
src/init/init.cu:714: non-zero status: 3 building transport map failed
MPICH ERROR [Rank 0] [job id 856f0066-5f09-464e-abb4-f43c0a029cdb] [Tue Nov 28 10:54:12 2023] [x3005c0s19b1n0] - Abort(139008270) (rank 0 in comm 0): Fatal error in PMPI_Alltoall: Message truncated, error stack:
PMPI_Alltoall(427)…: MPI_Alltoall(sbuf=0xa5a2180, scount=16, MPI_BYTE, rbuf=0xa5a2130, rcount=16, datatype=MPI_BYTE, comm=comm=0x84000002) failed
MPIR_Alltoall_impl(259)…:
MPIR_Alltoall_intra_auto(170)…: Failure during collective
MPIR_Alltoall_intra_auto(166)…:
MPIR_Alltoall_intra_pairwise(95):
progress_recv(174)…: Message from rank 3 and tag 9 truncated; 16 bytes received but buffer size is 40
MPIR_Alltoall_intra_pairwise(95):
MPIDIG_handle_unexp_mrecv(79)…: Message from rank 2 and tag 9 truncated; 16 bytes received but buffer size is 40
MPIR_Alltoall_intra_pairwise(95):
MPIDIG_handle_unexp_mrecv(79)…: Message from rank 3 and tag 9 truncated; 16 bytes received but buffer size is 40”

Meanwhile, if I run the same code with the same MPI but without NVSHMEM, it is able to do direct GPU-to-GPU data transfers over NVLink.

Can anyone please provide a solution?

This looks like a possible incompatibility between the MPI library used to build the bootstrap and the MPI library that’s being used at runtime. Can you please confirm that the same MPI library is being used/linked both for building and running the bootstrap?
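
For example, something like this (the plugin path below is whichever .so you set NVSHMEM_BOOTSTRAP_PLUGIN to):

# Which MPI shared library is the bootstrap plugin linked against?
ldd /path/to/nvshmem_bootstrap_mpi.so | grep -i mpi
# Which MPI is the application itself linked against?
ldd ./TARANG_NVSHMEM | grep -i mpi
# What does the MPI compiler wrapper used for the build actually link? (MPICH)
mpicc -show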