Using NVSHMEM to Build a PyTorch Operator

Hi, All

I want to build a PyTorch operator using NVSHMEM.
Is there a way to do that? When we build a standalone NVSHMEM application written in pure C++ and CUDA C, we launch it with nvshmrun -n 2. In PyTorch, how could we achieve the same goal?

Thanks!

Daniel, NVSHMEM can be initialized using MPI as well. It can use the same bootstrap mechanism that you are already using to run the MPI backend. How to initialize NVSHMEM with MPI is shown here: NVIDIA OpenSHMEM Library (NVSHMEM) Documentation — NVSHMEM 2.6.0 documentation
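For reference, the MPI-bootstrapped initialization shown in that documentation follows roughly this pattern (a minimal sketch, assuming one GPU per PE; the NVSHMEMX_TEAM_NODE device selection is illustrative and may need to match your launch configuration). A program initialized this way is launched with mpirun rather than nvshmrun:

#include <mpi.h>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    // Hand the MPI communicator to NVSHMEM as the bootstrap mechanism.
    nvshmemx_init_attr_t attr;
    MPI_Comm comm = MPI_COMM_WORLD;
    attr.mpi_comm = &comm;
    nvshmemx_init_attr(NVSHMEMX_INIT_WITH_MPI_COMM, &attr);

    // Select one GPU per PE based on the PE's rank within the node.
    int mype_node = nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE);
    cudaSetDevice(mype_node);

    // ... allocate symmetric memory and launch NVSHMEM kernels here ...

    nvshmem_finalize();
    MPI_Finalize();
    return 0;
}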

If you are writing an NVSHMEM backend, you can use the above example code to initialize NVSHMEM.

We are also curious which communication primitives you are looking at (alltoall, allreduce, etc.), and what use case or project you are trying to use NVSHMEM for.

Thanks for your reply!

I am currently evaluating the potential of NVSHMEM for graph processing, which is essentially an irregular-memory-access workload. I want to build a PyTorch operator for it, much like cuGraph does.
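Roughly, what I have in mind is a CUDA extension along these lines (just a sketch with made-up names like gather_remote; it assumes NVSHMEM has already been initialized and that the feature buffer lives in NVSHMEM symmetric memory, e.g. a tensor wrapped around nvshmem_malloc'd storage):

#include <torch/extension.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Hypothetical kernel: each thread fetches one remote feature row with a
// one-sided get from the PE that owns it.
__global__ void neighbor_gather_kernel(float *out, const float *sym_features,
                                       const int *owner_pe, const int64_t *row,
                                       int64_t n, int64_t dim) {
    int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    if (i < n) {
        nvshmem_float_get(out + i * dim, sym_features + row[i] * dim, dim, owner_pe[i]);
    }
}

torch::Tensor gather_remote(torch::Tensor sym_features, torch::Tensor owner_pe,
                            torch::Tensor row) {
    int64_t n = row.size(0);
    int64_t dim = sym_features.size(1);
    auto out = torch::empty({n, dim}, sym_features.options());
    int threads = 256;
    int blocks = (int)((n + threads - 1) / threads);
    neighbor_gather_kernel<<<blocks, threads>>>(
        out.data_ptr<float>(), sym_features.data_ptr<float>(),
        owner_pe.data_ptr<int>(), row.data_ptr<int64_t>(), n, dim);
    return out;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("gather_remote", &gather_remote, "NVSHMEM-based neighbor gather (sketch)");
}

The idea is that each PE would own a partition of the graph and its features, and the operator would pull remote rows on demand.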

I also ran into another problem when compiling my program on an HPC system (a DGX with 4 Tesla V100 32GB GPUs). Because I don't have root access on those servers, I installed OpenMPI under my home directory (/home/user/openmpi) and used the compile command from the official website:

nvcc -rdc=true -ccbin g++ -arch=$NVCC_GENCODE \
    -I$NVSHMEM_HOME/include \
    -Iinclude \
    src/app.cu \
    -o app \
    -L$NVSHMEM_HOME/lib \
    -lnvshmem \
    -lcuda \
    -Xcompiler -pthread \
    -L$MPI_HOME/lib \
    -lmpi_cxx \
    -lmpi

However, it always fails to link against mpi_cxx (linking against mpi succeeds), and the error looks like:

/usr/bin/ld: cannot find -lmpi_cxx

Could you please help me with this?
Thanks a lot!

It’s possible that the OpenMPI build you are using does not come with the C++ bindings. Does the lib directory of your OpenMPI installation contain libmpi_cxx (e.g., check with ls $MPI_HOME/lib)?

I am using OpenMPI (openmpi-4.1.0.tar.gz) downloaded from Open MPI: Version 4.1. I am not sure whether this version has the C++ library to link against, or whether I should download one of the other two files, openmpi-4.1.0-1.src.rpm or openmpi-4.1.0.tar.bz2.

Do you need the C++ bindings? If not, you can just remove -lmpi_cxx. I think recent OpenMPI versions do not build them by default (and may not even support them).

When I remove the -lmpi_cxx flag, I get link errors like:

undefined reference to `ompi_mpi_cxx_op_intercept'
undefined reference to `MPI::Comm::Comm()'

I think MPI_HOME is set to the right path.

I am not sure why this error appears if OpenMPI was not built with the C++ bindings. Can you try rebuilding OpenMPI with the C++ bindings enabled, using the --enable-mpi-cxx option at configure time?