An Introduction to CUDA-Aware MPI

Originally published at:

MPI, the Message Passing Interface, is a standard API for communicating data via messages between distributed processes that is commonly used in HPC to build applications that can scale to multi-node computer clusters. As such, MPI is fully compatible with CUDA, which is designed for parallel computing on a single computer or node. There are many reasons…


I have a question about MPI communication: I didn't understand whether it can be used to transfer data between GPUs in the same machine.

Thank you in advance.

Hi Miguel,

Yes. MPI can be used for communication between GPUs, both within a node and across nodes: it supports intra-node (within a node) and inter-node (across cluster nodes) communication. MVAPICH2 is a CUDA-aware MPI library that you can use to perform communication between GPUs in the same machine as well as across machines.
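With a CUDA-aware MPI library, device pointers can be passed directly to the usual MPI calls. Here is a minimal sketch (assuming a CUDA-aware build such as MVAPICH2 or Open MPI with CUDA support, and one GPU per rank; buffer size and the rank-to-GPU mapping are illustrative assumptions):

```cuda
/* Minimal CUDA-aware MPI exchange between two ranks.
 * The device pointer d_buf is handed directly to MPI_Send/MPI_Recv;
 * the CUDA-aware MPI library handles the GPU-to-GPU transfer. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;          /* illustrative buffer size */
    float *d_buf;
    cudaSetDevice(rank);            /* assumes one GPU per rank on this node */
    cudaMalloc(&d_buf, n * sizeof(float));

    if (rank == 0) {
        /* ... fill d_buf with a kernel or cudaMemcpy ... */
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);  /* device ptr */
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

Without CUDA-aware MPI you would instead have to stage the data through a host buffer with cudaMemcpy around each MPI call; the CUDA-aware path avoids that extra copy and lets the library use GPUDirect where available.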

Can it be used to split GPU computing power into smaller pieces, for example to rent out to someone?

How about mpi4py in Python? Is it PyCUDA-aware?

Hi Jiri,
A new question. With CUDA aware MPI, in MPI_Send, if the send buffer is a device pointer and the data is produced by a previous compute kernel (might be on a non-default stream, might be results of a cuBLAS call, or in other words, as a library developer, I don't know where the data is from), do I need to call cudaDeviceSynchronize() before the send?
After MPI_Recv(), if the recv buffer is a device pointer, can I access it immediately in a kernel on a new stream?
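The scenario in this question can be sketched as follows (hypothetical kernel names; this illustrates the pattern being asked about, under the assumption that MPI's standard API is stream-unaware, so the conservative approach is to synchronize the producing stream before handing the buffer to MPI):

```cuda
/* Sketch of the producer -> MPI_Send / MPI_Recv -> consumer pattern.
 * produce() and consume() are hypothetical placeholder kernels. */
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void produce(float *buf, int n);   /* hypothetical */
__global__ void consume(float *buf, int n);   /* hypothetical */

void exchange(float *d_buf, int n, int peer, cudaStream_t producer)
{
    /* The data was produced on a non-default stream (or by a cuBLAS
     * call whose stream we don't control), so make sure that work has
     * finished before MPI reads the device buffer.  Synchronizing only
     * the producing stream is cheaper than cudaDeviceSynchronize(). */
    cudaStreamSynchronize(producer);
    MPI_Send(d_buf, n, MPI_FLOAT, peer, 0, MPI_COMM_WORLD);

    /* Blocking MPI_Recv returns only once the data has landed in
     * d_buf, so a kernel launched afterwards, even on a brand-new
     * stream, observes the received data. */
    MPI_Recv(d_buf, n, MPI_FLOAT, peer, 1, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    cudaStream_t s;
    cudaStreamCreate(&s);
    consume<<<(n + 255) / 256, 256, 0, s>>>(d_buf, n);
    cudaStreamSynchronize(s);
    cudaStreamDestroy(s);
}
```

This is a sketch of the conservative pattern, not a statement about any particular MPI implementation; stream-aware extensions (where the library itself orders the transfer against a stream) are implementation-specific.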