Dear staff,
I am profiling an MPI+GPU application (OpenACC + MPI + CUDA Fortran), built with the HPC-X MPI shipped in nvhpc/24.3, to analyze its communications. The communications are implemented with CUDA-aware MPI calls, in particular MPI_Isend + MPI_Irecv + MPI_Waitall, and every rank communicates with all the other ranks. In this run I use 8 MPI ranks, one GPU per rank, distributed over 2 nodes.

When I inspect the report in nsys-ui, I see that MPI_Waitall performs device-to-host (D2H) copies (the lilac events in the picture below), which I did not expect if the network supports GPUDirect RDMA. I do see peer-to-peer transfers between ranks on the same node, but I do not understand why the data is staged through the CPU inside MPI_Waitall. My guess is that these D2H copies are needed to move the data to MPI ranks on the other node. Is this behaviour expected even when GPUDirect RDMA is available? I also tried setting export UCX_IB_GPU_DIRECT_RDMA=y, but I did not notice any difference.
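For context, here is a minimal sketch of the communication pattern (simplified, with placeholder names such as exchange, sendbuf and recvbuf; it is not the production code). In this sketch the device buffers are exposed to the CUDA-aware MPI calls through an OpenACC host_data region:

    ! Sketch of the all-to-all exchange pattern described above.
    ! Buffers are assumed to be already present on the device
    ! (e.g. inside an enclosing !$acc data region).
    subroutine exchange(sendbuf, recvbuf, n, nranks, comm)
      use mpi
      implicit none
      integer, intent(in) :: n, nranks, comm
      real(8) :: sendbuf(n, 0:nranks-1), recvbuf(n, 0:nranks-1)
      integer :: myrank, peer, ierr, nreq
      integer :: requests(2*(nranks-1))

      call MPI_Comm_rank(comm, myrank, ierr)
      nreq = 0

      ! Pass device addresses of the buffers to the CUDA-aware MPI calls
      !$acc host_data use_device(sendbuf, recvbuf)
      do peer = 0, nranks-1
         if (peer == myrank) cycle
         nreq = nreq + 1
         call MPI_Irecv(recvbuf(:, peer), n, MPI_DOUBLE_PRECISION, peer, 0, &
                        comm, requests(nreq), ierr)
         nreq = nreq + 1
         call MPI_Isend(sendbuf(:, peer), n, MPI_DOUBLE_PRECISION, peer, 0, &
                        comm, requests(nreq), ierr)
      end do
      !$acc end host_data

      ! The D2H copies show up inside this call in the nsys timeline
      call MPI_Waitall(nreq, requests, MPI_STATUSES_IGNORE, ierr)
    end subroutine exchange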
Thank you for your help,
Laura