OpenACC directives to transfer data between GPUs

Hi,

Is it possible to transfer data from one GPU to another (over NVLink) by using OpenACC directives?

Regards,
Rob

Hi Rob,

No. OpenACC is agnostic to the type of device it targets, and direct device-to-device transfers are a feature of a particular device, so the standard doesn't provide directives for them.

For NVIDIA devices, you'd want to use the OpenACC "host_data use_device" construct to pass device pointers to an API that supports direct transfers, for example a CUDA-aware MPI, NVSHMEM, or the low-level GPUDirect RDMA APIs.
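Here's a minimal sketch of the CUDA-aware MPI pattern, assuming one MPI rank per GPU on the node; the buffer name, size, and two-rank layout are just illustrative:

```c
#include <mpi.h>
#include <openacc.h>
#include <stdlib.h>

#define N 1000000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Bind each rank to its own GPU (assumes ranks-per-node == GPUs-per-node). */
    acc_set_device_num(rank, acc_device_nvidia);

    double *buf = (double *)malloc(N * sizeof(double));
    for (int i = 0; i < N; ++i) buf[i] = (double)rank;

    #pragma acc data copy(buf[0:N])
    {
        /* host_data exposes the *device* address of buf to the MPI call,
           so a CUDA-aware MPI can move the data GPU-to-GPU directly
           (over NVLink when the transport supports it). */
        #pragma acc host_data use_device(buf)
        {
            if (rank == 0)
                MPI_Send(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Note that OpenACC itself only hands the device pointer to MPI here; whether the transfer actually goes over NVLink depends on how your MPI's transport is built and configured.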

-Mat

Hi Mat,

Thanks for the answer. I have a follow-up question: how does this work with unified memory? It should be accessible from multiple GPUs, right? How would data exchange with unified memory work between several GPUs on the same NVLink, and can it take advantage of the direct links?

Regards,
Rob

Yes, UM does work with multiple devices, but I don't know the specifics of how the transfers are done. Though I did find the following documentation: Programming Guide :: CUDA Toolkit Documentation

M.1.5. Multi-GPU

For devices of compute capability lower than 6.x, managed memory allocation behaves identically to unmanaged memory allocated using cudaMalloc(): the current active device is the home for the physical allocation, and all other GPUs receive peer mappings to the memory. This means that other GPUs in the system will access the memory at reduced bandwidth over the PCIe bus. Note that if peer mappings are not supported between the GPUs in the system, then the managed memory pages are placed in CPU system memory (“zero-copy” memory), and all GPUs will experience PCIe bandwidth restrictions. See Managed Memory with Multi-GPU Programs on pre-6.x Architectures for details.

Managed allocations on systems with devices of compute capability 6.x are visible to all GPUs and can migrate to any processor on-demand. Unified Memory performance hints (see Performance Tuning) allow developers to explore custom usage patterns, such as read duplication of data across GPUs and direct access to peer GPU memory without migration.
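To illustrate the performance hints the guide mentions, here's a minimal CUDA sketch, assuming a two-GPU system with compute capability 6.x or newer; the allocation size and device numbers are just placeholders:

```c
#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = (1 << 24) * sizeof(float);
    float *data;
    cudaMallocManaged((void **)&data, bytes, cudaMemAttachGlobal);

    /* Hint: the data will mostly be read, so the driver may keep
       read-only copies on each GPU instead of migrating pages. */
    cudaMemAdvise(data, bytes, cudaMemAdviseSetReadMostly, 0);

    /* Hint: let GPU 1 map the memory directly rather than migrating
       it on first touch (direct access to peer GPU memory). */
    cudaMemAdvise(data, bytes, cudaMemAdviseSetAccessedBy, 1);

    /* Explicitly stage the pages on GPU 0 before launching work there. */
    cudaMemPrefetchAsync(data, bytes, 0, NULL);
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}
```

I can't speak to exactly how the driver routes each transfer, but when peer GPUs are connected by NVLink, peer mappings and direct access like the above should use those links rather than PCIe.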