OpenACC directives to transfer data between GPUs

Hi,

Is it possible to transfer data from one GPU to another (over NVLink) by using OpenACC directives?

Regards,
Rob

Hi Rob,

No. OpenACC is agnostic to the type of device it targets, and direct device-to-device transfers are a feature of a particular device, so the standard doesn't provide directives for them.

For NVIDIA devices, you'd want to use the OpenACC "host_data use_device" construct to pass device pointers to an API that supports direct transfers, for example a CUDA-aware MPI, NVSHMEM, or the low-level GPUDirect RDMA APIs.
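Here's a minimal sketch of the CUDA-aware MPI pattern, assuming one MPI rank per GPU on the node; the buffer name, size, and two-rank layout are just illustrative:

```c
#include <mpi.h>
#include <openacc.h>
#include <stdlib.h>

#define N 1000000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Bind each rank to its own GPU (assumes ranks-per-node == GPUs-per-node). */
    acc_set_device_num(rank, acc_device_nvidia);

    double *buf = (double *)malloc(N * sizeof(double));
    for (int i = 0; i < N; ++i) buf[i] = (double)rank;

    #pragma acc data copy(buf[0:N])
    {
        /* host_data exposes the *device* address of buf to the MPI call,
           so a CUDA-aware MPI can move the data GPU-to-GPU directly
           (over NVLink when the transport supports it). */
        #pragma acc host_data use_device(buf)
        {
            if (rank == 0)
                MPI_Send(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Note that OpenACC itself only hands the device pointer to MPI here; whether the transfer actually goes over NVLink depends on how your MPI's transport is built and configured.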

-Mat

Hi Mat,

Thanks for the answer. I have a follow-up question: how does this work with unified memory? It should be accessible from multiple GPUs, right? How would data exchange with unified memory work between several GPUs on the same NVLink, and can it take advantage of the direct links?

Regards,
Rob

Yes, UM does work with multiple devices, but I don't know the specifics of how the transfers are done. Though I did find the following documentation: Programming Guide :: CUDA Toolkit Documentation

M.1.5. Multi-GPU

For devices of compute capability lower than 6.x, managed memory allocation behaves identically to unmanaged memory allocated using cudaMalloc(): the current active device is the home for the physical allocation, and all other GPUs receive peer mappings to the memory. This means that other GPUs in the system will access the memory at reduced bandwidth over the PCIe bus. Note that if peer mappings are not supported between the GPUs in the system, then the managed memory pages are placed in CPU system memory (“zero-copy” memory), and all GPUs will experience PCIe bandwidth restrictions. See Managed Memory with Multi-GPU Programs on pre-6.x Architectures for details.

Managed allocations on systems with devices of compute capability 6.x are visible to all GPUs and can migrate to any processor on-demand. Unified Memory performance hints (see Performance Tuning) allow developers to explore custom usage patterns, such as read duplication of data across GPUs and direct access to peer GPU memory without migration.
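To illustrate the performance hints the guide mentions, here's a minimal CUDA sketch, assuming a two-GPU system with compute capability 6.x or newer; the allocation size and device numbers are just placeholders:

```c
#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = (1 << 24) * sizeof(float);
    float *data;
    cudaMallocManaged((void **)&data, bytes, cudaMemAttachGlobal);

    /* Hint: the data will mostly be read, so the driver may keep
       read-only copies on each GPU instead of migrating pages. */
    cudaMemAdvise(data, bytes, cudaMemAdviseSetReadMostly, 0);

    /* Hint: let GPU 1 map the memory directly rather than migrating
       it on first touch (direct access to peer GPU memory). */
    cudaMemAdvise(data, bytes, cudaMemAdviseSetAccessedBy, 1);

    /* Explicitly stage the pages on GPU 0 before launching work there. */
    cudaMemPrefetchAsync(data, bytes, 0, NULL);
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}
```

I can't speak to exactly how the driver routes each transfer, but when peer GPUs are connected by NVLink, peer mappings and direct access like the above should use those links rather than PCIe.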