GPUDirect RDMA Single PCI-e writes

rowanphilip · October 12, 2018, 10:12am

Hi. For context, my aim is to transfer small chunks of data (< 8 bytes at a time) from a third-party device to the GPU with minimal latency. My current understanding is that GPUDirect RDMA lets you pin a region of GPU memory and obtain its physical address through the API. First of all, is this understanding correct? Secondly, I would like the third-party device to send PCI-e writes to that GPU physical address regularly, with no CPU involvement after the initial setup (to avoid added latency). A persistent kernel would constantly poll that region of memory and process the data when it arrives. Is this possible, or does anyone see potential problems with this approach?
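A minimal sketch of the polling side of this idea, under loud assumptions: it does not use GPUDirect RDMA at all. The mailbox lives in pinned, mapped host memory, and the CPU stands in for the third-party device so the example can run on an ordinary machine. The `Mailbox` layout, the sequence/acknowledge handshake, and the names `poll_kernel` and `run_mailbox_demo` are all hypothetical and chosen only for illustration. In the real setup the buffer would be GPU memory pinned through the GPUDirect RDMA kernel-mode interface and written by the device over PCI-e.

```cuda
#include <atomic>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical mailbox layout (an assumption, not part of any NVIDIA API).
struct Mailbox {
    volatile uint64_t payload;  // the small data chunk, padded to 8 bytes
    volatile uint32_t seq;      // bumped by the writer AFTER the payload
    volatile uint32_t ack;      // bumped by the kernel after consuming
    volatile uint64_t sum;      // result, written once when the kernel exits
};

// Persistent kernel: a single thread spins on the sequence counter and
// consumes one payload per increment. "Processing" here is just a sum.
__global__ void poll_kernel(Mailbox* mb, uint32_t n_msgs)
{
    uint64_t acc = 0;
    for (uint32_t expected = 1; expected <= n_msgs; ++expected) {
        while (mb->seq != expected) { /* spin until the writer publishes */ }
        __threadfence_system();   // order the payload read after the seq read
        acc += mb->payload;
        mb->ack = expected;       // tell the stand-in writer the slot is free
        __threadfence_system();
    }
    mb->sum = acc;
}

// Host stand-in for the third-party device. Returns 0 on success, 1 if the
// sum is wrong, -1 if CUDA setup fails. Assumes the kernel starts running
// without a host-side synchronization (typical on Linux).
int run_mailbox_demo(uint32_t n_msgs)
{
    Mailbox* h_mb = nullptr;
    Mailbox* d_mb = nullptr;
    if (cudaHostAlloc((void**)&h_mb, sizeof(Mailbox), cudaHostAllocMapped) != cudaSuccess)
        return -1;
    if (cudaHostGetDevicePointer((void**)&d_mb, h_mb, 0) != cudaSuccess) {
        cudaFreeHost(h_mb);
        return -1;
    }
    h_mb->payload = 0; h_mb->seq = 0; h_mb->ack = 0; h_mb->sum = 0;

    poll_kernel<<<1, 1>>>(d_mb, n_msgs);
    if (cudaGetLastError() != cudaSuccess) {
        cudaFreeHost(h_mb);
        return -1;
    }

    for (uint32_t i = 1; i <= n_msgs; ++i) {
        h_mb->payload = i;                                    // data first
        std::atomic_thread_fence(std::memory_order_release);  // then publish
        h_mb->seq = i;
        while (h_mb->ack != i) { /* wait for the kernel to consume */ }
    }

    cudaError_t err = cudaDeviceSynchronize();
    uint64_t expected = (uint64_t)n_msgs * (n_msgs + 1) / 2;
    int result = (err == cudaSuccess && h_mb->sum == expected) ? 0 : 1;
    cudaFreeHost(h_mb);
    return result;
}
```

Calling `run_mailbox_demo(16)` from `main` should return 0 when every payload was seen exactly once. The acknowledge counter exists only so the stand-in writer never overwrites an unread payload; a real device would need its own rule for that, for example a ring of slots.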

njuffa · October 12, 2018, 4:41pm

Tiny transfers across PCIe result in pretty low throughput, in the single-digit MB/s range. This fairly recent VMware blog entry shows some data that looks very plausible to me (figures 4 and 8).

https://blogs.vmware.com/apps/2018/06/scaling-hpc-and-ml-with-gpudirect-rdma-on-vsphere-6-7-part-2-of-2.html

rowanphilip · October 23, 2018, 9:42am

Thanks for the response. Low throughput, even on the order of MB/s, is not a massive concern for this application, as the GPU will be performing parallel computations on the same small chunk of input data. What is more of a concern is getting the absolute lowest latency possible. Having the CPU initiate a full DMA transfer for each chunk of data takes too long. Instead, I believe that a direct PCI-e write from the third-party device to the GPU every few microseconds is a better fit for this problem, if it is indeed possible.
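As a rough check on why throughput is not the bottleneck here, take the full 8-byte payload and assume one write every 2 µs (the interval is an assumption standing in for "every few microseconds"):

```latex
\text{required bandwidth} = \frac{8\ \text{B}}{2\ \mu\text{s}} = 4 \times 10^{6}\ \text{B/s} = 4\ \text{MB/s}
```

That lands in the same single-digit MB/s range quoted above for tiny transfers, so the open question is the per-write latency, not the bandwidth.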
