Hi,
I have developed a Linux kernel module for a third-party device in
order to use the GPUDirect RDMA feature introduced with CUDA 5, as it is stated
here:
http://developer.download.nvidia.com/compute/cuda/5_0/rc/docs/GPUDirect_RDMA.pdf
The system works as expected, except for a performance issue. When
the other device performs a write on the PCI bus with GPU memory as
the destination, performance is good. But when it issues a read from
GPU memory, the latency is an order of magnitude worse.
The document above says something about the PCI topology and which
configuration yields the best performance, but it isn't very clear to
me. The lstopo output on the machine I am working on is:
HostBridge L#0
…
PCIBridge
PCI 10de:06de # (GPU PCI ID)
PCI 10de:0be5
PCIBridge
PCI id of the other card
…
which means that the two cards are under the same HostBridge, but
under different PCIBridges. Which of the three cases described in the
document does this correspond to? Should I make any changes to the
setup? The card is the following:
3D controller: NVIDIA Corporation GF100 [Tesla S2050]
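In case it helps, this is how I double-checked the tree directly via sysfs, without extra tools (just a sketch; the device addresses naturally differ per machine):

```shell
# Each entry in /sys/bus/pci/devices is a symlink whose resolved path
# lists every bridge between the root and the device, so two devices
# share a PCIBridge exactly when their paths share that bridge segment.
for d in /sys/bus/pci/devices/*; do
  [ -e "$d" ] || continue          # skip if sysfs exposes no PCI devices
  printf '%s -> %s\n' "${d##*/}" "$(readlink -f "$d")"
done
```

Comparing the resolved paths of the GPU (10de:06de) and the other card matches what lstopo shows: a common ancestor only at the host bridge, not at any intermediate bridge.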
Can you help me shed some light on the topologies supported by
GPUDirect RDMA, or tell me if there is another problem with my setup?
Thanks in advance!