GPUDirect RDMA performance

Hi,

I have developed a Linux kernel module for a third-party device in
order to use the GPUDirect RDMA feature introduced with CUDA 5, as
described here:
http://developer.download.nvidia.com/compute/cuda/5_0/rc/docs/GPUDirect_RDMA.pdf
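
For reference, the pinning path in my module basically follows the API from that
document. A stripped-down sketch of what I do (the p2p_token/va_space values come
from user space via cuPointerGetAttribute(CU_POINTER_ATTRIBUTE_P2P_TOKENS, ...);
gpu_region and pin_gpu_region are just my own placeholder names, and the
device-specific DMA programming is omitted):

#include <linux/module.h>
#include "nv-p2p.h"   /* shipped with the NVIDIA driver sources */

struct gpu_region {
    uint64_t va;                               /* GPU virtual address, 64KB aligned */
    struct nvidia_p2p_page_table *page_table;  /* filled in by nvidia_p2p_get_pages() */
};

/* Called by the NVIDIA driver if the GPU allocation disappears under us. */
static void region_free_callback(void *data)
{
    struct gpu_region *r = data;

    nvidia_p2p_free_page_table(r->page_table);
    r->page_table = NULL;
}

static int pin_gpu_region(struct gpu_region *r, uint64_t p2p_token,
                          uint32_t va_space, uint64_t len)
{
    int ret, i;

    ret = nvidia_p2p_get_pages(p2p_token, va_space, r->va, len,
                               &r->page_table, region_free_callback, r);
    if (ret)
        return ret;

    /* Each entry is a 64KB GPU page; these bus addresses are what I
     * program into the other device's DMA engine (device-specific). */
    for (i = 0; i < r->page_table->entries; i++)
        pr_debug("GPU page %d @ 0x%llx\n", i,
                 (unsigned long long)r->page_table->pages[i]->physical_address);

    return 0;
}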

The system works as expected, except that I have a performance issue.
When the other device performs a PCIe write to GPU memory, performance
is good. But when it issues a PCIe read from GPU memory, the latency
is an order of magnitude worse.

The document says something about the PCI topology and which
configurations yield the best performance, but it isn't very clear to
me. The lstopo output on the machine I am working on is:

HostBridge L#0
  PCIBridge
    PCI 10de:06de   # (GPU PCI ID)
    PCI 10de:0be5
  PCIBridge
    PCI <PCI ID of the other card>

which means that the two cards are under the same host bridge, but
behind different PCI bridges. Which of the three cases described in the
document does this correspond to? Should I make any changes to the
setup? The card is the following:

3D controller: NVIDIA Corporation GF100 [Tesla S2050]
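
In case it helps: to confirm which entry in the topology is the GPU, I match the
PCI bus ID reported by the CUDA runtime against lspci. This is the small host-side
check I use (ordinary CUDA runtime code built with nvcc, not part of the kernel
module):

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int n = 0;

    cudaGetDeviceCount(&n);
    for (int dev = 0; dev < n; dev++) {
        char busid[32];

        /* returns something like "0000:02:00.0" */
        cudaDeviceGetPCIBusId(busid, sizeof(busid), dev);
        printf("CUDA device %d: PCI bus id %s\n", dev, busid);
    }
    return 0;
}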

Can you help me shed some light on which topologies are supported for
GPUDirect RDMA, or tell me if there is another problem with my setup?

Thanks in advance!

OK, another issue (by the way, I still haven't found a solution to the previous problem):

In the NVIDIA API, nvidia_p2p_get_pages() works as expected only if the
size is less than 28MB. Above that, it returns RM_ERR_INSUFFICIENT_RESOURCES
(= ENOMEM). But the card has 2GB of memory. Why can't nvidia_p2p_get_pages()
pin more than 28MB?
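
One thing I am experimenting with, building on the pin_gpu_region() sketch from my
first post, is pinning the buffer in chunks below that size. I honestly don't know
yet whether this helps (the limit may well be on the total pinned size rather than
per call); the chunk size and the bookkeeping are just placeholders:

#define P2P_CHUNK (16ULL << 20)   /* 16MB, below the ~28MB where it starts failing */

static int pin_gpu_buffer_chunked(struct gpu_region *regions, int max_regions,
                                  uint64_t p2p_token, uint32_t va_space,
                                  uint64_t gpu_va, uint64_t total_len)
{
    uint64_t off = 0;
    int n = 0, ret;

    while (off < total_len && n < max_regions) {
        uint64_t len = total_len - off;

        if (len > P2P_CHUNK)
            len = P2P_CHUNK;

        regions[n].va = gpu_va + off;
        ret = pin_gpu_region(&regions[n], p2p_token, va_space, len);
        if (ret) {
            /* roll back the chunks already pinned with
             * nvidia_p2p_put_pages() (not shown) */
            return ret;
        }
        off += len;
        n++;
    }
    return n;   /* number of chunks actually pinned */
}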

Any clue about that? Thanks!

Hi bo,
sorry for jumping in here without an answer, but I'm very much interested in the RDMA functionality.
The whole Kepler-plus-RDMA story could be a solution for our company. We are planning to start shipping appliances with our software, possibly with a K20 on board.

You talk about a Tesla S2050. Don't you need Kepler K20 (GK110) hardware for RDMA?
Do you have more details about your setup? Which Linux? (The October 2012 video “CUDA 5 – Everything You Need to Know” says RDMA currently only works with modified Linux kernel drivers.)

You asked at the beginning of January. Don't you get better support from NVIDIA with such fancy hardware?
The RDMA docs are still at version 0.2, from July 2012!
Thanks for any deeper insight,
G.