Hi,
I have developed a Linux kernel module for a third-party device in
order to use the GPUDirect RDMA feature introduced with CUDA 5, as it is stated
here:
http://developer.download.nvidia.com/compute/cuda/5_0/rc/docs/GPUDirect_RDMA.pdf
The system works as expected, except for a performance issue. When
the other device performs a write on the PCI bus with GPU memory as
the destination, performance is good. But when it issues a read from
GPU memory, the latency is an order of magnitude worse.
The document above says something about the PCI topology and which
configuration yields the best performance, but it isn't very clear to
me. The lstopo output on the machine I am working on is:
HostBridge L#0
…
PCIBridge
PCI 10de:06de # (GPU PCI ID)
PCI 10de:0be5
PCIBridge
PCI id of the other card
…
which means that the two cards are under the same HostBridge, but
under different PCIBridges. Which of the three cases described in the
document does this correspond to? Should I make any changes to the
setup? The card is the following:
3D controller: NVIDIA Corporation GF100 [Tesla S2050]
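In case it helps, this is how I double-checked the tree directly via sysfs, without extra tools (just a sketch; the device addresses naturally differ per machine):

```shell
# Each entry in /sys/bus/pci/devices is a symlink whose resolved path
# lists every bridge between the root and the device, so two devices
# share a PCIBridge exactly when their paths share that bridge segment.
for d in /sys/bus/pci/devices/*; do
  [ -e "$d" ] || continue          # skip if sysfs exposes no PCI devices
  printf '%s -> %s\n' "${d##*/}" "$(readlink -f "$d")"
done
```

Comparing the resolved paths of the GPU (10de:06de) and the other card matches what lstopo shows: a common ancestor only at the host bridge, not at any intermediate bridge.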
Can you help me shed some light on the topologies supported by
GPUDirect RDMA, or tell me if there is another problem with my setup?
Thanks in advance!