Peer-to-Peer Communication

Hi,

Quoting from Professional CUDA C Programming: “Kernels executing in 64-bit applications on devices with compute capability 2.0 and higher can directly access the global memory of any GPU connected to the same PCIE root node. […] requires CUDA 4.0 or higher […] a system with two or more Fermi or Kepler GPUs […]”.

I am working on a machine with two Tesla C2075 (Fermi) cards on an X9DR3-F motherboard with two Xeon CPUs. I’ve installed Ubuntu 12.04 and CUDA 6.5. Here is the output of the simpleP2P sample program:

[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
> GPU0 = "    Tesla C2075" IS  capable of Peer-to-Peer (P2P)
> GPU1 = "    Tesla C2075" IS  capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer-to-Peer (P2P) access from Tesla C2075 (GPU0) -> Tesla C2075 (GPU1) : No
> Peer-to-Peer (P2P) access from Tesla C2075 (GPU1) -> Tesla C2075 (GPU0) : No
Two or more GPUs with SM 2.0 or higher capability are required for ./simpleP2P.
Peer to Peer access is not available between GPU0 <-> GPU1, waiving test.

Is there any reason why I cannot use peer-to-peer memory access? I am also attaching the output of lspci -tv:

[...]
 +-[0000:80]-+-01.0-[81]--
 |           +-02.0-[82]--
 |           +-03.0-[83]----00.0  NVIDIA Corporation GF110GL [Tesla C2050 / C2075]
[...]
\-[0000:00]-+-00.0  Intel Corporation Ivytown DMI2
             +-01.0-[01]--
             +-01.1-[02-03]--+-00.0  Intel Corporation I350 Gigabit Network Connection
             |               \-00.1  Intel Corporation I350 Gigabit Network Connection
             +-02.0-[04]----00.0  NVIDIA Corporation GF110GL [Tesla C2050 / C2075]
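To match the CUDA device numbers against the lspci tree above, a small program along these lines can print each GPU's PCI bus ID together with the peer-access result for every pair. This is only a minimal sketch using the CUDA runtime calls cudaDeviceGetPCIBusId and cudaDeviceCanAccessPeer; the file name p2p_query.cu is just a placeholder, and the output depends entirely on the machine it runs on.

```cuda
// p2p_query.cu (hypothetical file name): list PCI bus IDs and the P2P matrix.
// Build (assumed command): nvcc -o p2p_query p2p_query.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));

    // Print the PCI bus ID of each device so it can be located in lspci -tv.
    for (int dev = 0; dev < count; ++dev) {
        char busId[32];
        CHECK(cudaDeviceGetPCIBusId(busId, (int)sizeof(busId), dev));
        printf("GPU%d PCI bus ID: %s\n", dev, busId);
    }

    // Ask the runtime whether each ordered pair of devices supports P2P.
    for (int src = 0; src < count; ++src) {
        for (int dst = 0; dst < count; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            CHECK(cudaDeviceCanAccessPeer(&canAccess, src, dst));
            printf("GPU%d -> GPU%d peer access: %s\n",
                   src, dst, canAccess ? "Yes" : "No");
        }
    }
    return 0;
}
```

The bus IDs it prints have the form domain:bus:device.function, so the bus number can be compared directly with the bracketed bus numbers in the lspci tree.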

If the problem is hardware-related, does that mean the motherboard has to be replaced?

Best regards

OK, in case someone ever needs this information:

I asked the system administrator to check how the GPUs were physically connected. This is the motherboard diagram:

[Image: X9DR3-F motherboard diagram]

It turned out that the first Tesla was in slot 2 and the second Tesla in slot 4. For P2P to work, both cards have to be attached to the same CPU, so that traffic between them does not have to cross the QPI link. This matches the lspci output above, where one GPU sits under root bus [0000:00] and the other under [0000:80]. So the card in slot 2 was moved to slot 6. It’s a bit odd, because slot 4 and slot 6 are the first two slots on the motherboard, but the technician had preferred to plug the cards in at the bottom of the board rather than at the top.
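Once both cards sit behind the same CPU, the direct access described in the book quote can be tried with something like the sketch below. It is not taken from the simpleP2P sample; it assumes devices 0 and 1 are the two Teslas, and the kernel name addOneFromPeer and the buffer names are made up for illustration. A kernel running on GPU0 reads a buffer that lives in GPU1's global memory after cudaDeviceEnablePeerAccess has been called.

```cuda
// Sketch: a kernel on GPU0 dereferences a pointer to GPU1's global memory.
// Assumes a 64-bit build and that devices 0 and 1 are the two GPUs.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: reads from the peer buffer, writes to the local one.
__global__ void addOneFromPeer(const int *peerSrc, int *localDst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) localDst[i] = peerSrc[i] + 1;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);

    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, 0, 1));
    if (!canAccess) {
        printf("GPU0 cannot access GPU1 memory; check the slot placement.\n");
        return 0;
    }

    // Allocate and zero the source buffer on GPU1.
    int *bufOnGpu1 = NULL;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void **)&bufOnGpu1, bytes));
    CHECK(cudaMemset(bufOnGpu1, 0, bytes));
    CHECK(cudaDeviceSynchronize());

    // Switch to GPU0, enable access to GPU1 (flags must be 0), and run there.
    int *bufOnGpu0 = NULL;
    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceEnablePeerAccess(1, 0));
    CHECK(cudaMalloc((void **)&bufOnGpu0, bytes));

    addOneFromPeer<<<(n + 255) / 256, 256>>>(bufOnGpu1, bufOnGpu0, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Copy one element back to confirm the kernel ran.
    int first = -1;
    CHECK(cudaMemcpy(&first, bufOnGpu0, sizeof(int), cudaMemcpyDeviceToHost));
    printf("first element read through peer access, plus one: %d\n", first);

    // Release resources on each device.
    CHECK(cudaFree(bufOnGpu0));
    CHECK(cudaDeviceDisablePeerAccess(1));
    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(bufOnGpu1));
    return 0;
}
```

Peer access is one-directional, so GPU1 reading from GPU0 would need its own cudaDeviceEnablePeerAccess(0, 0) call made while device 1 is current.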

Hope this helps someone.