K80 peer-to-peer transfers: Slow bandwidth and high latency.

Hi!

I want to use P2P access between the two GPUs on my K80 card. However, P2P performance is very poor. According to NVIDIA's p2pBandwidthLatencyTest, the P2P bandwidth is below 1 GB/s. Disabling P2P and staging through the host is faster (about 5 GB/s). In addition, the latency part of p2pBandwidthLatencyTest hangs when P2P is enabled.

To my understanding, the two GPUs on the K80 are directly connected via PCIe 3.0, so the theoretical bandwidth is 16 GB/s. Results in [1] show a P2P bandwidth of roughly 12 GB/s on a K80. Can someone explain why I see such poor performance numbers?
[1] https://devtalk.nvidia.com/default/topic/792306/cuda-setup-and-installation/basic-question-about-2-in-1-gpus-ie-gtx-titan-z-or-k80-/
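
As a quick sanity check on the 16 GB/s figure: assuming the link between the two GPUs is a full x16 PCIe 3.0 link (8 GT/s per lane, 128b/130b encoding; the x16 width is an assumption, it was not measured here), the per-direction limit before packet overhead works out to roughly

```latex
\[
8\ \text{GT/s} \times \frac{128}{130} \times 16\ \text{lanes} \times \frac{1\ \text{byte}}{8\ \text{bits}} \approx 15.75\ \text{GB/s}
\]
```

Protocol overhead pushes achievable numbers below that, which would be consistent with the roughly 12 GB/s reported in [1].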

System Setup:
GPU: Tesla K80
CUDA: 6.5
Driver: 346.12

Output of p2pBandwidthLatencyTest (cancelled after 10 minutes):

[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Tesla K80, pciBusID: 8, pciDeviceID: 0, pciDomainID:0
Device: 1, Tesla K80, pciBusID: 9, pciDeviceID: 0, pciDomainID:0
P2P Cliques: 
[0 1]
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1 
     0  84.41   5.15 
     1   5.16  80.79 
Unidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1 
     0  85.28   0.72 
     1   0.72  86.06 
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1 
     0  86.35   5.31 
     1   5.38  86.22 
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1 
     0  86.71   1.30 
     1   1.30  86.85 
P2P=Disabled Latency Matrix (us)
   D\D     0      1 
     0   3.76  21.86 
     1  22.00   3.76 
P2P=Enabled Latency Matrix (us)
^C^C^C

Why do I see such bad bandwidth and large latency?
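
For anyone who wants to reproduce the bandwidth numbers outside the sample, here is a minimal sketch of the same kind of measurement using only CUDA runtime host calls. It is not the code of p2pBandwidthLatencyTest. The device IDs 0 and 1, the 64 MiB buffer, the repeat count, and the helper names `toGBs` and `copyBandwidthGBs` are all assumptions made for illustration.

```cuda
// Sketch: time GPU0 -> GPU1 copies with peer access disabled, then enabled.
// Buffer size, repeat count and helper names are illustrative assumptions.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Convert "bytes copied reps times in ms milliseconds" to GB/s (1 GB = 1e9 B).
static double toGBs(size_t bytes, int reps, float ms)
{
    return (double)bytes * reps / 1e9 / (ms / 1e3);
}

// Copy from src (on srcDev) to dst (on dstDev) reps times; return GB/s.
static double copyBandwidthGBs(void *dst, int dstDev, const void *src,
                               int srcDev, size_t bytes, int reps)
{
    cudaEvent_t start, stop;
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // One untimed warm-up copy so one-time setup cost is not measured.
    CHECK(cudaMemcpyPeerAsync(dst, dstDev, src, srcDev, bytes, 0));
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i)
        CHECK(cudaMemcpyPeerAsync(dst, dstDev, src, srcDev, bytes, 0));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return toGBs(bytes, reps, ms);
}

int main()
{
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        printf("Need at least two GPUs.\n");
        return 0;
    }

    const size_t bytes = (size_t)64 << 20;  // 64 MiB, arbitrary
    const int reps = 20;                    // arbitrary
    void *buf0 = NULL, *buf1 = NULL;
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&buf0, bytes));
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&buf1, bytes));

    // Peer access not enabled yet: the copy is staged through host memory.
    printf("P2P disabled: %.2f GB/s\n",
           copyBandwidthGBs(buf1, 1, buf0, 0, bytes, reps));

    int can01 = 0, can10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&can10, 1, 0));
    if (can01 && can10) {
        CHECK(cudaSetDevice(0));
        CHECK(cudaDeviceEnablePeerAccess(1, 0));
        CHECK(cudaSetDevice(1));
        CHECK(cudaDeviceEnablePeerAccess(0, 0));
        printf("P2P enabled:  %.2f GB/s\n",
               copyBandwidthGBs(buf1, 1, buf0, 0, bytes, reps));
    } else {
        printf("Peer access between devices 0 and 1 is not supported.\n");
    }

    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(buf0));
    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(buf1));
    return 0;
}
```

Something like `nvcc -o p2p_sketch p2p_sketch.cu` should build it (the file name is arbitrary). If the "enabled" figure again comes out far below the "disabled" one, that would suggest the slowdown is in the platform or driver path rather than in the sample itself.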

Thanks for your help,
nargin

What kind of system is the K80 installed in? HW and SW details please. What is the manufacturer and model number of the system? Which Linux distro are you using, and which version?

It’s a 64-bit Linux node (Scientific Linux 6.6, i.e. Red Hat based) that is part of a cluster. The node is a two-socket machine with two Intel Xeon E5-2680 v3 (Haswell?) CPUs and a C610/X99 chipset.

I don’t have more details at hand (that’s all I can get from the shell tools without root access). Do you need more?

If the K80 was not shipped as part of a properly qualified OEM server, it’s basically in an unsupported configuration. Other than the C series (which the K80 is not part of), Tesla products are expected to be installed in a properly qualified OEM server.

Furthermore, if Scientific Linux is in any way different from RHEL 6, then it’s an unsupported distro.

Having said that, I’m not sure how you ended up with 346.12. I would update the K80 driver to the latest, currently 346.59:

http://www.nvidia.com/download/driverResults.aspx/84194/en-us

And if you are running in a non-qualified server, I would suggest making sure that the server node has the latest available BIOS installed.

Thanks.

Hi, I have encountered the same problem with a Titan X. Can you tell me whether you finally solved it, and if so, how? Thanks.

This helped us (using K80):
http://www.supermicro.com/support/faqs/faq.cfm?faq=20732