K80 peer-to-peer transfers: low bandwidth and high latency.


I want to use P2P access between the two GPUs on my K80 card, but the performance is very poor. According to NVIDIA's p2pBandwidthLatencyTest, the bandwidth is below 1 GB/s. With P2P disabled, staging through the host is faster (5 GB/s). As for latency, p2pBandwidthLatencyTest hangs when P2P is enabled.

To my understanding, the two GPUs on the K80 are directly connected by PCIe 3.0, so the theoretical bandwidth is 16 GB/s. Results in [1] show a P2P bandwidth on the K80 of roughly 12 GB/s. Can someone explain why I see such poor performance numbers?
[1] https://devtalk.nvidia.com/default/topic/792306/cuda-setup-and-installation/basic-question-about-2-in-1-gpus-ie-gtx-titan-z-or-k80-/
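For context, here is a minimal sketch of the kind of measurement p2pBandwidthLatencyTest performs; it is not the sample's actual code, just a hypothetical micro-benchmark that enables peer access between devices 0 and 1 and times a device-1-to-device-0 copy with cudaMemcpyPeer:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical micro-benchmark: enable peer access between devices 0 and 1,
// then time repeated peer copies and report the observed bandwidth.
int main() {
    const size_t bytes = 64 << 20;  // 64 MiB per transfer (arbitrary choice)
    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    printf("P2P capable: 0->1 %d, 1->0 %d\n", canAccess01, canAccess10);

    void *buf0 = nullptr, *buf1 = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&buf0, bytes);
    if (canAccess01) cudaDeviceEnablePeerAccess(1, 0);  // flags must be 0
    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);
    if (canAccess10) cudaDeviceEnablePeerAccess(0, 0);

    // Time a handful of peer copies with CUDA events on device 0.
    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    const int reps = 10;
    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpyPeer(buf0, 0, buf1, 1, bytes);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (double)bytes * reps / (ms / 1e3) / 1e9;
    printf("D1 -> D0: %.2f GB/s\n", gbps);
    return 0;
}
```

When peer access cannot be enabled (or is silently routed through the host), a copy like this falls back to staging through system memory, which is consistent with the low numbers in the output below.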

System Setup:
GPU: Tesla K80
CUDA: 6.5
Driver: 346.12

Output of p2pBandwidthLatencyTest (cancelled after 10 minutes):

[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Tesla K80, pciBusID: 8, pciDeviceID: 0, pciDomainID:0
Device: 1, Tesla K80, pciBusID: 9, pciDeviceID: 0, pciDomainID:0
P2P Cliques: 
[0 1]
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1 
     0  84.41   5.15 
     1   5.16  80.79 
Unidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1 
     0  85.28   0.72 
     1   0.72  86.06 
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1 
     0  86.35   5.31 
     1   5.38  86.22 
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1 
     0  86.71   1.30 
     1   1.30  86.85 
P2P=Disabled Latency Matrix (us)
   D\D     0      1 
     0   3.76  21.86 
     1  22.00   3.76 
P2P=Enabled Latency Matrix (us)

Why do I see such bad bandwidth and large latency?

Thanks for your help,

What kind of system is the K80 installed in? HW and SW details, please. What is the manufacturer and model number of the system? Which Linux distro are you using, and which version?

It’s a 64-bit Linux system (Scientific Linux 6.6, i.e. Red Hat-based), part of a cluster. The node is a two-socket machine with two Intel Xeon E5-2680 v3 CPUs (Haswell) and a C610/X99 chipset.

I don’t have more details at hand (that’s what I get from the shell tools without root access). Do you need more?
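In case it helps, some of these details can be gathered without root access through the CUDA runtime API alone. This is a small sketch (not part of the original thread) that prints driver/runtime versions, each device's PCI location, and the pairwise peer-access capability:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print driver/runtime versions, PCI locations, and peer-access capability
// for every device pair -- all queryable without root privileges.
int main() {
    int drv = 0, rt = 0, n = 0;
    cudaDriverGetVersion(&drv);
    cudaRuntimeGetVersion(&rt);
    cudaGetDeviceCount(&n);
    printf("driver API %d, runtime API %d, %d device(s)\n", drv, rt, n);

    for (int d = 0; d < n; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        printf("dev %d: %s, pci %04x:%02x:%02x\n",
               d, p.name, p.pciDomainID, p.pciBusID, p.pciDeviceID);
    }
    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            if (a != b) {
                int can = 0;
                cudaDeviceCanAccessPeer(&can, a, b);
                printf("peer access %d -> %d: %s\n", a, b, can ? "yes" : "no");
            }
    return 0;
}
```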

If the K80 is not shipped as part of a properly qualified OEM server, it’s basically in an unsupported configuration. Apart from the C-series (which the K80 is not part of), Tesla products are expected to be installed in a properly qualified OEM server.

Furthermore, if Scientific Linux differs in any way from RHEL 6, it’s an unsupported distro.

Having said that, I’m not sure how you ended up with 346.12. I would update the K80 driver to the latest, currently 346.59.

And if you are running on a non-qualified server, I would suggest making sure that the node has the latest available BIOS installed.


Hi, I have encountered the same problem with a Titan X, so I wonder how you solved it in the end.
Can you tell me whether you finally solved the problem? Thanks.

This helped us (using K80):