Hi!
I want to use P2P access between the two GPUs on my K80 card. However, they have very poor performance. According to nvidias p2pBandwidthLatencyTest, the bandwidth is below 1 GB/s. Disabled P2P and staging through the host is faster (5 GB/s). Also for latency, the p2pBandwidthLatencyTest hangs if P2P is enabled.
To my understanding, the two GPUs on the K80 are direclty connected by PCI-E 3, so the theoretical BW is 16 GB/s. Results in [1] show P2P-BW on K80 of roughly 12 GB/s. Can someone explain why I see these bad performance numbers?
[1] https://devtalk.nvidia.com/default/topic/792306/cuda-setup-and-installation/basic-question-about-2-in-1-gpus-ie-gtx-titan-z-or-k80-/
System Setup:
GPU: Tesla K80
CUDA: 6.5
Driver: 346.12
Output of p2pBandwidthLatencyTest (cancelled after 10mins):
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Tesla K80, pciBusID: 8, pciDeviceID: 0, pciDomainID:0
Device: 1, Tesla K80, pciBusID: 9, pciDeviceID: 0, pciDomainID:0
P2P Cliques:
[0 1]
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 84.41 5.15
1 5.16 80.79
Unidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 85.28 0.72
1 0.72 86.06
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 86.35 5.31
1 5.38 86.22
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 86.71 1.30
1 1.30 86.85
P2P=Disabled Latency Matrix (us)
D\D 0 1
0 3.76 21.86
1 22.00 3.76
P2P=Enabled Latency Matrix (us)
^C^C^C
Why do I see such bad bandwidth and large latency?
Thanks for your help,
nargin