Multi-GPU Peer-to-Peer access failing on Tesla K80

Thanks txbob. We are renting bare metal. They told me the BIOS is up to date, but I have my doubts. Do you know, by any chance, whether a particular Supermicro BIOS version provides a fix? I'm trying to get the DC provider to update the BIOS to the bleeding-edge version.

Since you haven't identified which actual Supermicro system you are referring to, I can't say anything about BIOS versions.

Find out the model number of your system. Find out the BIOS version currently installed on it. Then search the Supermicro site for that model number together with the word “BIOS”; you will then be able to see the latest BIOS version available.

If your system is not at the latest BIOS version, get it updated. If that does not fix the issue, contact Supermicro with that model number in hand and describe your issue to them.
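
On a running Linux box, dmidecode will usually report both of those directly. A quick sketch (needs root, and assumes the vendor actually populated the SMBIOS fields):

# System/board model and installed BIOS version, as reported by SMBIOS
sudo dmidecode -s system-product-name
sudo dmidecode -s baseboard-product-name
sudo dmidecode -s bios-version
sudo dmidecode -s bios-release-date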

I faced a similar issue here (Testing nccl with a difficult topology · Issue #19 · NVIDIA/nccl · GitHub) with a Supermicro 7048GR-TR. It turns out that with ACS disabled, simpleP2P runs smoothly; a rough sketch of one way to check and clear ACS from the OS follows the output below:

[r1bsl@supermicro simpleP2P]$ ./simpleP2P
[./simpleP2P] - Starting…
Checking for multiple GPUs…
CUDA-capable device count: 6

GPU0 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
GPU1 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
GPU2 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
GPU3 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
GPU4 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
GPU5 = " Tesla K80" IS capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access…

Peer access from Tesla K80 (GPU0) → Tesla K80 (GPU1) : Yes
Peer access from Tesla K80 (GPU0) → Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU0) → Tesla K80 (GPU3) : No
Peer access from Tesla K80 (GPU0) → Tesla K80 (GPU4) : No
Peer access from Tesla K80 (GPU0) → Tesla K80 (GPU5) : No
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU0) : Yes
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU3) : No
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU4) : No
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU5) : No
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU0) : No
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU3) : Yes
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU4) : Yes
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU5) : Yes
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU0) : No
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU2) : Yes
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU4) : Yes
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU5) : Yes
Peer access from Tesla K80 (GPU4) → Tesla K80 (GPU0) : No
Peer access from Tesla K80 (GPU4) → Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU4) → Tesla K80 (GPU2) : Yes
Peer access from Tesla K80 (GPU4) → Tesla K80 (GPU3) : Yes
Peer access from Tesla K80 (GPU4) → Tesla K80 (GPU5) : Yes
Peer access from Tesla K80 (GPU5) → Tesla K80 (GPU0) : No
Peer access from Tesla K80 (GPU5) → Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU5) → Tesla K80 (GPU2) : Yes
Peer access from Tesla K80 (GPU5) → Tesla K80 (GPU3) : Yes
Peer access from Tesla K80 (GPU5) → Tesla K80 (GPU4) : Yes
Enabling peer access between GPU0 and GPU1…
Checking GPU0 and GPU1 for UVA capabilities…
Tesla K80 (GPU0) supports UVA: Yes
Tesla K80 (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling…
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)…
Creating event handles…
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 7.42GB/s
Preparing host buffer and memcpy to GPU0…
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1…
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0…
Copy data back to host from GPU0 and verify results…
Disabling peer access…
Shutting down…
Test passed
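
For reference, here is a rough sketch of one way to inspect and clear ACS from the OS (run as root). The bus ID below is only an example, the change does not survive a reboot, and on many boards the same setting is exposed as a BIOS option instead; if setpci does not recognize the ECAP_ACS name, use the capability offset reported by lspci -vvv.

# List every PCIe bridge that currently has any ACS control bit enabled
for bdf in $(lspci | awk '/PCI bridge/ {print $1}'); do
    lspci -s "$bdf" -vvv | grep -iq 'ACSCtl:.*+' && echo "ACS active on $bdf"
done

# Clear the ACS control word on one such bridge; +0x6 is the control register
# inside the ACS extended capability (repeat for each bridge between the GPUs)
setpci -s 02:08.0 ECAP_ACS+0x6.w=0000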

Hi

I am facing a similar kind of problem, which I actually came across while training a deep network with Caffe.
The machine has 4 K80 GPUs, and I am not able to use their full potential because P2P access between them fails.

A bit of diagnosis from my end:

On running nvidia-smi topo -m

        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      PHB     PHB     PHB     0-15
GPU1    PHB      X      PHB     PHB     0-15
GPU2    PHB     PHB      X      PHB     0-15
GPU3    PHB     PHB     PHB      X      0-15

This looks OK to me as all the GPUs have access to each other through a PCIe host bridge.

But when running the simpleP2P test from the CUDA samples, this is what I get:

Checking for multiple GPUs…
CUDA-capable device count: 4

GPU0 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
GPU1 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
GPU2 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
GPU3 = " Tesla K80" IS capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access…

Peer access from Tesla K80 (GPU0) → Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU0) → Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU0) → Tesla K80 (GPU3) : No
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU0) : No
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU3) : No
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU0) : No
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU3) : No
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU0) : No
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU2) : No
Two or more GPUs with SM 2.0 or higher capability are required for ./simpleP2P.
Peer to Peer access is not available amongst GPUs in the system, waiving test.

Could someone please help me debug this?

What sort of system are the K80s installed in? Did you purchase the system from an OEM that has qualified it for use with K80s?

Are you using a supported CUDA configuration? (e.g. OS)

If so, you should contact the OEM to arrange for technical support.
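
A couple of quick checks may also narrow it down before you go to the OEM (a sketch; it assumes root access and, for the virtualization check, systemd):

# P2P is generally unavailable inside a VM; confirm the box is bare metal
sudo dmidecode -s system-manufacturer
systemd-detect-virt          # prints "none" on bare metal

# An enabled IOMMU (VT-d) can also block or reroute peer-to-peer traffic
cat /proc/cmdline
dmesg | grep -iE 'iommu|dmar'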

Hi, I am having a similar problem!

I have two GPUs under the same PCIe switch (PIX), but P2P is not enabled between them.

nvidia-smi

Thu Nov 24 06:14:19 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.39     Driver Version: 352.39         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:03:00.0     Off |                    0 |
| N/A   38C    P0    57W / 149W |     22MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:04:00.0     Off |                    0 |
| N/A   24C    P8    29W / 149W |     22MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0002:03:00.0     Off |                    0 |
| N/A   34C    P8    28W / 149W |     22MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0002:04:00.0     Off |                    0 |
| N/A   34C    P0    70W / 149W |    867MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |

nvidia-smi topo -m

        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      PIX     SOC     SOC     0-79
GPU1    PIX      X      SOC     SOC     0-79
GPU2    SOC     SOC      X      PIX     80-159
GPU3    SOC     SOC     PIX      X      80-159

Legend:

X = Self
SOC = Path traverses a socket-level link (e.g. QPI)
PHB = Path traverses a PCIe host bridge
PXB = Path traverses multiple PCIe internal switches
PIX = Path traverses a PCIe internal switch

./testAllP2P

[./testAllP2P] - Starting…
Checking for multiple GPUs…
CUDA-capable device count: 4

Access from Tesla K80 (GPU0) → Tesla K80 (GPU1)
Peer access from Tesla K80 (GPU0) → Tesla K80 (GPU1) : No

Access from Tesla K80 (GPU0) → Tesla K80 (GPU2)
Peer access from Tesla K80 (GPU0) → Tesla K80 (GPU2) : No

Access from Tesla K80 (GPU0) → Tesla K80 (GPU3)
Peer access from Tesla K80 (GPU0) → Tesla K80 (GPU3) : No

Access from Tesla K80 (GPU1) → Tesla K80 (GPU2)
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU2) : No

Access from Tesla K80 (GPU1) → Tesla K80 (GPU3)
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU3) : No

Access from Tesla K80 (GPU2) → Tesla K80 (GPU3)
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU3) : No

uname -a

Linux 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:31:44 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

lscpu

Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 160
On-line CPU(s) list: 0-159
Thread(s) per core: 8
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Model: 8335-GTA
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-79
NUMA node8 CPU(s): 80-159

lspci | grep -i plx

0000:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0000:02:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0000:02:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0002:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0002:02:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0002:02:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0003:01:00.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:01:00.1 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)
0003:01:00.2 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)
0003:01:00.3 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)
0003:01:00.4 System peripheral: PLX Technology, Inc. Device 87d0 (rev ca)
0003:02:01.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:02:08.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:02:09.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:02:0a.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:02:0b.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)
0003:02:0c.0 PCI bridge: PLX Technology, Inc. Device 8725 (rev ca)

lspci -s 0000:02:08.0 -vvvv | grep -i acs

	UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UESvrt:	DLP- SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
	ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

lspci -s 0000:02:10.0 -vvvv | grep -i acs

	UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UESvrt:	DLP- SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
	ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

All the other PCI devices show the same ACS configuration.

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:31:50_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
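
For completeness, here is a sketch of testing only the two devices that sit behind the same PLX switch, to separate the same-switch path from the cross-socket (SOC) paths; CUDA_DEVICE_ORDER is set so the CUDA ordinals match the nvidia-smi/PCI ordering:

# Make CUDA enumerate devices in PCI bus order so ordinals match nvidia-smi
export CUDA_DEVICE_ORDER=PCI_BUS_ID
# Test the pair behind each switch separately (simpleP2P from the CUDA samples)
CUDA_VISIBLE_DEVICES=0,1 ./simpleP2P
CUDA_VISIBLE_DEVICES=2,3 ./simpleP2P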

Please, could you give me any clue?