Peer-to-Peer Access Fails between 2 GPUs

I have installed two P100s in my machine, but they cannot access each other's memory directly.

My GPUs are supposedly attached to the same CPU socket:

nvidia-smi topo -m
GPU0 GPU1 CPU Affinity
GPU0 X SOC 0-7,16-23
GPU1 SOC X 8-15,24-31

Legend:

X = Self
SOC = Connection traversing PCIe as well as the SMP link between CPU sockets(e.g. QPI)
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks

But they have no peer access to each other:

/usr/local/cuda/samples/0_Simple/simpleP2P/simpleP2P
[/usr/local/cuda/samples/0_Simple/simpleP2P/simpleP2P] - Starting…
Checking for multiple GPUs…
CUDA-capable device count: 2

GPU0 = “Tesla P100-PCIE-16GB” IS capable of Peer-to-Peer (P2P)
GPU1 = “Tesla P100-PCIE-16GB” IS capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access…

Peer access from Tesla P100-PCIE-16GB (GPU0) -> Tesla P100-PCIE-16GB (GPU1) : No
Peer access from Tesla P100-PCIE-16GB (GPU1) -> Tesla P100-PCIE-16GB (GPU0) : No
Two or more GPUs with SM 2.0 or higher capability are required for /usr/local/cuda/samples/0_Simple/simpleP2P/simpleP2P.
Peer to Peer access is not available amongst GPUs in the system, waiving test.

When you see SOC in the topology matrix, it means the path between those two GPUs traverses the socket-level link between CPUs (e.g. QPI) — in other words, the GPUs are attached to different CPU sockets.

P2P transfers cannot traverse QPI. So the problem is not in the GPUs or the software, but in the system you have them plugged into: one GPU is connected to one CPU socket and the other GPU to the other socket, and that topology does not support P2P.

You cannot fix this except by changing the hardware configuration, e.g. moving both GPUs onto PCIe slots attached to the same CPU socket.
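If you want to check this programmatically rather than rely on the simpleP2P sample, a minimal sketch using the standard CUDA runtime call `cudaDeviceCanAccessPeer` (essentially what simpleP2P does internally) would look like this. Compile with `nvcc`; the results naturally depend on the machine it runs on:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            // Reports 1 only if device i can map device j's memory;
            // on a SOC (cross-socket) topology this comes back 0.
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            printf("GPU%d -> GPU%d : %s\n", i, j, canAccess ? "Yes" : "No");
        }
    }
    return 0;
}
```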

Thanks for your post, txbob. According to my machine maker, my GPUs were installed on the same CPU socket, but they were not. I will check them again.

I solved the problem. My server is a Supermicro 1028GR-TR, and its manual (MNL-1625.pdf) has a wrong CPU-socket description in Figure 6-5 on manual page 6-5 (PDF page 77). The figure shows GPU slots 1 and 2 connected to CPU 1, but this is wrong: GPU slots 1 and 4 are actually connected to CPU 2, and GPU slot 2 is connected to CPU 1. The manufacturer confirmed this. I had initially installed my GPUs in slots 1 and 2 as the manual describes, and they did not get P2P access. I moved one GPU from slot 2 to slot 4, so my GPUs are now in slots 1 and 4, and they have P2P access.

I reported the manual error to the manufacturer, but I am posting this here in case someone else runs into the same problem.

My P2P works now!

nvidia-smi topo -m
GPU0 GPU1 CPU Affinity
GPU0 X PHB 8-15,24-31
GPU1 PHB X 8-15,24-31

Legend:

X = Self
SOC = Connection traversing PCIe as well as the SMP link between CPU sockets(e.g. QPI)
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks

[/usr/local/cuda/samples/0_Simple/simpleP2P/simpleP2P] - Starting…
Checking for multiple GPUs…
CUDA-capable device count: 2

GPU0 = “Tesla P100-PCIE-16GB” IS capable of Peer-to-Peer (P2P)
GPU1 = “Tesla P100-PCIE-16GB” IS capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access…

Peer access from Tesla P100-PCIE-16GB (GPU0) -> Tesla P100-PCIE-16GB (GPU1) : Yes
Peer access from Tesla P100-PCIE-16GB (GPU1) -> Tesla P100-PCIE-16GB (GPU0) : Yes
Enabling peer access between GPU0 and GPU1…
Checking GPU0 and GPU1 for UVA capabilities…
Tesla P100-PCIE-16GB (GPU0) supports UVA: Yes
Tesla P100-PCIE-16GB (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling…
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)…
Creating event handles…
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 9.30GB/s
Preparing host buffer and memcpy to GPU0…
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1…
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0…
Copy data back to host from GPU0 and verify results…
Disabling peer access…
Shutting down…
Test passed
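For anyone who wants to use P2P in their own code once the topology is right, a minimal sketch of the pattern the sample exercises — enable peer access in both directions, then copy directly between the devices with `cudaMemcpyPeer` (buffer size chosen to match the sample's 64 MB; compile with `nvcc`, requires two P2P-capable GPUs):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;  // 64 MB, as in simpleP2P
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) {
        printf("No P2P between GPU0 and GPU1\n");
        return 1;
    }

    // Peer access is per-direction and applies to the current device.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    void *buf0 = nullptr, *buf1 = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&buf0, bytes);
    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);

    // With peer access enabled, this copy goes directly over PCIe
    // between the GPUs instead of staging through host memory.
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    printf("Peer copy done\n");
    return 0;
}
```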