Hi,
I have two K420s that I recently replaced with two P600s, but it appears that P2P does not work with the P600s, while it did work with the K420s.
I was under the impression that P2P is supposed to work for identical cards, even GeForce cards. Has this policy changed?
Here is the output of simpleP2P from the CUDA samples:
[root@metty simpleP2P]# ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 3
> GPU0 = "GeForce GTX 1050" IS capable of Peer-to-Peer (P2P)
> GPU1 = " Quadro P600" IS capable of Peer-to-Peer (P2P)
> GPU2 = " Quadro P600" IS capable of Peer-to-Peer (P2P)
Checking GPU(s) for support of peer to peer memory access...
> Peer access from GeForce GTX 1050 (GPU0) -> Quadro P600 (GPU1) : No
> Peer access from GeForce GTX 1050 (GPU0) -> Quadro P600 (GPU2) : No
> Peer access from Quadro P600 (GPU1) -> GeForce GTX 1050 (GPU0) : No
> Peer access from Quadro P600 (GPU1) -> Quadro P600 (GPU2) : No
> Peer access from Quadro P600 (GPU2) -> GeForce GTX 1050 (GPU0) : No
> Peer access from Quadro P600 (GPU2) -> Quadro P600 (GPU1) : No
Two or more GPUs with SM 2.0 or higher capability are required for ./simpleP2P.
Peer to Peer access is not available amongst GPUs in the system, waiving test.
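For reference, the check simpleP2P performs boils down to `cudaDeviceCanAccessPeer`; here is a minimal sketch (assuming the CUDA toolkit is installed, compiled with nvcc) that prints the same access matrix:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int can = 0;
            // Returns 1 only if device i can directly address
            // memory on device j (e.g., over PCIe or NVLink).
            cudaDeviceCanAccessPeer(&can, i, j);
            printf("GPU%d -> GPU%d : %s\n", i, j, can ? "Yes" : "No");
        }
    }
    return 0;
}
```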
And some nvidia-smi output:
[root@metty simpleP2P]# nvidia-smi
Tue Apr 3 13:57:59 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30 Driver Version: 390.30 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 Off | 00000000:05:00.0 Off | N/A |
| 35% 40C P0 N/A / 75W | 0MiB / 1999MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro P600 Off | 00000000:0B:00.0 Off | N/A |
| 36% 50C P0 N/A / N/A | 0MiB / 2000MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Quadro P600 Off | 00000000:0C:00.0 Off | N/A |
| 0% 67C P0 N/A / N/A | 0MiB / 2000MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
[root@metty simpleP2P]# nvidia-smi topo -m
GPU0 GPU1 GPU2 CPU Affinity
GPU0 X PHB PHB 0-5
GPU1 PHB X PIX 0-5
GPU2 PHB PIX X 0-5
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks
[root@metty simpleP2P]# nvidia-smi topo -p2p w
GPU0 GPU1 GPU2
GPU0 X GNS GNS
GPU1 GNS X GNS
GPU2 GNS GNS X
Legend:
X = Self
OK = Status Ok
CNS = Chipset not supported
GNS = GPU not supported
TNS = Topology not supported
NS = Not supported
U = Unknown
For the K420s, P2P works perfectly:
[root@metty p2pBandwidthLatencyTest]# nvidia-smi -L
GPU 0: GeForce GTX 1050 (UUID: GPU-578cae79-a799-351b-1b29-157171e6af0d)
GPU 1: Quadro K420 (UUID: GPU-30178a26-07b7-42a4-03bd-cf08253d89ae)
GPU 2: Quadro K420 (UUID: GPU-f81abec5-ef46-4ff7-4216-2d1786323335)
[root@metty p2pBandwidthLatencyTest]# nvidia-smi topo -m
GPU0 GPU1 GPU2 CPU Affinity
GPU0 X PHB PHB 0-5
GPU1 PHB X PIX 0-5
GPU2 PHB PIX X 0-5
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks
[root@metty p2pBandwidthLatencyTest]# nvidia-smi topo -p2p rw
GPU0 GPU1 GPU2
GPU0 X NS NS
GPU1 NS X OK
GPU2 NS OK X
Legend:
X = Self
OK = Status Ok
CNS = Chipset not supported
GNS = GPU not supported
TNS = Topology not supported
NS = Not supported
U = Unknown
[root@metty p2pBandwidthLatencyTest]# ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, GeForce GTX 1050, pciBusID: 5, pciDeviceID: 0, pciDomainID:0
Device: 1, Quadro K420, pciBusID: b, pciDeviceID: 0, pciDomainID:0
Device: 2, Quadro K420, pciBusID: c, pciDeviceID: 0, pciDomainID:0
Device=0 CANNOT Access Peer Device=1
Device=0 CANNOT Access Peer Device=2
Device=1 CANNOT Access Peer Device=0
Device=1 CAN Access Peer Device=2
Device=2 CANNOT Access Peer Device=0
Device=2 CAN Access Peer Device=1
...
I’m using Linux kernel 4.15, NVIDIA driver 390.30, and CUDA 9.1, if that is of any interest.
EDIT: Just out of curiosity, I tried with two K420s and one P600.
[root@metty simpleP2P]# ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 3
> GPU0 = " Quadro P600" IS capable of Peer-to-Peer (P2P)
> GPU1 = " Quadro K420" IS capable of Peer-to-Peer (P2P)
> GPU2 = " Quadro K420" IS capable of Peer-to-Peer (P2P)
Checking GPU(s) for support of peer to peer memory access...
> Peer access from Quadro P600 (GPU0) -> Quadro K420 (GPU1) : No
> Peer access from Quadro P600 (GPU0) -> Quadro K420 (GPU2) : No
> Peer access from Quadro K420 (GPU1) -> Quadro P600 (GPU0) : No
> Peer access from Quadro K420 (GPU1) -> Quadro K420 (GPU2) : Yes
> Peer access from Quadro K420 (GPU2) -> Quadro P600 (GPU0) : No
> Peer access from Quadro K420 (GPU2) -> Quadro K420 (GPU1) : Yes
Enabling peer access between GPU1 and GPU2...
Checking GPU1 and GPU2 for UVA capabilities...
> Quadro K420 (GPU1) supports UVA: Yes
> Quadro K420 (GPU2) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (64MB on GPU1, GPU2 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU1 and GPU2: 5.64GB/s
Preparing host buffer and memcpy to GPU1...
Run kernel on GPU2, taking source data from GPU1 and writing to GPU2...
Run kernel on GPU1, taking source data from GPU2 and writing to GPU1...
Copy data back to host from GPU1 and verify results...
Disabling peer access...
Shutting down...
Test passed
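The working K420 path above (enable peer access, peer copy, disable) corresponds roughly to this sequence — a sketch with error checking omitted, using the same 64 MB buffer size and device indices as the simpleP2P run:

```cuda
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 * 1024 * 1024;  // 64 MB, as in simpleP2P

    // Enable direct access in both directions (GPU1 <-> GPU2);
    // this is the step that fails on the P600s, since
    // cudaDeviceCanAccessPeer already reports "No" for them.
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(2, 0);
    cudaSetDevice(2);
    cudaDeviceEnablePeerAccess(1, 0);

    void *buf1, *buf2;
    cudaSetDevice(1); cudaMalloc(&buf1, bytes);
    cudaSetDevice(2); cudaMalloc(&buf2, bytes);

    // With peer access enabled, this copy goes device-to-device
    // over PCIe without staging through host memory.
    cudaMemcpyPeer(buf2, 2, buf1, 1, bytes);

    cudaFree(buf2);
    cudaFree(buf1);
    cudaSetDevice(1); cudaDeviceDisablePeerAccess(2);
    cudaSetDevice(2); cudaDeviceDisablePeerAccess(1);
    return 0;
}
```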