CUDA P2P access not working between multiple K80s

Hi,

I am trying to run the p2pBandwidthLatencyTest sample on a system with 4 Tesla K80s. The two GK210 GPUs within each K80 can peer-access each other, but GPUs on different K80 boards cannot, even though they are connected through PXB.
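
For reference, here is a minimal standalone sketch that only queries cudaDeviceCanAccessPeer for every pair (it does not enable peer access or run any transfers), so the result can be checked outside the sample:

// peer_check.cu - print the cudaDeviceCanAccessPeer result for every device pair
// build: nvcc peer_check.cu -o peer_check
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("Found %d CUDA devices\n", n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int can = 0;
            // ask the driver whether device i can map device j's memory for P2P
            cudaDeviceCanAccessPeer(&can, i, j);
            printf("GPU%d -> GPU%d : %s\n", i, j, can ? "yes" : "no");
        }
    }
    return 0;
}

Based on the behaviour described above, this should print "yes" only for the two GPUs on the same K80 board and "no" for every cross-board pair.
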
Here's the nvidia-smi topo -m output:
       GPU0   GPU1   GPU2   GPU3   GPU4   GPU5   GPU6   GPU7   CPU Affinity
GPU0    X     PIX    PXB    PXB    PXB    PXB    PXB    PXB    0-39
GPU1   PIX     X     PXB    PXB    PXB    PXB    PXB    PXB    0-39
GPU2   PXB    PXB     X     PIX    PXB    PXB    PXB    PXB    0-39
GPU3   PXB    PXB    PIX     X     PXB    PXB    PXB    PXB    0-39
GPU4   PXB    PXB    PXB    PXB     X     PIX    PXB    PXB    0-39
GPU5   PXB    PXB    PXB    PXB    PIX     X     PXB    PXB    0-39
GPU6   PXB    PXB    PXB    PXB    PXB    PXB     X     PIX    0-39
GPU7   PXB    PXB    PXB    PXB    PXB    PXB    PIX     X     0-39

Legend:

X = Self
SOC = PCI path traverses a socket-level link (e.g. QPI)
PHB = PCI path traverses a host bridge
PXB = PCI path traverses multiple internal switches
PIX = PCI path traverses an internal switch
NV# = Path traverses # NVLinks

Here’s the config of my system:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"
NAME="Ubuntu"
VERSION="14.04.4 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.4 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
Linux t2 3.19.0-64-generic #72~14.04.1-Ubuntu SMP Fri Jun 24 17:58:13 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

lspci output:
0000:0a:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:0b:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:0e:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:0f:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:12:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:13:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:16:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:17:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

lspci -t:
-+-[0007:00]---00.0-[01]--
 +-[0006:00]---00.0-[01]--
 +-[0005:00]---00.0-[01]--
 +-[0004:00]---00.0-[01]--
 +-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
 |                                           +-08.0-[04-08]--
 |                                           +-09.0-[09]--+-00.0
 |                                           |            +-00.1
 |                                           |            +-00.2
 |                                           |            \-00.3
 |                                           +-10.0-[0a-0e]--
 |                                           \-11.0-[0f-13]--
 +-[0002:00]---00.0-[01]--
 +-[0001:00]---00.0-[01-0d]----00.0-[02-0d]--+-01.0-[03-07]--
 |                                           +-08.0-[08]----00.0
 |                                           \-09.0-[09-0d]--
 \-[0000:00]---00.0-[01-17]----00.0-[02-17]--+-04.0-[03-07]--
                                             +-08.0-[08-0b]----00.0-[09-0b]--+-08.0-[0a]----00.0
                                             |                               \-10.0-[0b]----00.0
                                             +-0c.0-[0c-0f]----00.0-[0d-0f]--+-08.0-[0e]----00.0
                                             |                               \-10.0-[0f]----00.0
                                             +-10.0-[10-13]----00.0-[11-13]--+-08.0-[12]----00.0
                                             |                               \-10.0-[13]----00.0
                                             \-14.0-[14-17]----00.0-[15-17]--+-08.0-[16]----00.0
                                                                             \-10.0-[17]----00.0

The CUDA compute capability (major.minor) is 3.7, and it is the same for all GPUs.
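
(For completeness, the compute capability can be checked directly with cudaGetDeviceProperties; a small sketch:)

// cc_check.cu - print each GPU's name and compute capability
// build: nvcc cc_check.cu -o cc_check
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU%d (%s): compute capability %d.%d\n", i, prop.name, prop.major, prop.minor);
    }
    return 0;
}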

Could this be a driver issue? Any help with debugging it would be appreciated.