CUDA P2P access not working between multiple K80s

Hi,

I am trying to run the p2pBandwidthLatencyTest sample on a system with 4 Tesla K80s. The two GK210 GPUs within each K80 can peer-access each other, but GPUs on different K80 boards cannot, even though they are connected through PXB.
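
For reference, here is a minimal standalone sketch that only queries cudaDeviceCanAccessPeer for every pair (it does not enable peer access or run any transfers), so the result can be checked outside the sample:

// peer_check.cu - print the cudaDeviceCanAccessPeer result for every device pair
// build: nvcc peer_check.cu -o peer_check
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("Found %d CUDA devices\n", n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int can = 0;
            // ask the driver whether device i can map device j's memory for P2P
            cudaDeviceCanAccessPeer(&can, i, j);
            printf("GPU%d -> GPU%d : %s\n", i, j, can ? "yes" : "no");
        }
    }
    return 0;
}

Based on the behaviour described above, this should print "yes" only for the two GPUs on the same K80 board and "no" for every cross-board pair.
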
Here's the nvidia-smi topo -m output:
       GPU0   GPU1   GPU2   GPU3   GPU4   GPU5   GPU6   GPU7   CPU Affinity
GPU0    X     PIX    PXB    PXB    PXB    PXB    PXB    PXB    0-39
GPU1   PIX     X     PXB    PXB    PXB    PXB    PXB    PXB    0-39
GPU2   PXB    PXB     X     PIX    PXB    PXB    PXB    PXB    0-39
GPU3   PXB    PXB    PIX     X     PXB    PXB    PXB    PXB    0-39
GPU4   PXB    PXB    PXB    PXB     X     PIX    PXB    PXB    0-39
GPU5   PXB    PXB    PXB    PXB    PIX     X     PXB    PXB    0-39
GPU6   PXB    PXB    PXB    PXB    PXB    PXB     X     PIX    0-39
GPU7   PXB    PXB    PXB    PXB    PXB    PXB    PIX     X     0-39

Legend:

X = Self
SOC = PCI path traverses a socket-level link (e.g. QPI)
PHB = PCI path traverses a host bridge
PXB = PCI path traverses multiple internal switches
PIX = PCI path traverses an internal switch
NV# = Path traverses # NVLinks

Here’s the config of my system:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"
NAME="Ubuntu"
VERSION="14.04.4 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.4 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
Linux t2 3.19.0-64-generic #72~14.04.1-Ubuntu SMP Fri Jun 24 17:58:13 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

lspci output:
0000:0a:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:0b:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:0e:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:0f:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:12:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:13:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:16:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:17:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

lspci -t:
-+-[0007:00]---00.0-[01]--
 +-[0006:00]---00.0-[01]--
 +-[0005:00]---00.0-[01]--
 +-[0004:00]---00.0-[01]--
 +-[0003:00]---00.0-[01-13]----00.0-[02-13]--+-01.0-[03]----00.0
 |                                           +-08.0-[04-08]--
 |                                           +-09.0-[09]--+-00.0
 |                                           |            +-00.1
 |                                           |            +-00.2
 |                                           |            \-00.3
 |                                           +-10.0-[0a-0e]--
 |                                           \-11.0-[0f-13]--
 +-[0002:00]---00.0-[01]--
 +-[0001:00]---00.0-[01-0d]----00.0-[02-0d]--+-01.0-[03-07]--
 |                                           +-08.0-[08]----00.0
 |                                           \-09.0-[09-0d]--
 \-[0000:00]---00.0-[01-17]----00.0-[02-17]--+-04.0-[03-07]--
                                             +-08.0-[08-0b]----00.0-[09-0b]--+-08.0-[0a]----00.0
                                             |                               \-10.0-[0b]----00.0
                                             +-0c.0-[0c-0f]----00.0-[0d-0f]--+-08.0-[0e]----00.0
                                             |                               \-10.0-[0f]----00.0
                                             +-10.0-[10-13]----00.0-[11-13]--+-08.0-[12]----00.0
                                             |                               \-10.0-[13]----00.0
                                             \-14.0-[14-17]----00.0-[15-17]--+-08.0-[16]----00.0
                                                                             \-10.0-[17]----00.0

The CUDA compute capability (major.minor) is 3.7, and it is the same for all GPUs.
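
(For completeness, the compute capability can be checked directly with cudaGetDeviceProperties; a small sketch:)

// cc_check.cu - print each GPU's name and compute capability
// build: nvcc cc_check.cu -o cc_check
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU%d (%s): compute capability %d.%d\n", i, prop.name, prop.major, prop.minor);
    }
    return 0;
}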

Could this be a driver issue? Any help with debugging it would be appreciated.