Hi, I'm working on a RH7 node with 8 K20 GPUs, using the CUDA SDK 7.5 RC. When I run the official P2P sample (p2pBandwidthLatencyTest), the process freezes during the P2P communication. If I kill the process at that point (after waiting 10 minutes), the system becomes unstable and I have to reboot. What is wrong? What can I do?
Here is the output of the sample:
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Tesla K20m, pciBusID: 4, pciDeviceID: 0, pciDomainID:0
Device: 1, Tesla K20m, pciBusID: 5, pciDeviceID: 0, pciDomainID:0
Device: 2, Tesla K20m, pciBusID: 8, pciDeviceID: 0, pciDomainID:0
Device: 3, Tesla K20m, pciBusID: 9, pciDeviceID: 0, pciDomainID:0
Device: 4, Tesla K20m, pciBusID: 83, pciDeviceID: 0, pciDomainID:0
Device: 5, Tesla K20m, pciBusID: 84, pciDeviceID: 0, pciDomainID:0
Device: 6, Tesla K20m, pciBusID: 87, pciDeviceID: 0, pciDomainID:0
Device: 7, Tesla K20m, pciBusID: 88, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
Device=0 CAN Access Peer Device=3
Device=0 CANNOT Access Peer Device=4
Device=0 CANNOT Access Peer Device=5
Device=0 CANNOT Access Peer Device=6
Device=0 CANNOT Access Peer Device=7
Device=1 CAN Access Peer Device=0
Device=1 CAN Access Peer Device=2
Device=1 CAN Access Peer Device=3
Device=1 CANNOT Access Peer Device=4
Device=1 CANNOT Access Peer Device=5
Device=1 CANNOT Access Peer Device=6
Device=1 CANNOT Access Peer Device=7
Device=2 CAN Access Peer Device=0
Device=2 CAN Access Peer Device=1
Device=2 CAN Access Peer Device=3
Device=2 CANNOT Access Peer Device=4
Device=2 CANNOT Access Peer Device=5
Device=2 CANNOT Access Peer Device=6
Device=2 CANNOT Access Peer Device=7
Device=3 CAN Access Peer Device=0
Device=3 CAN Access Peer Device=1
Device=3 CAN Access Peer Device=2
Device=3 CANNOT Access Peer Device=4
Device=3 CANNOT Access Peer Device=5
Device=3 CANNOT Access Peer Device=6
Device=3 CANNOT Access Peer Device=7
Device=4 CANNOT Access Peer Device=0
Device=4 CANNOT Access Peer Device=1
Device=4 CANNOT Access Peer Device=2
Device=4 CANNOT Access Peer Device=3
Device=4 CAN Access Peer Device=5
Device=4 CAN Access Peer Device=6
Device=4 CAN Access Peer Device=7
Device=5 CANNOT Access Peer Device=0
Device=5 CANNOT Access Peer Device=1
Device=5 CANNOT Access Peer Device=2
Device=5 CANNOT Access Peer Device=3
Device=5 CAN Access Peer Device=4
Device=5 CAN Access Peer Device=6
Device=5 CAN Access Peer Device=7
Device=6 CANNOT Access Peer Device=0
Device=6 CANNOT Access Peer Device=1
Device=6 CANNOT Access Peer Device=2
Device=6 CANNOT Access Peer Device=3
Device=6 CAN Access Peer Device=4
Device=6 CAN Access Peer Device=5
Device=6 CAN Access Peer Device=7
Device=7 CANNOT Access Peer Device=0
Device=7 CANNOT Access Peer Device=1
Device=7 CANNOT Access Peer Device=2
Device=7 CANNOT Access Peer Device=3
Device=7 CAN Access Peer Device=4
Device=7 CAN Access Peer Device=5
Device=7 CAN Access Peer Device=6
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) in those cases.
P2P Cliques:
[0 1 2 3]
[4 5 6 7]
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D      0      1      2      3      4      5      6      7
  0  74.90   3.53   4.46   4.51   4.65   4.75   4.78   4.78
  1   3.54  73.09   3.81   3.81   4.83   4.85   4.83   4.84
  2   6.03   6.02  74.30   6.02   5.22   5.16   5.16   5.14
  3   5.72   5.81   5.77  74.37   4.68   4.68   4.67   4.68
  4   4.96   4.92   4.95   4.90  74.39   3.32   3.59   3.55
  5   4.62   4.62   4.61   4.62   5.35  73.22   5.45   5.56
  6   4.79   4.77   4.79   4.77   3.81   3.77  74.25   2.90
  7   4.94   5.08   5.13   5.08   3.59   3.62   3.44  74.34
Unidirectional P2P=Enabled Bandwidth Matrix (GB/s)
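In case it helps anyone cross-check the "P2P Cliques" grouping against the CAN/CANNOT lines, here is a minimal Python sketch. It assumes the log lines have exactly the format shown above; the names `parse_access` and `cliques` are hypothetical helpers of mine, not part of the CUDA sample. It groups devices by connected components of the mutual-access graph, which coincides with the cliques in this output but is not how the sample itself necessarily computes them.

```python
import re
from collections import defaultdict

# Line format copied from the sample output, e.g.
# "Device=0 CANNOT Access Peer Device=4"
ACCESS_RE = re.compile(r"Device=(\d+) (CANNOT|CAN) Access Peer Device=(\d+)")

def parse_access(text):
    """Return {device: set of peers it CAN access} from the log text."""
    peers = defaultdict(set)
    for src, verdict, dst in ACCESS_RE.findall(text):
        peers[int(src)]  # ensure every device gets an entry, even with no peers
        if verdict == "CAN":
            peers[int(src)].add(int(dst))
    return dict(peers)

def cliques(peers):
    """Group devices into connected components of the mutual-access graph.

    Assumption: within a component every pair has access (true for the
    output above), so components and cliques are the same thing here.
    """
    seen, groups = set(), []
    for dev in sorted(peers):
        if dev in seen:
            continue
        stack, group = [dev], set()
        while stack:
            d = stack.pop()
            if d in group:
                continue
            group.add(d)
            # follow an edge only if access is reported in both directions
            stack.extend(p for p in peers.get(d, ()) if d in peers.get(p, ()))
        seen |= group
        groups.append(sorted(group))
    return groups
```

Feeding the saved sample output to `cliques(parse_access(log_text))` should give one list of device indices per group.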
Here is the topology:
nvidia-smi topo -m
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  CPU Affinity
GPU0  X     PIX   PHB   PHB   SOC   SOC   SOC   SOC   0-5,12-17
GPU1  PIX   X     PHB   PHB   SOC   SOC   SOC   SOC   0-5,12-17
GPU2  PHB   PHB   X     PIX   SOC   SOC   SOC   SOC   0-5,12-17
GPU3  PHB   PHB   PIX   X     SOC   SOC   SOC   SOC   0-5,12-17
GPU4  SOC   SOC   SOC   SOC   X     PIX   PHB   PHB   6-11,18-23
GPU5  SOC   SOC   SOC   SOC   PIX   X     PHB   PHB   6-11,18-23
GPU6  SOC   SOC   SOC   SOC   PHB   PHB   X     PIX   6-11,18-23
GPU7  SOC   SOC   SOC   SOC   PHB   PHB   PIX   X     6-11,18-23
Legend:
X = Self
SOC = Path traverses a socket-level link (e.g. QPI)
PHB = Path traverses a PCIe host bridge
PXB = Path traverses multiple PCIe internal switches
PIX = Path traverses a PCIe internal switch
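In the output above, every pair connected through SOC is reported as CANNOT access, while the PIX and PHB pairs are reported as CAN access. To list the pairs behind a given link type from the topology matrix, a small Python sketch could look like this. It assumes whitespace-separated rows in GPU index order, as printed above; `parse_topo` and `pairs_with_link` are hypothetical names, not an nvidia-smi API.

```python
def parse_topo(text):
    """Parse `nvidia-smi topo -m` text into {(i, j): link_type}.

    Assumes rows appear in GPU index order (GPU0, GPU1, ...), as in the
    output above.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with a GPU name and contain the "X" (self) marker;
        # the header row also starts with "GPU0" but has no "X" token.
        if tokens and tokens[0].startswith("GPU") and "X" in tokens:
            rows.append(tokens)
    n = len(rows)
    # tokens[0] is the row label, so column j lives at index 1 + j
    return {(i, j): rows[i][1 + j] for i in range(n) for j in range(n)}

def pairs_with_link(topo, link):
    """Return the sorted GPU pairs (i < j) whose path type equals `link`."""
    return sorted((i, j) for (i, j), kind in topo.items()
                  if kind == link and i < j)
```

For example, `pairs_with_link(parse_topo(topo_text), "SOC")` lists the GPU pairs whose path crosses the socket-level link, which can then be compared with the CANNOT lines of the sample.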