NVLink Setup Troubleshooting and Help Interpreting NVLink Status Output

We are seeing some odd behavior while configuring one of our servers (running CentOS 7) for NVLink with two Quadro GV100 GPUs. One link on each GPU (Link 1 on GPU 0 and Link 0 on GPU 1) is reported as inactive in the nvidia-smi nvlink status output shown below.

Based on the individual link speed (~25 GB/s), it appears we are using NVLink 2.0. However, the bidirectional bandwidth reported by p2pBandwidthLatencyTest is only ~140 GB/s, which resembles NVLink 1.0 speeds, whereas we expected ~300 GB/s over NVLink 2.0.
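
As a rough sanity check on these numbers, here is a minimal Python sketch of the aggregate bandwidth we would expect for a given number of active links. It assumes that the 25.781 GB/s figure reported by nvidia-smi is per link and per direction, and that active links add up linearly; the function name `aggregate_bandwidth` is only illustrative.

```python
# Per-link rate as reported by `nvidia-smi nvlink -s` (GB/s).
# Assumption: this figure applies to one link in one direction.
LINK_RATE_GBPS = 25.781


def aggregate_bandwidth(active_links, link_rate=LINK_RATE_GBPS):
    """Return (unidirectional, bidirectional) aggregate bandwidth in GB/s.

    Assumes active links between the two GPUs simply add up.
    """
    unidirectional = active_links * link_rate
    return unidirectional, 2 * unidirectional


if __name__ == "__main__":
    for n in (3, 4):
        uni, bidi = aggregate_bandwidth(n)
        print(f"{n} active links: {uni:.2f} GB/s one way, {bidi:.2f} GB/s both ways")
```

Under these assumptions, three active links give about 77 GB/s in one direction and about 155 GB/s in both directions, while four active links would give about 103 GB/s and 206 GB/s. The measured P2P-enabled figures below (about 72 GB/s and 140 GB/s) sit close to the three-link case.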

Could you advise what the output of nvidia-smi and p2pBandwidthLatencyTest should look like for two GPUs with a correctly configured NVLink 2.0 connection?

NVLink status reported by nvidia-smi for our two GV100 GPUs:

$ nvidia-smi nvlink -s
 
GPU 0: Quadro GV100 (UUID: GPU-6c950f3b-d765-c14a-0f81-5ca6be0a81a7)
Link 0: 25.781 GB/s
Link 1: <inactive>
Link 2: 25.781 GB/s
Link 3: 25.781 GB/s
GPU 1: Quadro GV100 (UUID: GPU-fb5e90b3-f1e1-78fb-8f7e-aef576e48a09)
Link 0: <inactive>
Link 1: 25.781 GB/s
Link 2: 25.781 GB/s
Link 3: 25.781 GB/s
$ nvidia-smi nvlink -c
 
GPU 0: Quadro GV100 (UUID: GPU-6c950f3b-d765-c14a-0f81-5ca6be0a81a7)
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: true
Link 0, Link is supported: false
Link 2, P2P is supported: true
Link 2, Access to system memory supported: true
Link 2, P2P atomics supported: true
Link 2, System memory atomics supported: true
Link 2, SLI is supported: true
Link 2, Link is supported: false
Link 3, P2P is supported: true
Link 3, Access to system memory supported: true
Link 3, P2P atomics supported: true
Link 3, System memory atomics supported: true
Link 3, SLI is supported: true
Link 3, Link is supported: false
GPU 1: Quadro GV100 (UUID: GPU-fb5e90b3-f1e1-78fb-8f7e-aef576e48a09)
Link 1, P2P is supported: true
Link 1, Access to system memory supported: true
Link 1, P2P atomics supported: true
Link 1, System memory atomics supported: true
Link 1, SLI is supported: true
Link 1, Link is supported: false
Link 2, P2P is supported: true
Link 2, Access to system memory supported: true
Link 2, P2P atomics supported: true   
Link 2, System memory atomics supported: true
Link 2, SLI is supported: true
Link 2, Link is supported: false
Link 3, P2P is supported: true
Link 3, Access to system memory supported: true
Link 3, P2P atomics supported: true
Link 3, System memory atomics supported: true
Link 3, SLI is supported: true
Link 3, Link is supported: false
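
To make the link state easier to compare across GPUs, the `nvidia-smi nvlink -s` output above can be summarized with a small script. This is only a sketch: it assumes the text layout is exactly as pasted in this post (a `GPU N:` line followed by `Link N: ...` lines), and the helper name `summarize_links` is hypothetical.

```python
import re

# Status output copied from this post (`nvidia-smi nvlink -s`).
SAMPLE = """GPU 0: Quadro GV100 (UUID: GPU-6c950f3b-d765-c14a-0f81-5ca6be0a81a7)
Link 0: 25.781 GB/s
Link 1: <inactive>
Link 2: 25.781 GB/s
Link 3: 25.781 GB/s
GPU 1: Quadro GV100 (UUID: GPU-fb5e90b3-f1e1-78fb-8f7e-aef576e48a09)
Link 0: <inactive>
Link 1: 25.781 GB/s
Link 2: 25.781 GB/s
Link 3: 25.781 GB/s
"""


def summarize_links(text):
    """Map each GPU index to its lists of active and inactive link ids."""
    summary = {}
    gpu = None
    for line in text.splitlines():
        line = line.strip()
        gpu_match = re.match(r"GPU (\d+):", line)
        if gpu_match:
            gpu = int(gpu_match.group(1))
            summary[gpu] = {"active": [], "inactive": []}
            continue
        link_match = re.match(r"Link (\d+): (.+)", line)
        if link_match and gpu is not None:
            # Any link whose status text contains "inactive" counts as down.
            state = "inactive" if "inactive" in link_match.group(2) else "active"
            summary[gpu][state].append(int(link_match.group(1)))
    return summary


if __name__ == "__main__":
    for gpu, links in summarize_links(SAMPLE).items():
        print(f"GPU {gpu}: active={links['active']} inactive={links['inactive']}")
```

For the output in this post, the summary shows three active links per GPU, with Link 1 inactive on GPU 0 and Link 0 inactive on GPU 1.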

Output from running the Peer-to-Peer Bandwidth Latency Test (p2pBandwidthLatencyTest) provided in the CUDA utilities samples on the two GV100 GPUs:

[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Quadro GV100, pciBusID: 3b, pciDeviceID: 0, pciDomainID:0
Device: 1, Quadro GV100, pciBusID: d8, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
   D\D     0     1
     0       1     1
     1       1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 548.63  10.43
     1  10.64 552.51
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1
     0 548.63  72.27
     1  72.27 552.51
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 557.64  18.78
     1  18.65 560.04
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 560.84 143.71
     1 140.14 561.65
P2P=Disabled Latency Matrix (us)
   GPU     0      1
     0   1.87  18.34
     1  18.23   2.27
 
   CPU     0      1
     0   4.02  11.83
     1  12.05   5.07
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1
     0   1.87   1.91
     1   2.02   2.26

   CPU     0      1
     0   4.06   3.33
     1   3.43   5.04
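
For reference, the GPU-to-GPU figures can be pulled out of the test output and compared with the three-link estimate. The sketch below assumes the matrix layout shown above (a title line containing "Matrix", a `D\D` header row, then one row per device) and reuses the assumption that each active link carries 25.781 GB/s per direction; `parse_matrices` and `peer_bandwidth` are hypothetical helper names.

```python
import re

# P2P-enabled bandwidth matrices copied from the test output in this post.
SAMPLE = r"""Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1
     0 548.63  72.27
     1  72.27 552.51
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 560.84 143.71
     1 140.14 561.65
"""

# Assumption: three active links, 25.781 GB/s each, per direction.
EXPECTED_3_LINKS = 3 * 25.781


def parse_matrices(text):
    """Return {matrix title: list of rows of floats}."""
    matrices, title = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        if "Matrix" in stripped:
            title = stripped
            matrices[title] = []
        elif title and re.fullmatch(r"\d+(\s+\d+\.\d+)+", stripped):
            # Data row: a device index followed by one value per peer device.
            matrices[title].append([float(v) for v in stripped.split()[1:]])
    return matrices


def peer_bandwidth(matrix):
    """Off-diagonal entries, i.e. transfers between two different GPUs."""
    return [v for i, row in enumerate(matrix) for j, v in enumerate(row) if i != j]


if __name__ == "__main__":
    for title, matrix in parse_matrices(SAMPLE).items():
        # Bidirectional results are compared against twice the one-way estimate.
        scale = 2 if title.startswith("Bidirectional") else 1
        for value in peer_bandwidth(matrix):
            ratio = value / (scale * EXPECTED_3_LINKS)
            print(f"{title}: {value:.2f} GB/s ({ratio:.0%} of the three-link estimate)")
```

If the measured values land near the three-link estimate rather than a four-link one, that would be consistent with the inactive link on each GPU accounting for the shortfall, though this comparison alone does not explain why the links are inactive.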