NVLink Setup Troubleshooting and NVLink Status Output Help

We have been noticing some odd behavior when trying to configure one of our servers (running CentOS 7) for NVLink using two GV100 GPUs. Two of the links between the GPUs are reported as inactive, as shown in the nvidia-smi nvlink status output below.

Based on the individual link speed (~25 GB/s), it appears we are using NVLink 2.0. However, the bidirectional bandwidth reported by the p2pBandwidthTest is only ~140 GB/s, which looks more like NVLink 1.0 speeds, when we should be getting ~300 GB/s over NVLink 2.0.
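
As a rough sanity check on those numbers (assuming each NVLink 2.0 link provides ~25 GB/s per direction, as the per-link figures below suggest):

4 links x ~25 GB/s ≈ 100 GB/s per direction, i.e. ~200 GB/s bidirectional
3 links x ~25 GB/s ≈ 75 GB/s per direction, i.e. ~150 GB/s bidirectional

So the ~140 GB/s we measure bidirectionally is roughly what three active links would deliver, which lines up with one link per GPU showing as inactive.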

Could you advise what the correct output of nvidia-smi and the p2pBandwidthTest should look like for two GPUs with a correctly configured NVLink 2.0 connection?
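
For reference, this is the kind of minimal runtime check we would use to confirm that CUDA itself reports P2P between the two cards (just a sketch; the device indices 0 and 1 are assumed to match the ordering above):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Assumed device numbering: 0 and 1 are the two GV100s.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);   // can device 0 access device 1's memory?
    cudaDeviceCanAccessPeer(&can10, 1, 0);   // and the other direction
    printf("GPU0 -> GPU1 peer access: %d\n", can01);
    printf("GPU1 -> GPU0 peer access: %d\n", can10);

    // Additional P2P attributes: relative performance rank and native atomics over the link.
    int rank = 0, atomics = 0;
    cudaDeviceGetP2PAttribute(&rank, cudaDevP2PAttrPerformanceRank, 0, 1);
    cudaDeviceGetP2PAttribute(&atomics, cudaDevP2PAttrNativeAtomicSupported, 0, 1);
    printf("perf rank 0->1: %d, native atomics 0->1: %d\n", rank, atomics);
    return 0;
}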

NVLink status reported by nvidia-smi for our two GV100 GPUs:

$ nvidia-smi nvlink -s
 
GPU 0: Quadro GV100 (UUID: GPU-6c950f3b-d765-c14a-0f81-5ca6be0a81a7)
Link 0: 25.781 GB/s
Link 1: <inactive>
Link 2: 25.781 GB/s
Link 3: 25.781 GB/s
GPU 1: Quadro GV100 (UUID: GPU-fb5e90b3-f1e1-78fb-8f7e-aef576e48a09)
Link 0: <inactive>
Link 1: 25.781 GB/s
Link 2: 25.781 GB/s
Link 3: 25.781 GB/s
$ nvidia-smi nvlink -c
 
GPU 0: Quadro GV100 (UUID: GPU-6c950f3b-d765-c14a-0f81-5ca6be0a81a7)
Link 0, P2P is supported: true
Link 0, Access to system memory supported: true
Link 0, P2P atomics supported: true
Link 0, System memory atomics supported: true
Link 0, SLI is supported: true
Link 0, Link is supported: false
Link 2, P2P is supported: true
Link 2, Access to system memory supported: true
Link 2, P2P atomics supported: true
Link 2, System memory atomics supported: true
Link 2, SLI is supported: true
Link 2, Link is supported: false
Link 3, P2P is supported: true
Link 3, Access to system memory supported: true
Link 3, P2P atomics supported: true
Link 3, System memory atomics supported: true
Link 3, SLI is supported: true
Link 3, Link is supported: false
GPU 1: Quadro GV100 (UUID: GPU-fb5e90b3-f1e1-78fb-8f7e-aef576e48a09)
Link 1, P2P is supported: true
Link 1, Access to system memory supported: true
Link 1, P2P atomics supported: true
Link 1, System memory atomics supported: true
Link 1, SLI is supported: true
Link 1, Link is supported: false
Link 2, P2P is supported: true
Link 2, Access to system memory supported: true
Link 2, P2P atomics supported: true   
Link 2, System memory atomics supported: true
Link 2, SLI is supported: true
Link 2, Link is supported: false
Link 3, P2P is supported: true
Link 3, Access to system memory supported: true
Link 3, P2P atomics supported: true
Link 3, System memory atomics supported: true
Link 3, SLI is supported: true
Link 3, Link is supported: false
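
For completeness, the GPU interconnect topology as the driver sees it can also be dumped with the command below, which marks NVLink-connected GPU pairs with NV# entries (output not included here):

$ nvidia-smi topo -m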

Running the peer-to-peer bandwidth/latency test provided in the CUDA samples (Utilities) on the two GV100 GPUs:

[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Quadro GV100, pciBusID: 3b, pciDeviceID: 0, pciDomainID:0
Device: 1, Quadro GV100, pciBusID: d8, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
   D\D     0     1
     0       1     1
     1       1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 548.63  10.43
     1  10.64 552.51
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1
     0 548.63  72.27
     1  72.27 552.51
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 557.64  18.78
     1  18.65 560.04
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 560.84 143.71
     1 140.14 561.65
P2P=Disabled Latency Matrix (us)
   GPU     0      1
     0   1.87  18.34
     1  18.23   2.27
 
   CPU     0      1
     0   4.02  11.83
     1  12.05   5.07
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1
     0   1.87   1.91
     1   2.02   2.26

   CPU     0      1
     0   4.06   3.33
     1   3.43   5.04

For all those who desperately end up here: for us it was the NVLink bridge not being pushed all the way into the socket. Pressing with a little more force did the trick.

I have a similar problem with two NVIDIA GeForce RTX 3090 cards and NVLink. However, pressing harder is not working.

There is a recent post regarding the lottery that is P2P connectivity here, if you’ve not already seen it.

To be fair, my post was about the lottery of whether P2P would be enabled at all, not about NVLink speed.

But the principle of NVIDIA expecting us to fill out the “lottery card” first and then wait for the results is indeed similar.

Thanks @rs277, @epk. I actually invested a good amount of time and money in these RTX 3090 cards and the NVLink bridge in the hope of seamlessly getting more compute power. This should be a smooth configuration, not this much hassle. So, the questions remain:

  1. Can two RTX 3090s be connected over an NVLink bridge? The answer should be yes, because each RTX 3090 has an NVLink connector on it.

  2. NVLink is definitely not working on my system, because I cannot see it in the NVIDIA Control Panel, in the nvidia-smi output, or in GPU-Z, where it should be easily visible.

  3. How can I get it working? Maybe @Robert_Crovella can give a better answer.

Yes, but whether it works or not is subject to all the vagaries Robert outlined in the thread linked above.

It seems apparent that for P2P/NVLink to function correctly, there is quite a complex chain of dependencies (driver, motherboard hardware, correct BIOS configuration of said hardware, number of PCIe lanes, etc.) that needs to be satisfied for it to work reliably.

It’s somewhat disappointing that NVIDIA has not emphasised this more and saved a lot of people wasted time and money. I wonder if this is the reason NVLink is no longer offered on GeForce cards, given that Tesla/Quadro customers are probably more likely to be buying turn-key systems where this functionality is specified.

Later: You might find the BIOS setting info in this thread helpful.

Thanks @rs277, the motherboard issue had not crossed my mind at all. I bought the latest motherboard and did not consider whether it would support SLI/NVLink at all. I just saw that it could fit two RTX cards, that’s it. Now I feel really stupid.

I have an ASUS ROG MAXIMUS Z690 HERO EVA, and Google says it does not support SLI. But one thing: when I run nvidia-smi nvlink -s without the NVLink bridge installed, I get

GPU 0: NVIDIA GeForce RTX 3090 
NVML: Unable to retrieve NVLink information as all links are inactive
GPU 1: NVIDIA GeForce RTX 3090 
NVML: Unable to retrieve NVLink information as all links are inactive

And when I run nvidia-smi nvlink -s with the NVLink bridge installed, I get

GPU 0: NVIDIA GeForce RTX 3090 
         Link 0: 14.062 GB/s
         Link 1: 14.062 GB/s
         Link 2: 14.062 GB/s
         Link 3: 14.062 GB/s
GPU 1: NVIDIA GeForce RTX 3090 
         Link 0: 14.062 GB/s
         Link 1: 14.062 GB/s
         Link 2: 14.062 GB/s
         Link 3: 14.062 GB/s

This means nvidia-smi can still detect the NVLink bridge; however, the GPUs cannot actually transfer data over it:

[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA GeForce RTX 3090, pciBusID: 1, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 3090, pciBusID: 2, pciDeviceID: 0, pciDomainID:0
Device=0 CANNOT Access Peer Device=1
Device=1 CANNOT Access Peer Device=0
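
To dig further, I suppose the next step is to try enabling peer access directly and look at the exact error the runtime returns; a minimal sketch, assuming the two cards are device 0 and device 1:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Assumed device numbering: 0 and 1 are the two RTX 3090s.
    cudaSetDevice(0);
    cudaError_t err = cudaDeviceEnablePeerAccess(1, 0);   // second argument (flags) must be 0
    printf("enable peer access 0 -> 1: %s\n", cudaGetErrorString(err));

    cudaSetDevice(1);
    err = cudaDeviceEnablePeerAccess(0, 0);
    printf("enable peer access 1 -> 0: %s\n", cudaGetErrorString(err));
    return 0;
}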