Ubuntu - NVLink not working with two RTX 3090

Hi!

I have two RTX 3090s installed. Both are connected with an NVLink bridge. Unfortunately, NVLink does not work: every link is reported as inactive.

Any ideas?

I have compiled some outputs below.

All the best,
Tristan

NVIDIA SMI NVLink:

$ nvidia-smi nvlink -s
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-631e6a76-d7c7-0e5b-c011-6e20f573017c)
	 Link 0: <inactive>
	 Link 1: <inactive>
	 Link 2: <inactive>
	 Link 3: <inactive>
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-15971343-c0e3-5be9-19fb-c1c4425ec80c)
	 Link 0: <inactive>
	 Link 1: <inactive>
	 Link 2: <inactive>
	 Link 3: <inactive>

Ubuntu distribution:

$ cat /etc/issue
Ubuntu 20.04.4 LTS \n \l

NVIDIA-SMI:

$ nvidia-smi
Sat Sep  3 15:59:22 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0A:00.0 Off |                  N/A |
| 30%   32C    P8    12W / 350W |   2261MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:0B:00.0 Off |                  N/A |
| 30%   31C    P8    18W / 350W |      3MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       897      C   /usr/bin/python3                  837MiB |
|    0   N/A  N/A      2471      C   /usr/bin/python3                 1421MiB |
+-----------------------------------------------------------------------------+

NVIDIA CUDA compiler driver:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

NVIDIA driver:

$ apt list --installed | grep nvidia-driver

nvidia-driver-515/unknown,now 515.65.01-0ubuntu1 amd64 [installed,automatic]

Hi,
It’s been months since you posted this. Any luck so far?

C:\Windows\System32>nvidia-smi nvlink --status
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-21d77ada-f307-f1f5-ch81-728cb720e953)
         Link 0: <inactive>
         Link 1: 14.062 GB/s
         Link 2: 14.062 GB/s
         Link 3: 14.062 GB/s
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-ddf42db5-4e92-42c1-5593-793bb78e3ec8)
         Link 0: <inactive>
         Link 1: 14.062 GB/s
         Link 2: 14.062 GB/s
         Link 3: 14.062 GB/s

Mine is doing this. It’s on Windows, and there is an HDMI cable plugged into one of the cards, so that’s why it’s using those links, but I haven’t been able to get Link 0 to show anything other than ‘inactive’.

Any luck getting it to work? I have the same problem with the same system. Thanks!

Hi,
It was a little annoying, but eventually I fixed my problem, which, granted, is probably not quite the same as the original poster’s.
I hate to say it, but it was as simple as ensuring both cards were fully seated.

I have an MSI MEG UNIFY AMD X570 motherboard. With the chipset heat sink and fan, the clearance between the base of my second 3090 and the top of the chipset cooler was barely enough to let the card sit fully in its slot.

When I read that people had solved it by “pushing it right in”, I figured they meant the bridge had a particularly tricky plug that needed pushing in further than you would think.
It is a decent-depth plug, but as far as my situation suggests, the problem you are having is that the card sitting across the chipset cooler is not quite reliably seated.
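
For anyone who wants to check quickly whether reseating made a difference, here is a minimal Python sketch that counts the active links per GPU in the text printed by `nvidia-smi nvlink --status`. The function names `parse_nvlink_status` and `active_link_count` are made up for illustration, and the parser assumes the output looks like the listings pasted in this thread; other driver versions may format it differently.

```python
import re

def parse_nvlink_status(text):
    """Map each GPU index to a list of (link_index, speed) tuples.

    speed is a float in GB/s, or None when the link is not reporting a speed
    (for example when it is shown as <inactive>).
    """
    gpus = {}
    current = None
    for line in text.splitlines():
        gpu = re.match(r"\s*GPU (\d+):", line)
        if gpu:
            current = int(gpu.group(1))
            gpus[current] = []
            continue
        link = re.match(r"\s*Link (\d+):\s*(.+?)\s*$", line)
        if link and current is not None:
            speed = re.match(r"([\d.]+) GB/s", link.group(2))
            gpus[current].append(
                (int(link.group(1)), float(speed.group(1)) if speed else None)
            )
    return gpus

def active_link_count(gpus):
    """Count, per GPU, the links that report a speed."""
    return {
        gpu: sum(1 for _, speed in links if speed is not None)
        for gpu, links in gpus.items()
    }

# Sample copied from the Windows output earlier in this thread.
SAMPLE = """\
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-21d77ada-f307-f1f5-ch81-728cb720e953)
         Link 0: <inactive>
         Link 1: 14.062 GB/s
         Link 2: 14.062 GB/s
         Link 3: 14.062 GB/s
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-ddf42db5-4e92-42c1-5593-793bb78e3ec8)
         Link 0: <inactive>
         Link 1: 14.062 GB/s
         Link 2: 14.062 GB/s
         Link 3: 14.062 GB/s
"""

print(active_link_count(parse_nvlink_status(SAMPLE)))
```

To run it against a live system instead of the sample, the text could be captured with `subprocess.run(["nvidia-smi", "nvlink", "--status"], capture_output=True, text=True).stdout`. A count of 0 on both cards, as in the first post, means no link came up at all.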

Thank you so much! I got it to work by reinstalling the system and the drivers. Thanks anyway.
