Linux SLI and NVlink

Hi All,
I am trying to setup SLI for two TITAN Xp cards on a Debian 10 buster Gnome X11 (with kernel+driver backports applied) system according to
https://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/sli.html

The system boots and I can log in normally. However, after logging in, the screen goes black and will eventually go to sleep (I can still ssh into the system).

The system works normally if I force Option “SLI” “Off” in the “Screen” section of /etc/X11/xorg.conf.
Any inputs on configuring the system correctly for SLI would be greatly appreciated.

Please don’t do this. SLI is broken and useless on Linux. For further info, please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post. You will have to rename the file ending to something else since the forum software doesn’t accept .gz files (nifty!).


Thank you for the information, I’ll stop experimenting with SLI.

The reason I tried this (and bought an Nvidia HB SLI bridge) was to evaluate card stacking for machine learning on Linux. I am well aware that SLI is not used for computation but NVLink is; I just did not have RTX cards at hand.

Is there any reason to believe that NVLink, e.g. between two Titan RTX cards, will behave differently?

Many thanks.

And has been since the feature was introduced, unfortunately. It never worked right. You might want to remove the Nvidia HB SLI bridge if it’s connected. You can use multiple cards without it. Using DeepStream, for example, you tell an nvinfer element to use a specific device. Likewise with training frameworks.
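As a rough sketch of what per-device selection looks like in DeepStream (the pipeline details and config file name are placeholders, not a tested setup; `gpu-id` is a property of the nvinfer element):

```shell
# Hypothetical minimal DeepStream pipeline; nvinfer's gpu-id property pins
# inference to the second GPU. config_infer_primary.txt is a placeholder
# for your actual inference config file.
gst-launch-1.0 videotestsrc ! nvvideoconvert ! 'video/x-raw(memory:NVMM)' ! \
  mux.sink_0 nvstreammux name=mux batch-size=1 width=640 height=480 ! \
  nvinfer config-file-path=config_infer_primary.txt gpu-id=1 ! fakesink
```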

NVlink will greatly improve inter-gpu transfer-rates, which is great for cuda; it’s just SLI graphics that is pointless.
Put simply: great for machine learning, shipwrecked if you try to use it for the desktop.


@mdegans that is how I have been using the cards so far. However, I am increasingly running into memory limitations, hence the idea to experiment with GPU stacking.

Just to be sure: does GeForce NVLink (like the one found on the RTX 2080 Ti or Titan RTX) rely on a sufficiently different driver subsystem (not sure what to call it) that it can be used for compute on Linux?

Out of technical interest, are there benefits/drawbacks to having an SLI bridge connected when SLI is deactivated in the driver?

are there benefits/drawbacks to having an SLI bridge connected when SLI is deactivated in the driver

TBH, I don’t remember the details other than that I had to remove it. It worked in Windows but not in Linux is all I remember. The issue might be fixed by now, but if you’re going to use the box for Linux alone, it’s pointless anyway.

does GeForce NVLink (like the one found on the RTX 2080 Ti or Titan RTX) rely on a sufficiently different driver subsystem (not sure what to call it) that it can be used for compute on Linux?

I believe so, but you will want to confirm that with an nvidia rep before shelling out for such cards.

You’ll have to distinguish between usage with Xorg and with cuda. For Xorg, simply disable SLI; it then makes no difference whether the bridge is plugged in or not. For cuda, have it plugged, it’ll be used without any additional config.
Simply put: leave the bridge on and disable SLI in the Xorg config.
-but-
In general, having Xorg run on the same gpu you’re using extensively for cuda is not recommended. In the default config, this can lead to your cuda kernels being killed mid-run, e.g. during larger training jobs. See:
https://nvidia.custhelp.com/app/answers/detail/a_id/3029/~/using-cuda-and-x
So the recommendation would be to use a cheap add-on card or the integrated graphics for the desktop and dedicate the heavy gpus to cuda. See the CUDA_VISIBLE_DEVICES environment variable.
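The CUDA_VISIBLE_DEVICES mechanism can be tried from any shell; a minimal sketch (plain `python3` here only echoes what the process sees, a real CUDA program would be filtered the same way):

```shell
# Expose only physical GPUs 1 and 2 to a CUDA process; the CUDA runtime
# renumbers them as devices 0 and 1 inside that process, so GPU 0 stays
# free for the desktop. Any CUDA program launched this way is affected.
CUDA_VISIBLE_DEVICES=1,2 python3 -c 'import os; print(os.environ["CUDA_VISIBLE_DEVICES"])'
# prints "1,2"
```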


Addendum: since you have a specific problem with out-of-memory conditions, I don’t think that nvlink will help in that situation; the gpu memory won’t simply add up. Of course I’m interested in practical experiences, so please report back.

have it plugged, it’ll be used

Are you certain? I use CUDA on multiple GPUs without it and it seems fine. I thought SLI was only for frame sync. Anyway, the machine in question doesn’t even have X11 running anymore.
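One way to check whether the driver actually sees the bridge as a link is via nvidia-smi (these are standard subcommands, assuming a reasonably recent driver):

```shell
# Print the GPU interconnect matrix: bridge/NVLink connections show up
# as NV1, NV2, ..., while plain PCIe paths show as PIX, PHB, or SYS.
nvidia-smi topo -m

# Per-link NVLink status (only meaningful on NVLink-capable cards).
nvidia-smi nvlink --status
```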

You’re mixing up (not too) different things. SLI is basically a marketing term from ancient times for gpu coupling over pci(e) for graphics purposes.
nvlink derived from it and builds on it, but gained independence over the years where cuda is concerned.
Like I said,

NVlink will greatly improve inter-gpu transfer-rates

whether your cuda application will benefit from it depends on the workload.

See this for some interconnect speeds:
https://forums.developer.nvidia.com/t/simplep2p-example-and-multi-gpu-network-training-causes-system-freeze-and-err-in-nvidia-smi/70225/5?u=generix

Yeah, but I don’t think the “Nvidia HB SLI bridge” is NVLink. IIRC that’s for the 10-series and only works for games.

“Nvidia HB SLI bridge” is the name of a physical product establishing an “NVLink”.
“Only works for games”: no.

IIRC, “HB” means high bandwidth, so it’s “NVlink2”

For completeness, why SLI graphics sucks:
https://developer.nvidia.com/explicit-multi-gpu-programming-directx-12

I think you are mistaken, or I have read some bad documentation.

My understanding is that NVLink is a PCI Express alternative, mostly for some exotic architectures. On some very fancy cards they do use it like SLI, on top of and in addition to PCI Express, and this is confusing.

Take a look at those speeds. The two technologies are nothing alike.

Are you running Linux headless or with Wayland?
If it is the latter, is it tricky to set up?