Please don’t do this. SLI is broken and useless on Linux. For further info, please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post. You will have to rename the file ending to something else since the forum software doesn’t accept .gz files (nifty!).
Thank you for the information, I’ll stop experimenting with SLI.
The reason I tried this (and bought an Nvidia HB SLI bridge) was to evaluate card stacking for machine learning on Linux. I am well aware that SLI is not used for computation but NVLink is; I just did not have RTX cards at hand.
Is there any reason to believe that NVLink, e.g. between two Titan RTX cards, will behave differently?
And has been since the feature was introduced, unfortunately. It never worked right. You might want to remove the (Nvidia HB SLI) bridge if it’s connected. You can use multiple cards without it. Using DeepStream, for example, you tell an nvinfer element to use a specific device. Likewise with training frameworks.
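As a rough illustration of the per-element device selection mentioned above, here is a hypothetical sketch of a DeepStream-style GStreamer pipeline description that pins inference to a chosen GPU via the `gpu-id` property. The file names and dimensions are placeholders, not real assets:

```python
# Hypothetical sketch: pinning DeepStream elements to a specific GPU.
# "sample.h264" and "config_infer_primary.txt" are placeholder paths.
def build_pipeline(gpu_id: int, config_path: str) -> str:
    """Return a gst-launch-style pipeline string whose nvinfer and
    nvstreammux elements are bound to a specific GPU via gpu-id."""
    return (
        "filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! "
        "m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 "
        f"gpu-id={gpu_id} ! "
        f"nvinfer config-file-path={config_path} gpu-id={gpu_id} ! "
        "fakesink"
    )

desc = build_pipeline(gpu_id=1, config_path="config_infer_primary.txt")
print(desc)
```

Running one such pipeline per card (with `gpu-id=0`, `gpu-id=1`, …) is how you spread independent inference workloads across GPUs without any bridge.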
@mdegans that is how I have been using the cards beforehand. However, I am increasingly running into memory limitations. Hence the idea to experiment with GPU stacking.
Just to be sure: GeForce NVLink (like the one found on the RTX 2080 Ti or Titan RTX) relies on a sufficiently different driver subsystem (not sure what to call it) that it can be used for compute on Linux?
Are there benefits/drawbacks to having an SLI bridge connected when SLI is deactivated in the driver?
TBH, I don’t remember the details other than that I had to remove it. All I remember is that it worked in Windows but didn’t in Linux. The issue might be fixed by now, but if you’re going to use the box for Linux alone, it’s pointless anyway.
You’ll have to distinguish between usage in Xorg and in CUDA. In Xorg, simply disable SLI; it then makes no difference whether the bridge is plugged in or not. For CUDA, leave it plugged in; it’ll be used without any additional config.
Simply put: leave it on and disable SLI in the xorg config.
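For reference, the relevant knob lives in the Device section of /etc/X11/xorg.conf (documented in the NVIDIA driver README); the Identifier and BusID below are placeholders for your actual card:

```
Section "Device"
    Identifier "nvidia-gpu0"
    Driver     "nvidia"
    # Placeholder BusID, check yours with lspci
    BusID      "PCI:1:0:0"
    # X ignores the bridge; CUDA can still use it
    Option     "SLI" "Off"
EndSection
```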
In general, having Xorg run on the same GPU you’re using extensively for CUDA is not recommended. In the default config, this can lead to your CUDA kernels being killed mid-run, e.g. during larger training jobs. See: https://nvidia.custhelp.com/app/answers/detail/a_id/3029/~/using-cuda-and-x
So the recommendation would be to use a cheap add-on card or the integrated graphics for the desktop and dedicate the heavy GPUs to CUDA. See the CUDA_VISIBLE_DEVICES environment variable.
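A minimal sketch of that setup, assuming (purely for illustration) that GPU 0 drives the desktop and GPUs 1 and 2 are reserved for CUDA:

```python
import os

# Must be set before the CUDA runtime (or a framework like PyTorch or
# TensorFlow) initializes; changing it afterwards has no effect.
# Hide GPU 0 (driving X) and expose only GPUs 1 and 2 to this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"

# CUDA renumbers the visible devices inside the process:
# physical GPU 1 becomes device 0, physical GPU 2 becomes device 1.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The same effect can be had per invocation by exporting the variable in the shell before launching the training job.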
Addendum: since your specific problem is out-of-memory conditions, I don’t think NVLink will help in that situation; the GPU memory won’t simply add up. Of course I’m interested in practical experiences, so please report back.
You’re mixing up two (not too different) things. SLI is basically a marketing term from ancient times for GPU coupling over PCI(e) for graphics purposes.
NVLink derived from that and adds to it, but has gained independence over the years in terms of CUDA.
NVLink will greatly improve inter-GPU transfer rates; whether your CUDA application will benefit from it depends on the workload.
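To put rough numbers on "greatly improve": PCIe 3.0 x16 peaks at about 16 GB/s, while the Titan RTX NVLink connector is specified at roughly 100 GB/s bidirectional (~50 GB/s each way). A back-of-the-envelope sketch, treating both figures as idealized peaks rather than measurements:

```python
# Approximate peak bandwidths, not measured values.
PCIE3_X16_GBPS = 16.0  # PCIe 3.0 x16, theoretical
NVLINK_GBPS = 50.0     # Titan RTX NVLink, per direction

def transfer_seconds(gigabytes: float, bandwidth_gbps: float) -> float:
    """Idealized time to move `gigabytes` at `bandwidth_gbps` GB/s."""
    return gigabytes / bandwidth_gbps

payload_gb = 8.0  # e.g. activations exchanged between two GPUs
print(transfer_seconds(payload_gb, PCIE3_X16_GBPS))  # 0.5 s over PCIe
print(transfer_seconds(payload_gb, NVLINK_GBPS))     # 0.16 s over NVLink
```

Whether that ~3x headroom matters depends on how chatty your application is between GPUs; workloads that rarely exchange data will see little difference.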
I think you are mistaken, or I have read some bad documentation.
My understanding is that NVLink is a PCI Express alternative, mostly used on some exotic architectures. On some very fancy cards they also use it like SLI, on top of and in addition to PCI Express, and this is confusing.
Take a look at those speeds. The two technologies are nothing alike.