Linux Mint with RTX 2080 SLI

Hello,

Hoping to get some assistance. I have a brand-new setup with a fresh install of Linux Mint 18 x64, and I have installed two RTX 2080 cards in SLI with an NVLink bridge.

After installing driver 410 (and also 415), the machine reboots and then gets stuck in an endless “crashed” loop.

Can anyone help? I am using the PPA option in Driver Manager to install the drivers. Again, I have tried both 410 and 415 with no luck. Thanks

nvidia-bug-report.log (2.09 MB)

Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

Apologies, this is mostly new to me. I don’t seem to have that option. My PC is currently in “fallback mode”, and restarting just puts it back into the same endless loop. I can get to a command prompt with F1, which is where I am now. However, if I go to /usr/lib/NVIDIA, I only see “pre-install” listed.

At the command prompt, check whether you have an internet connection by running
ping google.com

If you get a reply, press Ctrl+C to stop the ping, then:

  • install pastebinit (sudo apt install pastebinit)
  • sudo nvidia-bug-report.sh
  • unzip the logfile (gunzip nvidia-bug-report.log.gz)
  • upload the logfile (pastebinit -i nvidia-bug-report.log)
  • note down and post the URL you’re given
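The steps above can also be run as one short script. This is a minimal sketch, assuming an Ubuntu-based system with sudo; the run helper and the DRY_RUN variable are additions for this sketch (with DRY_RUN=1 the commands are only printed, not executed). It uploads straight to Ubuntu’s pastebin, since the default pastebin turned out to reject a log of this size.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1
# (helper added for this sketch, not part of the original steps).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Generate the bug report, unpack it and upload it, stopping at the first
# failing step. gunzip -f overwrites a log left over from an earlier run;
# paste.ubuntu.com is used because it accepts larger files than the
# default pastebin.
collect_and_upload_bug_report() {
  run sudo apt install pastebinit &&
    run sudo nvidia-bug-report.sh &&
    run gunzip -f nvidia-bug-report.log.gz &&
    run pastebinit -b http://paste.ubuntu.com -i nvidia-bug-report.log
}
```

To use it, source the file in the directory where the report should be written, call collect_and_upload_bug_report, and post the URL that pastebinit prints.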

Thank you for that great reply!

I got all the way to the end, and it shows:
bad API request, maximum paste file size exceeded

Let me see if I can get to the desktop and open this page in a web browser to attach the log.
nvidia-bug-report.log (2.09 MB)

That seems to be quite a large log. You can also try Ubuntu’s pastebin, which has a higher size limit:
pastebinit -b http://paste.ubuntu.com -i nvidia-bug-report.log

Not sure if it’s attached.

I see it now above. Do you see it as well? Thanks

OK, first of all, you’re running a kernel that is much too old for your hardware. Please upgrade to the latest HWE stack (Linux Mint 18 is based on Ubuntu 16.04, hence the 16.04 package names):

sudo apt-get install --install-recommends linux-generic-hwe-16.04 xserver-xorg-hwe-16.04

https://wiki.ubuntu.com/Kernel/LTSEnablementStack

Next, you still have your Intel GPU active (which currently doesn’t work properly, see above). Is that on purpose? Is there a monitor connected to it? If you want to run the Nvidia cards in SLI for graphics, you’ll have to connect your monitor to them and disable the onboard Intel GPU in the BIOS.
BTW, SLI doesn’t really work on Linux: instead of doubling gaming performance, it will most often cut it in half. Or do you want to use the cards for compute?

I ran that update, thank you. Yes, it has a monitor attached. Intel GPU active? No, not on purpose. I will disable it in the BIOS after the install finishes.

I am building this for deep learning only. I am going to reboot now and see what happens. Can’t thank you enough for your help.

I got the same error (“fallback mode”). Running nvidia-smi now gives:
nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

OK, for deep learning only, a different setup might have some advantages. Using the Intel GPU for graphics and the Nvidia GPUs only for CUDA lets you run larger CUDA kernels. The downside is that you can’t use the Nvidia GPUs for graphics, and a different driver setup is needed. See how far you get; if you run into CUDA kernel timeouts, you can change it.

Please create a new nvidia-bug-report.log.

Thanks for the tip. Once I get these cards working at all, that seems like the better route for me.
See below for the new log file on pastebin:

https://pastebin.com/TcaktHw8

TY

Please run
gcc -v
This should return the version:
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)
If it displays version 5.3.1 instead, you first need to run
sudo apt-get update
sudo apt-get upgrade
to update your system and get the right gcc.
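The version check above can be scripted. A minimal sketch, assuming GNU sort (for its -V version ordering) and that gcc -v ends with a line of the form “gcc version X.Y.Z …”; the function names parse_gcc_version and version_lt are hypothetical. Anything older than 5.4.0 is treated as needing the update/upgrade step.

```shell
#!/bin/sh
# Read `gcc -v` output on stdin and print the X.Y.Z version from its
# "gcc version X.Y.Z ..." line.
parse_gcc_version() {
  sed -n 's/^gcc version \([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 is strictly older than version $2
# (relies on GNU sort -V for version ordering).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

Since gcc -v writes to stderr, capture it with v=$(gcc -v 2>&1 | parse_gcc_version); if version_lt "$v" 5.4.0 succeeds, run the two apt-get commands above.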

You installed the driver using the .run installer without the --dkms option. This is not recommended: the kernel module was built only for the kernel running at install time, so the kernel update has now left you without a driver. Please either reinstall the driver with the --dkms option, or uninstall it and use the driver from Ubuntu’s graphics-drivers PPA.
Please remove your current /etc/X11/xorg.conf and replace it with just:

Section "Device"
    Identifier     "nvidia"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:1:0:0"
    Option         "AllowEmptyInitialConfiguration"
EndSection

Also connect your monitor to the first Nvidia GPU.
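Replacing the file can be done with a small shell function. A minimal sketch, assuming a POSIX shell; write_minimal_xorg_conf is a hypothetical name. It keeps a .bak copy of any existing file and takes the target path as an argument, so it can be tried on a scratch file first; on the real system pass /etc/X11/xorg.conf and run it as root.

```shell
#!/bin/sh
# Back up an existing xorg.conf (to <path>.bak) and replace it with the
# minimal Device section quoted above. Defaults to /etc/X11/xorg.conf.
write_minimal_xorg_conf() {
  conf=${1:-/etc/X11/xorg.conf}
  if [ -e "$conf" ]; then
    cp "$conf" "$conf.bak" || return 1
  fi
  # BusID is decimal bus:device:function; PCI:1:0:0 is the GPU on bus 1.
  cat > "$conf" <<'EOF'
Section "Device"
    Identifier     "nvidia"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:1:0:0"
    Option         "AllowEmptyInitialConfiguration"
EndSection
EOF
}
```

For example, sudo sh -c '. ./xorg-sketch.sh && write_minimal_xorg_conf' (the file name is only an example) writes the real file and leaves the old one as /etc/X11/xorg.conf.bak.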

Output of gcc -v: Ubuntu 5.4.0-6ubuntu1~16.04.11

Yes, this time I installed with the .run file, but my other installation attempts also failed. If I purge all Nvidia drivers now, how should I obtain and install the right driver?

Please reinstall the driver with the --dkms option, or uninstall it and use the driver from Ubuntu’s graphics-drivers PPA.
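Both routes can be sketched as shell functions. Assumptions not stated in the thread: nvidia-uninstall is the uninstaller that the .run installer puts in place, the apt pattern ^nvidia-.* is meant to catch the packaged Nvidia drivers, and the function names plus the run/DRY_RUN preview helper are hypothetical additions for this sketch. After remove_nvidia_drivers, the driver would then be installed from the graphics-drivers PPA (for example through Driver Manager); either way, reboot afterwards.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1 (sketch helper).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Route 1: rerun the .run installer with --dkms so the kernel module is
# rebuilt automatically whenever a new kernel is installed.
reinstall_run_with_dkms() {
  if [ -z "$1" ]; then
    echo "usage: reinstall_run_with_dkms /path/to/NVIDIA-installer.run" >&2
    return 1
  fi
  run sudo sh "$1" --dkms
}

# Route 2: undo a .run install (if its uninstaller exists), then purge
# any packaged Nvidia driver before switching to the PPA driver.
remove_nvidia_drivers() {
  if command -v nvidia-uninstall >/dev/null 2>&1; then
    run sudo nvidia-uninstall || return 1
  fi
  run sudo apt-get purge '^nvidia-.*'
}
```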

Also, after purging the Nvidia drivers, I am going to shut down and remove one card so I can get a single 2080 working correctly before moving on to SLI. Does that make sense? Thanks

I purged the files and then selected version 415 in Driver Manager (which I had done before without success), but this time it seems to work just fine! The output of nvidia-smi is below, and this is still with both cards in SLI:

Mon Feb 4 13:27:36 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.27       Driver Version: 415.27       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   44C    P8     5W / 245W |    247MiB /  7949MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2080    Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   49C    P8    10W / 245W |      1MiB /  7952MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1199      G   /usr/lib/xorg/Xorg                           203MiB |
|    0      1894      G   cinnamon                                      42MiB |
+-----------------------------------------------------------------------------+

And the output of inxi -G:

Graphics: Card-1: NVIDIA Device 1e87
Card-2: NVIDIA Device 1e87
Display Server: N/A drivers: nvidia,nouveau (unloaded: fbdev,vesa)
tty size: 128x37 Advanced Data: N/A out of X
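The Bus-Id column in the nvidia-smi output above is hexadecimal (domain:bus:device.function), while the BusID option in xorg.conf, as in the PCI:1:0:0 line earlier in the thread, uses decimal bus:device:function. A minimal sketch of the conversion, assuming a POSIX shell and a Bus-Id that includes the domain prefix; busid_to_xorg is a hypothetical name.

```shell
#!/bin/sh
# Convert an nvidia-smi Bus-Id such as 00000000:01:00.0 (hexadecimal
# domain:bus:device.function) into the decimal PCI:bus:device:function
# form used by BusID in xorg.conf. The domain is dropped, which assumes
# the usual single-domain (0000) desktop system.
busid_to_xorg() {
  id=${1#*:}        # strip the domain   -> 01:00.0
  bus=${id%%:*}     # bus number (hex)   -> 01
  rest=${id#*:}     #                    -> 00.0
  dev=${rest%%.*}   # device (hex)       -> 00
  fn=${rest#*.}     # function (hex)     -> 0
  # printf %d accepts 0x-prefixed hex arguments and prints them in decimal
  printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

For example, nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader | while read -r id; do busid_to_xorg "$id"; done should print PCI:1:0:0 and PCI:2:0:0 for the two cards shown above.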

You initially installed the Nvidia driver right after installing the OS, without first updating the system. So you had a kernel that didn’t support the rest of your hardware and an outdated compiler, and nothing worked.

Thank you for all your help! To summarize, exactly which commands should I run after a fresh install?
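A sketch of the order of operations that worked in this thread: bring the fresh install fully up to date (compiler via apt-get upgrade, kernel and X via the 16.04 HWE stack) before touching the Nvidia driver. Assumptions: Linux Mint 18 on Ubuntu 16.04, and the driver itself (415 here) still being picked in Driver Manager once the graphics-drivers PPA is enabled; the function names and the run/DRY_RUN preview helper are hypothetical. Reboot between the two steps so the new kernel is running before the driver is installed.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1 (sketch helper).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: update everything (including gcc) and install the HWE kernel
# and X stack. Reboot afterwards.
prepare_fresh_install() {
  run sudo apt-get update &&
    run sudo apt-get upgrade &&
    run sudo apt-get install --install-recommends \
      linux-generic-hwe-16.04 xserver-xorg-hwe-16.04
}

# Step 2 (after the reboot): enable the graphics-drivers PPA, then choose
# the driver in Driver Manager.
enable_driver_ppa() {
  run sudo add-apt-repository ppa:graphics-drivers/ppa &&
    run sudo apt-get update
}
```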