Making sure all previous versions of CUDA are gone (Drivers randomly fail on reboot)

Hello! I’ve installed, uninstalled, and reinstalled CUDA and the NVIDIA drivers probably 15 times now on this computer. I usually end up needing to reinstall the drivers because, after what seems like a normal reboot, I load into Ubuntu and am greeted by 600x400 resolution with one monitor turned off: somehow the drivers have failed. I run sudo apt-get --purge remove nvidia-\*, then apt search nvidia-driver, pick the most recent driver, run sudo apt-get install nvidia-driver-455 (455 is the one I currently have), reboot, and go on my merry way without CUDA because I am scared and frightened (I usually need to work and can’t deal with installing CUDA at that time).

I dual-boot Windows and Ubuntu, and it SEEMS like it only happens after I boot into Windows. I also had to reinstall my current Ubuntu OS onto a new hard drive (ran out of space), which caused boot errors before. Perhaps that is hurting me here, but truthfully I have no idea.

I want to be able to install CUDA correctly, and clearly I need assistance, but first I want to figure out why my drivers quit randomly. I don’t know why they break or how to troubleshoot this, but my theory is that in the numerous reinstalls, I left something in place that should not be in place, and that’s messing things up.

These are the steps I’ve taken thus far.

I have run sudo zypper remove "cuda*" "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "nsight*" to completely remove CUDA.

I ran these commands looking for leftover symlinks:

maximus@maxmaxmax3:~$ ls -la /usr/local | grep "->"

lrwxrwxrwx 1 root root 9 Sep 7 2018 man -> share/man

maximus@maxmaxmax3:~$ ls -la /usr/local/bin/ | grep "->"
(no output after this command)
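
A slightly broader version of the same check, as a minimal sketch: find can list every symlink directly inside a directory together with its target, and a glob catches leftover toolkit directories. The function name list_symlinks is made up for illustration, and /usr/local/cuda* is only the default install location; a custom install prefix would need a different path.

```shell
# list_symlinks DIR: print "link -> target" for each symlink directly inside DIR.
# (Hypothetical helper name, for illustration only.)
list_symlinks() {
    dir="$1"
    # -maxdepth 1 keeps the search to DIR itself; -type l matches symlinks only.
    find "$dir" -maxdepth 1 -type l \
        -exec sh -c 'printf "%s -> %s\n" "$1" "$(readlink "$1")"' _ {} \;
}

list_symlinks /usr/local
list_symlinks /usr/local/bin

# Leftover toolkit directories or a /usr/local/cuda symlink, if any remain
# (assumes the default install prefix).
ls -d /usr/local/cuda* 2>/dev/null || echo "no /usr/local/cuda* entries found"
```

If the last command prints only the "no entries" message, nothing is left at the default toolkit location.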

My PATH is this: /home/maximus/gems/bin:/home/maximus/anaconda3/bin:/usr/local/cuda/bin:/home/maximus/gems/bin:/home/maximus/anaconda3/bin:/home/maximus/bin:/home/maximus/.local/bin:/home/maximus/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin which seems too long and notably includes a CUDA entry (/usr/local/cuda/bin) that doesn’t exist.
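
A PATH like this can be tidied without guessing which entries are stale. The sketch below prints a copy with duplicates and non-existent directories removed; clean_path is a hypothetical helper name, and it only prints the result, so nothing changes until the output is assigned back to PATH.

```shell
# clean_path PATHSTRING: print PATHSTRING with duplicate and non-existent
# directories removed, keeping the first occurrence of each entry.
# (Hypothetical helper name, for illustration only.)
clean_path() {
    rest="$1:"          # trailing colon so the last entry is handled like the others
    new=""
    while [ -n "$rest" ]; do
        dir="${rest%%:*}"       # first entry
        rest="${rest#*:}"       # everything after the first colon
        [ -d "$dir" ] || continue           # skip entries that do not exist
        case ":$new:" in
            *":$dir:"*) continue ;;         # skip entries already kept
        esac
        new="${new:+$new:}$dir"
    done
    printf '%s\n' "$new"
}

# Preview the cleaned value; this does not modify PATH.
clean_path "$PATH"
```

A non-existent PATH entry is harmless by itself, since the shell just skips it during lookup. The stale /usr/local/cuda/bin entry most likely comes from an export line in a startup file such as ~/.bashrc or ~/.profile, which is an assumption worth checking with grep -n cuda on those files.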

This is what my GUI for the software updater looks like (I’ve had to mess with this before as well)

This is the output of nvidia-smi. Notably, it reports CUDA Version 11.1 even though no CUDA is installed (although I’ve seen it report something like this before when nothing was installed, so I think it is normal behavior):

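Fri Jan  8 19:52:35 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:01:00.0  On |                  N/A |
| 60%   37C    P0    44W / 215W |    992MiB /  8116MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1708      G   /usr/lib/xorg/Xorg                 62MiB |
|    0   N/A  N/A      1744      G   /usr/bin/gnome-shell               50MiB |
|    0   N/A  N/A      2469      G   /usr/lib/xorg/Xorg                544MiB |
|    0   N/A  N/A      2635      G   /usr/bin/gnome-shell              140MiB |
|    0   N/A  N/A      3093      G   …AAAAAAAA== --shared-files        188MiB |
+-----------------------------------------------------------------------------+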
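
For context, the CUDA Version field in the nvidia-smi header is reported by the driver and gives the highest CUDA version that driver supports; it does not mean a toolkit is installed. A rough way to tell the two apart is to compare it with nvcc, which only exists when a toolkit is present. This is a sketch: driver_cuda_version is a hypothetical helper name, and it assumes the header keeps the "CUDA Version: X.Y" wording shown above.

```shell
# driver_cuda_version: read nvidia-smi output on stdin and print the value of
# the "CUDA Version:" header field. (Hypothetical helper name.)
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Highest CUDA version the installed driver supports.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | driver_cuda_version
fi

# Toolkit actually installed, if any.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found: no CUDA toolkit on PATH"
fi
```

A driver version with no nvcc is consistent with "driver installed, toolkit not installed", although a toolkit that is installed but missing from PATH would look the same.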

Please let me know if you’ve had similar experiences with NVIDIA drivers breaking seemingly at random, especially if it’s related to dual booting. Thank you.

I guess your problem with CUDA is that you’re installing the full ‘cuda’ metapackage, which overwrites the driver with an incompatible one. Just install the CUDA toolkit, e.g. for 11.1:
sudo apt install cuda-toolkit-11-1
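
One way to see the difference before installing anything, as a sketch: apt-get -s (simulate) prints the actions apt would take without performing them, so the two package names can be compared for driver changes. The helper name driver_changes is hypothetical, and the package-name prefixes in the grep pattern are assumptions about how the driver packages are named in your repositories.

```shell
# driver_changes: read `apt-get -s` output on stdin and print only the
# install/remove actions that touch NVIDIA driver packages.
# (Hypothetical helper; the name prefixes below are assumptions.)
driver_changes() {
    grep -E '^(Inst|Remv) (nvidia-|libnvidia-|cuda-drivers)' || true
}

# -s only simulates: nothing is installed or removed.
if command -v apt-get >/dev/null 2>&1; then
    echo "cuda metapackage:"
    apt-get -s install cuda 2>/dev/null | driver_changes
    echo "cuda-toolkit-11-1 only:"
    apt-get -s install cuda-toolkit-11-1 2>/dev/null | driver_changes
fi
```

If the first list shows driver packages being installed or removed and the second is empty, the toolkit-only package leaves the current driver alone.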

Regarding your driver breaking after booting into Windows, please run nvidia-bug-report.sh as root when the issue hits and attach the resulting nvidia-bug-report.log.gz file to your post.