Upgrade from Ubuntu 18 to 20 messed up graphics drivers

I recently upgraded my Ubuntu OS from 18 to 20, which broke my NVIDIA driver installation. My monitors are connected to the onboard Intel graphics, which still work fine after the upgrade. I reinstalled the NVIDIA driver after the upgrade (“sudo apt install nvidia-driver-495”), but nvidia-smi gives the following error message: “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

I want to use the integrated Intel graphics for the monitors and the NVIDIA GPU only for my deep learning experiments.

Here is the bug report.
nvidia-bug-report.log.gz (126.4 KB)

I tried some of the steps from “Unable to load the 'nvidia-drm' kernel module on Ubuntu 18.04”, but they don’t seem to work. Any help is appreciated.

Please try this:
https://forums.developer.nvidia.com/t/i-cannot-use-my-gpu-in-ubuntu-20-04/200686/2

I followed the instructions and deleted the “blacklist nvidia” line from the file.

The output of grep nvidia /etc/modprobe.d/* /lib/modprobe.d/* looks like this:

/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:# generated by nvidia-installer
/lib/modprobe.d/blacklist-nvidia.conf:# This file was generated by nvidia-prime
/lib/modprobe.d/blacklist-nvidia.conf:blacklist nvidia-drm
/lib/modprobe.d/blacklist-nvidia.conf:blacklist nvidia-modeset
/lib/modprobe.d/blacklist-nvidia.conf:alias nvidia off
/lib/modprobe.d/blacklist-nvidia.conf:alias nvidia-drm off
/lib/modprobe.d/blacklist-nvidia.conf:alias nvidia-modeset off
/lib/modprobe.d/nvidia-kms.conf:# This file was generated by nvidia-prime
/lib/modprobe.d/nvidia-kms.conf:options nvidia-drm modeset=1
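
Of those lines, only the blacklist and "alias ... off" entries in /lib/modprobe.d/blacklist-nvidia.conf keep the NVIDIA modules from loading; nvidiafb is a separate framebuffer driver. A minimal sketch for picking such entries out of the grep output follows. find_nvidia_blockers is a made-up helper name, and the list of module names in the pattern is an assumption:

```shell
# Hypothetical helper: reads the output of
#   grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*
# on stdin and prints only the entries that stop an NVIDIA kernel module
# (nvidia, nvidia-drm, nvidia-modeset, nvidia-uvm) from loading.
# "blacklist nvidiafb" is deliberately not matched.
find_nvidia_blockers() {
    grep -E ':(blacklist[[:space:]]+nvidia([-_](drm|modeset|uvm))?|alias[[:space:]]+nvidia([-_](drm|modeset|uvm))?[[:space:]]+off)[[:space:]]*$'
}

# Usage:
#   grep nvidia /etc/modprobe.d/* /lib/modprobe.d/* | find_nvidia_blockers
```

If this prints anything, the files named at the start of each line are the ones to look at.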

The output of nvidia-smi is still the same:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

You need to delete the whole file
/lib/modprobe.d/blacklist-nvidia.conf
then run
sudo update-initramfs -u
and reboot.

Deleted the whole file, ran update-initramfs, and rebooted. The grep output now looks like this:

/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:# generated by nvidia-installer
/lib/modprobe.d/nvidia-kms.conf:# This file was generated by nvidia-prime
/lib/modprobe.d/nvidia-kms.conf:options nvidia-drm modeset=1

Same problem. Attached is the new bug report.
nvidia-bug-report.log.gz (104.0 KB)

Now the kernel module has gone missing. Please reinstall the kernel headers with
sudo apt install linux-headers-$(uname -r)
and afterwards post the output of
dkms status

$ sudo apt install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
linux-headers-5.4.0-94-generic is already the newest version (5.4.0-94.106).
linux-headers-5.4.0-94-generic set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
$ dkms status
rtl88x2bu, 5.8.7.1, 5.4.0-94-generic, x86_64: installed
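
The dkms status output above lists only rtl88x2bu, so no nvidia module is registered for the running kernel. A small sketch of that check, assuming the "name, version, kernel, arch: state" line format shown here (other dkms versions may print a different layout); nvidia_dkms_installed is a hypothetical helper name:

```shell
# Hypothetical helper: reads `dkms status` output on stdin and succeeds
# (exit 0) only if an nvidia module is installed for the given kernel.
# $1: kernel release to look for, e.g. "$(uname -r)"
nvidia_dkms_installed() {
    awk -F'[,:] *' -v kernel="$1" '
        $1 == "nvidia" && $3 == kernel && $NF ~ /^installed/ { found = 1 }
        END { exit !found }
    '
}

# Usage:
#   dkms status | nvidia_dkms_installed "$(uname -r)" \
#       && echo "nvidia module present" \
#       || echo "nvidia module missing for this kernel"
```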

How did you previously install the driver?
Please post the output of
dpkg -l |grep nvidia

I installed it using sudo apt install nvidia-driver-495.

The output of dpkg -l | grep nvidia is

ii  libnvidia-cfg1-495:amd64                                    495.46-0ubuntu0.20.04.1               amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-495                                        495.46-0ubuntu0.20.04.1               all          Shared files used by the NVIDIA libraries
rc  libnvidia-compute-450:amd64                                 450.102.04-0ubuntu1                   amd64        NVIDIA libcompute package
rc  libnvidia-compute-460:amd64                                 460.106.00-0ubuntu1                   amd64        NVIDIA libcompute package
ii  libnvidia-compute-495:amd64                                 495.46-0ubuntu0.20.04.1               amd64        NVIDIA libcompute package
ii  libnvidia-compute-495:i386                                  495.46-0ubuntu0.20.04.1               i386         NVIDIA libcompute package
ii  libnvidia-decode-495:amd64                                  495.46-0ubuntu0.20.04.1               amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-495:i386                                   495.46-0ubuntu0.20.04.1               i386         NVIDIA Video Decoding runtime libraries
ii  libnvidia-encode-495:amd64                                  495.46-0ubuntu0.20.04.1               amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-495:i386                                   495.46-0ubuntu0.20.04.1               i386         NVENC Video Encoding runtime library
ii  libnvidia-extra-495:amd64                                   495.46-0ubuntu0.20.04.1               amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-495:amd64                                    495.46-0ubuntu0.20.04.1               amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-495:i386                                     495.46-0ubuntu0.20.04.1               i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-495:amd64                                      495.46-0ubuntu0.20.04.1               amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-495:i386                                       495.46-0ubuntu0.20.04.1               i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
rc  nvidia-compute-utils-450                                    450.102.04-0ubuntu1                   amd64        NVIDIA compute utilities
rc  nvidia-compute-utils-460                                    460.106.00-0ubuntu1                   amd64        NVIDIA compute utilities
ii  nvidia-compute-utils-495                                    495.46-0ubuntu0.20.04.1               amd64        NVIDIA compute utilities
rc  nvidia-dkms-450                                             450.102.04-0ubuntu1                   amd64        NVIDIA DKMS package
rc  nvidia-dkms-460                                             460.106.00-0ubuntu1                   amd64        NVIDIA DKMS package
ii  nvidia-dkms-495                                             495.46-0ubuntu0.20.04.1               amd64        NVIDIA DKMS package
ii  nvidia-driver-495                                           495.46-0ubuntu0.20.04.1               amd64        NVIDIA driver metapackage
rc  nvidia-kernel-common-450                                    450.102.04-0ubuntu1                   amd64        Shared files used with the kernel module
rc  nvidia-kernel-common-460                                    460.106.00-0ubuntu1                   amd64        Shared files used with the kernel module
ii  nvidia-kernel-common-495                                    495.46-0ubuntu0.20.04.1               amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-495                                    495.46-0ubuntu0.20.04.1               amd64        NVIDIA kernel source package
ii  nvidia-machine-learning-repo-ubuntu1804                     1.0.0-1                               amd64        nvidia-machine-learning repository configuration files
ii  nvidia-prime                                                0.8.16~0.20.04.1                      all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                                             470.57.01-0ubuntu0.20.04.2            amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-utils-495                                            495.46-0ubuntu0.20.04.1               amd64        NVIDIA driver support binaries
ii  screen-resolution-extra                                     0.18build1                            all          Extension for the nvidia-settings control panel
ii  xserver-xorg-video-nvidia-495                               495.46-0ubuntu0.20.04.1               amd64        NVIDIA binary Xorg driver
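
In that listing, "ii" means installed and "rc" means the package was removed but its config files are still around (here the old 450 and 460 packages). Nobody in this thread asked for them to be cleaned up, so treat this as optional housekeeping. A sketch for listing them, with list_residual_configs as a made-up helper name:

```shell
# Hypothetical helper: reads `dpkg -l` output on stdin and prints the names
# of packages in state "rc" (removed, config files remaining).
list_residual_configs() {
    awk '$1 == "rc" { print $2 }'
}

# Usage (review the list first; the purge step is optional):
#   dpkg -l | grep nvidia | list_residual_configs
#   dpkg -l | grep nvidia | list_residual_configs | xargs -r sudo dpkg --purge
```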

I re-ran all the steps that you mentioned above. When I rebooted, the X GUI stopped working and I had to run “prime-select intel” to get the GUI back.

Attached is the new bug report.
nvidia-bug-report.log.gz (127.5 KB)

$ dkms status
nvidia, 495.46, 5.4.0-94-generic, x86_64: installed
rtl88x2bu, 5.8.7.1, 5.4.0-94-generic, x86_64: installed

You have a config file somewhere that uses the modesetting driver for the NVIDIA GPU.
Please post the output of
ls -l /etc/X11 /etc/X11/xorg.conf.d /usr/share/X11/xorg.conf.d

ls: cannot access '/etc/X11/xorg.conf.d': No such file or directory
/etc/X11:
total 96
drwxr-xr-x 2 root root  4096 Jan 15 20:55 app-defaults
drwxr-xr-x 2 root root  4096 Jan 15 20:52 cursors
-rw-r--r-- 1 root root    15 Apr 26  2018 default-display-manager
drwxr-xr-x 4 root root  4096 Apr 26  2018 fonts
drwxr-xr-x 2 root root  4096 Nov  7  2018 imwheel
-rw-r--r-- 1 root root 17394 Feb 23  2016 rgb.txt
drwxr-xr-x 2 root root  4096 Jan 15 20:55 xinit
drwxr-xr-x 2 root root  4096 Feb  2  2018 xkb
-rw-r--r-- 1 root root    91 Mar 24  2019 xorg.conf
-rw-r--r-- 1 root root    90 Mar 24  2019 xorg.conf~
-rw-r--r-- 1 root root  1252 Mar 23  2019 xorg.conf.backup
-rw-r--r-- 1 root root     0 Mar 23  2019 xorg.conf.nvidia-xconfig-original
-rwxr-xr-x 1 root root   709 Feb 23  2016 Xreset
drwxr-xr-x 2 root root  4096 Jan 15 20:51 Xreset.d
drwxr-xr-x 2 root root  4096 Jan 15 20:51 Xresources
-rwxr-xr-x 1 root root  3730 Nov  3  2017 Xsession
drwxr-xr-x 2 root root  4096 Jan 15 20:55 Xsession.d
-rw-r--r-- 1 root root   265 Feb 23  2016 Xsession.options
drwxr-xr-x 2 root root  4096 Jan 15 20:55 xsm
-rw-r--r-- 1 root root    13 Dec  5  2016 XvMCConfig
-rw-r--r-- 1 root root   630 Apr 26  2018 Xwrapper.config

/usr/share/X11/xorg.conf.d:
total 24
-rw-r--r-- 1 root root   92 Oct 22  2019 10-amdgpu.conf
-rw-r--r-- 1 root root  206 Dec 16 06:05 10-nvidia.conf
-rw-r--r-- 1 root root 1350 Dec 14 06:14 10-quirks.conf
-rw-r--r-- 1 root root   92 Oct 22  2019 10-radeon.conf
-rw-r--r-- 1 root root 1429 Aug 13  2019 40-libinput.conf
-rw-r--r-- 1 root root 3458 Mar 11  2020 70-wacom.conf

Please delete all /etc/X11/xorg.conf* files.
Then try switching to nvidia again.
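
If you would rather keep copies than delete outright, a cautious variation (my own, not what was asked above) is to move the files aside. A minimal sketch; stash_xorg_confs and the backup directory are hypothetical names, and it has to run from a root shell to touch /etc/X11:

```shell
# Hypothetical helper: moves every regular file matching xorg.conf* out of
# the X11 config directory into a backup directory, leaving directories
# such as xorg.conf.d in place.
# $1: config directory (default /etc/X11)
# $2: backup directory (default $HOME/xorg-conf-backup, an arbitrary choice)
stash_xorg_confs() {
    src_dir="${1:-/etc/X11}"
    dest_dir="${2:-$HOME/xorg-conf-backup}"
    mkdir -p "$dest_dir" || return 1
    for f in "$src_dir"/xorg.conf*; do
        # Skips the literal pattern when nothing matches, and any directory.
        [ -f "$f" ] || continue
        mv -- "$f" "$dest_dir"/ && printf 'moved %s\n' "$f"
    done
}

# Usage (from a root shell):
#   stash_xorg_confs /etc/X11 /root/xorg-conf-backup
```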

OK, that seems to have fixed nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 495.46       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 32%   29C    P8     8W / 225W |    273MiB /  7982MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1446      G   /usr/lib/xorg/Xorg                 45MiB |
|    0   N/A  N/A      2227      G   /usr/lib/xorg/Xorg                 80MiB |
|    0   N/A  N/A      2385      G   /usr/bin/gnome-shell               88MiB |
|    0   N/A  N/A      2821      G   ...AAAAAAAAA= --shared-files       45MiB |
+-----------------------------------------------------------------------------+

Do I have to create a file /etc/X11/xorg.conf like this to prevent X from using the GPU?

Section "Device"
    Identifier     "intel"
    Driver         "modesetting"
    BusID          "PCI:0:2:0"
EndSection

It's easier to switch to on-demand mode:
sudo prime-select on-demand
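
To compare what sits on the GPU before and after switching, you can pull the graphics clients (type "G") out of the nvidia-smi process table. This is only a sketch: it assumes the table layout printed by nvidia-smi 495.46 earlier in this thread, which other driver versions may not share, and list_graphics_clients is a made-up name. Xorg may still show a small allocation in on-demand mode.

```shell
# Hypothetical helper: reads plain `nvidia-smi` output on stdin and prints
# "PID process-name" for every process of type G (graphics).
# Relies on the column order: | GPU GI CI PID Type Process-name Memory |
list_graphics_clients() {
    awk '$1 == "|" && $6 == "G" { print $5, $7 }'
}

# Usage:
#   nvidia-smi | list_graphics_clients
```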

Thank you @generix. I will give that a shot.

On another note, what would be the best way to install CUDA 11? It looks like my current sources are set up for CUDA 10.

$ sudo apt update
Hit:1 http://dl.google.com/linux/chrome/deb stable InRelease
Hit:2 http://us.archive.ubuntu.com/ubuntu focal InRelease                                                                                                             
Hit:3 http://security.ubuntu.com/ubuntu focal-security InRelease                                                                                                      
Hit:4 http://packages.ros.org/ros2/ubuntu focal InRelease                                                                                                             
Hit:5 http://packages.osrfoundation.org/gazebo/ubuntu-stable focal InRelease                                                                                          
Hit:6 http://us.archive.ubuntu.com/ubuntu focal-updates InRelease                                                                                                     
Hit:7 http://us.archive.ubuntu.com/ubuntu focal-backports InRelease                            
Hit:8 http://archive.canonical.com/ubuntu focal InRelease                
Hit:9 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu focal InRelease
Reading package lists... Done
Building dependency tree       
Reading state information... Done
All packages are up to date.
$ apt search nvidia-cuda-toolkit
Sorting... Done
Full Text Search... Done
nvidia-cuda-toolkit/focal 10.1.243-3 amd64
  NVIDIA CUDA development toolkit

nvidia-cuda-toolkit-gcc/focal 10.1.243-3 amd64
  NVIDIA CUDA development toolkit (GCC compatibility)

Just add the repo from the NVIDIA downloads page and install cuda-toolkit (not cuda, which as far as I know also pulls in its own driver packages), e.g.
sudo apt install cuda-toolkit
or for a specific version
sudo apt install cuda-toolkit-11-4
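
When picking a specific version, note that the nvidia-smi header earlier in this thread reports "CUDA Version: 11.5". As far as I know that is the newest CUDA version the installed 495.46 driver supports, so cuda-toolkit-11-4 fits. A rough sketch of that comparison, assuming GNU sort (for -V) and using made-up helper names:

```shell
# Hypothetical helper: succeeds if the toolkit version is not newer than
# the CUDA version reported by the driver.
# $1: toolkit version, e.g. 11.4
# $2: "CUDA Version" from the nvidia-smi header, e.g. 11.5
toolkit_fits_driver() {
    lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
    [ "$lowest" = "$1" ]
}

# Hypothetical helper: turns a version like 11.4 into the versioned
# package name used above (cuda-toolkit-11-4).
toolkit_package_name() {
    printf 'cuda-toolkit-%s\n' "$(printf '%s' "$1" | tr . -)"
}

# Usage:
#   toolkit_fits_driver 11.4 11.5 && sudo apt install "$(toolkit_package_name 11.4)"
```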

Great! Appreciate the help.
