I have been trying to install and use the CUDA Toolkit on my newly installed Ubuntu 20.04 LTS machine, which has an NVIDIA GTX 970 card. However, I seem unable to install the toolkit. At first I got an error similar to this. I followed the steps to purge all of the “nvidia” packages found. After this I ran the installer again (I am using the runfile from the CUDA Toolkit downloads page).
$ sudo sh cuda_11.3.1_465.19.01_linux.run
Installation failed. See log at /var/log/cuda-installer.log for details.
I checked cuda-installer.log for details:
$ cat /var/log/cuda-installer.log
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc
[INFO]: gcc version: gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1)
[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 465.19.01
[INFO]: Executing NVIDIA-Linux-x86_64-465.19.01.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 465.19.01 failed, quitting
I found another thread about a related problem that says to look at /var/log/nvidia-installer.log:
$ cat /var/log/nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Wed Jun 9 15:09:59 2021
installer version: 465.19.01
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
nvidia-installer command line: ./nvidia-installer --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd
Using built-in stream user interface
-> Detected 4 CPUs online; setting concurrency level to 4.
ERROR: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occurred that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
The error led me to a page about how to disable nvidia-drm, a potential solution that I still have to try out. I suspect I need to be in a TTY session while doing this.
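Before trying anything, it may help to confirm which NVIDIA kernel modules are actually loaded, since that is what the installer error complains about. The snippet below is only a minimal sketch: the function name loaded_nvidia_modules is made up for illustration, and it assumes that /proc/modules is readable (this is the same data that lsmod displays) and that the module names start with "nvidia", such as nvidia_drm.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration): print the names of
# loaded kernel modules that start with "nvidia".
# Reads /proc/modules by default; another file can be passed as the first
# argument, which is handy for trying the function on sample data.
loaded_nvidia_modules() {
    modules_file="${1:-/proc/modules}"
    # The first field of each line in /proc/modules is the module name.
    awk '$1 ~ /^nvidia/ { print $1 }' "$modules_file"
}

# Only query the running kernel if /proc/modules is available.
if [ -r /proc/modules ]; then
    loaded_nvidia_modules
fi
```

If this prints nvidia_drm, the module is still loaded. One commonly suggested route, which I have not verified on this machine, is to leave the graphical session with `sudo systemctl isolate multi-user.target`, unload the module with `sudo modprobe -r nvidia_drm`, and then run the installer again from the text console. A plain reboot, as the log itself suggests, may also be enough.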
I will trail off here as I will go explore installing CUDA on my machine. It sure is harder than I would have thought! Windows users have it easy :) I will come back and update with my findings so if I find a solution, maybe it will help someone else.