CUDA installation fails in fresh Ubuntu 20.04

CUDA installation fails on a fresh Ubuntu 20.04, perhaps because the installer tries to install the latest Nvidia driver, which is already installed and in use, so the installer cannot replace it? Yet subsequently running "apt --fix-broken install" repairs the installation.

How to reproduce:

  1. Install Ubuntu 20.04 with the option to check for updates during installation enabled.
  2. After installation, update everything just to be sure, although this likely doesn't matter.
  3. Install CUDA. I tried all three methods, and all failed. Just to be sure, I formatted the disk and reinstalled Ubuntu between attempts so that nothing remained from the previous ones.

For example:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin

sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600

wget https://developer.download.nvidia.com/compute/cuda/11.4.1/local_installers/cuda-repo-ubuntu2004-11-4-local_11.4.1-470.57.02-1_amd64.deb

sudo dpkg -i cuda-repo-ubuntu2004-11-4-local_11.4.1-470.57.02-1_amd64.deb

sudo apt-key add /var/cuda-repo-ubuntu2004-11-4-local/7fa2af80.pub

sudo apt-get update

sudo apt-get -y install cuda

With this, everything goes fine until

(removed 200+ lines of reporting things going smoothly)
Unpacking libnvidia-common-470 (470.57.02-0ubuntu1) over (470.57.02-0ubuntu0.20.04.1) ...
dpkg: error processing archive /tmp/apt-dpkg-install-aJONtb/22-libnvidia-common-470_470.57.02-0ubuntu1_all.deb (--unpack):
 trying to overwrite '/lib/firmware/nvidia/470.57.02/gsp.bin', which is also in package nvidia-kernel-common-470 470.57.02-0ubuntu0.20.04.1
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)

(and after that, another 200 lines of ok messages)
E: Sub-process /usr/bin/dpkg returned an error code (1)
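
When the interesting line is buried in several hundred lines of output, it can help to pull the conflicting file and the package that already owns it out of the log automatically. The following is only a minimal sketch: the function name conflict_from_log is hypothetical, and it assumes the message uses plain ASCII quotes, as dpkg prints it in a terminal.

```shell
#!/bin/sh
# Sketch: extract "<file> <owning-package>" from dpkg's
# "trying to overwrite" message. conflict_from_log is a hypothetical helper.
conflict_from_log() {
    # Reads log text on stdin and prints one line per conflict found.
    sed -n "s/.*trying to overwrite '\([^']*\)', which is also in package \([^ ]*\).*/\1 \2/p"
}

# Sample line taken from the failed install (quotes written as plain ASCII).
log_line="trying to overwrite '/lib/firmware/nvidia/470.57.02/gsp.bin', which is also in package nvidia-kernel-common-470 470.57.02-0ubuntu0.20.04.1"

conflict=$(printf '%s\n' "$log_line" | conflict_from_log)
echo "$conflict"
```

On an affected machine, the result can be cross-checked with "dpkg -S" followed by the file path, which reports the installed package that owns a file.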

user@machine:~/installs$ nvidia-smi
Wed Aug 18 09:23:41 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:42:00.0  On |                  N/A |
| 34%   32C    P8     8W /  75W |    186MiB /  3910MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1051      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      1677      G   /usr/lib/xorg/Xorg                 96MiB |
|    0   N/A  N/A      1848      G   /usr/bin/gnome-shell               41MiB |
|    0   N/A  N/A      4294      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      5998      G   /usr/lib/firefox/firefox            1MiB |
+-----------------------------------------------------------------------------+

user@machine:~/installs$ sudo apt install mlocate    # trying to install something else, but getting a good hint
Reading package lists… Done
Building dependency tree
Reading state information… Done
You might want to run ‘apt --fix-broken install’ to correct these.
The following packages have unmet dependencies:
libnvidia-gl-470 : Depends: libnvidia-common-470 (= 470.57.02-0ubuntu1) but 470.57.02-0ubuntu0.20.04.1 is to be installed
E: Unmet dependencies. Try ‘apt --fix-broken install’ with no packages (or specify a solution).

user@machine:~/installs$ sudo apt --fix-broken install
Reading package lists… Done
Building dependency tree
Reading state information… Done
Correcting dependencies… Done
The following packages were automatically installed and are no longer required:
chromium-codecs-ffmpeg-extra gstreamer1.0-vaapi libgstreamer-plugins-bad1.0-0 libva-wayland2
libx11-xcb1:i386
Use ‘sudo apt autoremove’ to remove them.
The following additional packages will be installed:
libnvidia-common-470
The following packages will be upgraded:
libnvidia-common-470
1 upgraded, 0 newly installed, 0 to remove and 274 not upgraded.
90 not fully installed or removed.
Need to get 0 B/16,5 MB of archives.
After this operation, 36,0 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 file:/var/cuda-repo-ubuntu2004-11-4-local libnvidia-common-470 470.57.02-0ubuntu1 [16,5 MB]
(Reading database … 198282 files and directories currently installed.)
Preparing to unpack …/libnvidia-common-470_470.57.02-0ubuntu1_all.deb …

(Many more lines followed without errors.) The "sudo apt --fix-broken install" apparently managed to install the CUDA toolkit, except that it didn't update the paths. For the bash shell, that meant adding the following to .bashrc:

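# Add at the end of the file
# The ${VAR:+$VAR:} form avoids a leading ':' when the variable is still empty
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/cuda/lib64
export PATH=$PATH:/usr/local/cuda/bin
# CUDA_HOME is a single directory, not a colon-separated list
export CUDA_HOME=/usr/local/cuda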
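
Since .bashrc is read by every new interactive shell and is sometimes sourced again by hand, plain appends like the ones above can pile up duplicate entries. A small helper can guard against that. This is a sketch only: append_path is a hypothetical name, and the code is written in POSIX sh (which bash also accepts), assuming the default /usr/local/cuda location used above.

```shell
#!/bin/sh
# Sketch: append a directory to a colon-separated variable only if it is
# not already listed, without leaving a leading ':' when the variable
# starts out empty. append_path is a hypothetical helper name.
append_path() {
    # $1: name of the variable, $2: directory to append
    eval "ap_current=\${$1:-}"
    case ":$ap_current:" in
        *":$2:"*) ;;    # already present, nothing to do
        *) eval "export $1=\"\${ap_current:+\$ap_current:}\$2\"" ;;
    esac
}

append_path PATH /usr/local/cuda/bin
append_path LD_LIBRARY_PATH /usr/local/cuda/lib64
export CUDA_HOME=/usr/local/cuda    # a single directory, not a list
```

After opening a new shell, "command -v nvcc" should print a path under /usr/local/cuda/bin if the toolkit was installed there.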

After this, everything seemed to work, except the Visual Profiler, which would be a topic for a different post.
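
For anyone hitting the same failure, the workaround can be sketched as a small script. This is not an official procedure, just the sequence that worked here. The names DRY_RUN, run and install_cuda_with_fixup are hypothetical, and the final re-run of the install is an assumption added as a safety check; in my case the fix-broken step alone appeared to finish the job. With DRY_RUN=1 (the default in this sketch) the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch of the workaround: try the install, and if it fails, let apt
# repair the half-installed state before trying once more.
# DRY_RUN, run and install_cuda_with_fixup are hypothetical names.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"    # print only; nothing is executed
    else
        "$@"
    fi
}

install_cuda_with_fixup() {
    # First attempt; on this setup it stopped at the
    # libnvidia-common-470 file conflict.
    if run sudo apt-get -y install cuda; then
        return 0
    fi
    # Repair the broken dependencies, then re-run the install
    # (assumption: the re-run is a harmless no-op if nothing is left to do).
    run sudo apt-get -y --fix-broken install &&
        run sudo apt-get -y install cuda
}

install_cuda_with_fixup
```

To actually execute the commands, run the script with DRY_RUN=0 after the repository setup steps shown earlier.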

Based on this experience, it should be possible to update the installers so that everything installs cleanly, without needing to fix the installation manually.