Installation error: CUDA 12.2 on Ubuntu 20.04

I am trying to install CUDA on Ubuntu 20.04.6 LTS (Focal) following these instructions:
https://developer.nvidia.com/cuda-12-2-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=deb_local

I have gcc 9.4.0 and ran sudo apt-get install linux-headers-$(uname -r) before trying to install CUDA.

The command-line error is:

Setting up nvidia-dkms-535 (535.54.03-0ubuntu1) ...
update-initramfs: deferring update (trigger activated)

A modprobe blacklist file has been created at /etc/modprobe.d to prevent Nouveau
from loading. This can be reverted by deleting the following file:
/etc/modprobe.d/nvidia-graphics-drivers.conf

A new initrd image has also been created. To revert, please regenerate your
initrd by running the following command after deleting the modprobe.d file:
`/usr/sbin/initramfs -u`

*****************************************************************************
*** Reboot your computer and verify that the NVIDIA graphics driver can   ***
*** be loaded.                                                            ***
*****************************************************************************

INFO:Enable nvidia
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad
Removing old nvidia-535.54.03 DKMS files...

------------------------------
Deleting module version: 535.54.03
completely from the DKMS tree.
------------------------------
Done.
Loading new nvidia-535.54.03 DKMS files...
Building for 5.15.0-107-generic
Building for architecture x86_64
Building initial module for 5.15.0-107-generic
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-kernel-source-535.0.crash'
Error! Bad return status for module build on kernel: 5.15.0-107-generic (x86_64)
Consult /var/lib/dkms/nvidia/535.54.03/build/make.log for more information.
dpkg: error processing package nvidia-dkms-535 (--configure):
 installed nvidia-dkms-535 package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of cuda-drivers-535:
 cuda-drivers-535 depends on nvidia-dkms-535 (>= 535.54.03); however:
  Package nvidia-dkms-535 is not configured yet.

dpkg: error processing package cuda-drivers-535 (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of cuda-drivers:
 cuda-drivers depends on cuda-drivers-535 (= 535.54.03-1); however:
  Package cuda-drivers-535 is not configured yet.

dpkg: error processing package cuda-drivers (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of nvidia-driver-535:
 nvidia-driver-535 depends on nvidia-dkms-535 (= 535.54.03-0ubuntu1); however:
  Package nvidia-dkms-535 is not configured yet.

dpkg: error processing package nvidia-driver-535 (--configure):
 dependency problems - leaving unconfigured
No apport report written because the error message indicates it's a follow-up error from a previous failure.
No apport report written because the error message indicates it's a follow-up error from a previous failure.
No apport report written because MaxReports has already been reached
No apport report written because MaxReports has already been reached
No apport report written because MaxReports has already been reached
No apport report written because MaxReports has already been reached
No apport report written because MaxReports has already been reached
dpkg: dependency problems prevent configuration of cuda-runtime-12-2:
 cuda-runtime-12-2 depends on cuda-drivers (>= 535.54.03); however:
  Package cuda-drivers is not configured yet.

dpkg: error processing package cuda-runtime-12-2 (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of cuda-12-2:
 cuda-12-2 depends on cuda-runtime-12-2 (>= 12.2.0); however:
  Package cuda-runtime-12-2 is not configured yet.

dpkg: error processing package cuda-12-2 (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of cuda:
 cuda depends on cuda-12-2 (>= 12.2.0); however:
  Package cuda-12-2 is not configured yet.

dpkg: error processing package cuda (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of cuda-demo-suite-12-2:
 cuda-demo-suite-12-2 depends on cuda-runtime-12-2; however:
  Package cuda-runtime-12-2 is not configured yet.

dpkg: error processing package cuda-demo-suite-12-2 (--configure):
 dependency problems - leaving unconfigured
Processing triggers for initramfs-tools (0.136ubuntu6.7) ...
update-initramfs: Generating /boot/initrd.img-5.15.0-107-generic
Errors were encountered while processing:
 nvidia-dkms-535
 cuda-drivers-535
 cuda-drivers
 nvidia-driver-535
 cuda-runtime-12-2
 cuda-12-2
 cuda
 cuda-demo-suite-12-2
E: Sub-process /usr/bin/dpkg returned an error code (1)
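
The dpkg messages above follow one pattern: only nvidia-dkms-535 actually fails (its post-installation DKMS build returns an error), and every package that depends on it, directly or transitively, is then left unconfigured. A minimal Python sketch of that propagation, using only the dependency edges printed in this log (a simplified model for illustration, not how dpkg works internally):

```python
from collections import deque

# Dependency edges copied from the dpkg output above: package -> packages it depends on.
# Simplified: the real packages have many more dependencies than the log shows.
DEPENDS = {
    "cuda-drivers-535": ["nvidia-dkms-535"],
    "cuda-drivers": ["cuda-drivers-535"],
    "nvidia-driver-535": ["nvidia-dkms-535"],
    "cuda-runtime-12-2": ["cuda-drivers"],
    "cuda-12-2": ["cuda-runtime-12-2"],
    "cuda": ["cuda-12-2"],
    "cuda-demo-suite-12-2": ["cuda-runtime-12-2"],
}


def unconfigured(failed: str, depends: dict[str, list[str]]) -> set[str]:
    """Return every package left unconfigured when `failed` cannot be configured."""
    # Invert the graph: dependency -> packages that need it.
    dependents: dict[str, list[str]] = {}
    for pkg, deps in depends.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(pkg)

    # Breadth-first walk over everything that (transitively) depends on the failure.
    blocked = {failed}
    queue = deque([failed])
    while queue:
        for pkg in dependents.get(queue.popleft(), []):
            if pkg not in blocked:
                blocked.add(pkg)
                queue.append(pkg)
    return blocked


if __name__ == "__main__":
    for name in sorted(unconfigured("nvidia-dkms-535", DEPENDS)):
        print(name)
```

The result is the same eight packages listed under "Errors were encountered while processing", so the only error that needs fixing is the nvidia-dkms-535 module build; the rest are follow-ups.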

The crash files are attached:
crash_files.zip (136.4 KB)

The log file located at /var/lib/dkms/nvidia/535.54.03/build/make.log is attached too:
make.log (1.1 MB)

I need help solving this. Thanks!
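
make.log is large (1.1 MB), and the actual cause of a DKMS build failure is usually a handful of compiler error lines buried in it. A small Python sketch that prints only those lines; the patterns are a heuristic that assumes gcc-style "file:line:col: error:" diagnostics and make's "*** ... Error N" lines, which is typical for kernel-module builds but not guaranteed:

```python
import re
import sys

# gcc/clang diagnostics look like "path/file.c:123:45: error: message" (or "fatal error:").
# make's own failure lines look like "make[2]: *** [...] Error 1".
# Heuristic patterns: adjust them if your log looks different.
ERROR_PATTERNS = [
    re.compile(r":\d+:\d+: (?:fatal )?error: "),
    re.compile(r"^make(?:\[\d+\])?: \*\*\* .*Error \d+"),
]


def find_errors(lines):
    """Yield (line_number, text) for every line matching one of the error patterns."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(pattern.search(text) for pattern in ERROR_PATTERNS):
            yield number, text


if __name__ == "__main__":
    # Default path from the dpkg output above; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/dkms/nvidia/535.54.03/build/make.log"
    try:
        with open(path, errors="replace") as log:
            for number, text in find_errors(log):
                print(f"{number}: {text}")
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
```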

I hit almost the same bug on almost the same day, even though I was trying to install CUDA Toolkit 11.7 (which is supported by nvidia-driver-515) on Ubuntu 20.04.
What's ridiculous is that I actually succeeded once and had CUDA 11.7 working well with PyTorch, but then I messed with the NVIDIA driver and CUDA again, and now I get the same error message when I install CUDA with sudo apt-get -y install cuda-toolkit-11-7 from the official .deb file.
It seems to involve a corrupted package (related to nvidia-dkms-515) that can be neither fully installed nor fully removed.
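
A package that is "neither fully installed nor fully removed" shows up in dpkg's database (/var/lib/dpkg/status on Debian and Ubuntu) with a state such as half-configured or unpacked instead of installed; dpkg --audit reports the same thing from the command line. A minimal Python sketch that lists such packages; the helper names are illustrative, and multi-arch duplicates of one package name are collapsed for simplicity:

```python
from pathlib import Path

# dpkg states meaning "started but did not finish" (see the dpkg man page for the full list).
INCOMPLETE_STATES = {
    "half-installed", "unpacked", "half-configured",
    "triggers-awaited", "triggers-pending",
}


def parse_status(text: str) -> dict[str, str]:
    """Map package name -> value of its 'Status:' field from dpkg status-file text."""
    result = {}
    for stanza in text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines (long descriptions etc.) start with whitespace; skip them.
            if line and not line[0].isspace() and ": " in line:
                key, value = line.split(": ", 1)
                fields[key] = value
        if "Package" in fields and "Status" in fields:
            result[fields["Package"]] = fields["Status"]
    return result


def incomplete_packages(statuses: dict[str, str]) -> list[str]:
    """Return packages whose install/configure step never completed."""
    broken = []
    for name, status in statuses.items():
        parts = status.split()  # "want flag state", e.g. "install ok half-configured"
        if len(parts) != 3:
            continue
        _want, flag, state = parts
        if state in INCOMPLETE_STATES or flag == "reinstreq":
            broken.append(name)
    return sorted(broken)


if __name__ == "__main__":
    status_file = Path("/var/lib/dpkg/status")
    if status_file.exists():
        for name in incomplete_packages(parse_status(status_file.read_text(errors="replace"))):
            print(name)
    else:
        print("no dpkg status file found (not a Debian/Ubuntu system?)")
```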

I'm facing the same issue while installing CUDA 11.6 on Ubuntu 20.04. I tried nvidia-driver 510 and 535, and the issue remains the same. I would appreciate any help.

I just resolved this issue and would like to share my experience here. I finally installed a suitable NVIDIA driver, CUDA toolkit, and PyTorch, and everything works fine (at least for now lollll). Hope this helps!

  1. What I did before the issue occurred
    I didn't know much about NVIDIA drivers and related tooling, and I was really messing around with NVIDIA drivers and CUDA toolkits. I tried installing NVIDIA driver versions 515, 525, 535, and 550, CUDA Toolkit versions 11.7, 11.8, 12.1, and 12.4, and PyTorch versions 1.13, 2.0, 2.1, 2.2, and 2.3 (so no wonder I eventually messed up my system lollll).

  2. To install the NVIDIA driver
    Check the NVIDIA driver status with:

    nvidia-smi
    

    This should print a table including the NVIDIA driver version and the supported CUDA version. For example, mine is Driver Version: 535.171.04 CUDA Version: 12.2 (on Ubuntu 20.04).
    If it fails to print this information, try reinstalling the NVIDIA driver. The best way I found is the Ubuntu GUI: Software & Updates → Additional Drivers → choose an NVIDIA driver version and click Apply Changes.
    (This can also produce errors, but there are plenty of posts on resolving them. Hopefully it is not an issue for you.)
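
The nvidia-smi check can also be scripted. A minimal Python sketch that runs nvidia-smi and extracts the two numbers from its banner; it assumes the "Driver Version: ... CUDA Version: ..." format quoted above, which may change between driver releases. Note that "CUDA Version" here is the newest CUDA version the driver supports, not an installed toolkit:

```python
import re
import subprocess

# nvidia-smi's banner contains e.g. "Driver Version: 535.171.04   CUDA Version: 12.2".
DRIVER_RE = re.compile(r"Driver Version:\s*([\d.]+)")
CUDA_RE = re.compile(r"CUDA Version:\s*([\d.]+)")


def parse_nvidia_smi(output: str) -> tuple[str, str]:
    """Return (driver_version, max_supported_cuda_version) from plain nvidia-smi output."""
    driver = DRIVER_RE.search(output)
    cuda = CUDA_RE.search(output)
    if driver is None or cuda is None:
        raise ValueError("could not find driver/CUDA version; is the driver loaded?")
    return driver.group(1), cuda.group(1)


if __name__ == "__main__":
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"nvidia-smi failed ({exc}); the driver probably needs reinstalling.")
    else:
        driver, cuda = parse_nvidia_smi(result.stdout)
        print(f"Driver {driver}, supports CUDA up to {cuda}")
```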

  3. To clear the error before installing CUDA
    CUDA 11.7 works with NVIDIA driver 535: newer drivers are backward compatible with older CUDA toolkits, and the 535 driver supports CUDA versions up to 12.2.
    I first followed the process on the official CUDA Toolkit 11.7 Update 1 Downloads page, but some of its steps need to be modified, and the changes do affect the result.
    First, I removed all the unpacked local CUDA repositories in the /var/ folder, as well as their APT source entries.
    I used the following commands, since I had several CUDA repositories there:

    sudo rm -r /var/cuda-repo-*
    sudo rm /etc/apt/sources.list.d/cuda*
    

    THIS IS THE STEP that cleared the above error for me.
    Then, clean the apt caches using:

    sudo apt autoremove 
    sudo apt clean
    

    Now, reboot and check everything again:

    nvidia-smi
    sudo apt-get update
    sudo apt-get upgrade
    

    Hopefully, the NVIDIA driver is still working fine.
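
Because rm with globs is unforgiving, it can help to preview what the two cleanup commands in step 3 would delete. A Python dry-run sketch using the same patterns (/var/cuda-repo-* and /etc/apt/sources.list.d/cuda*); it only lists paths and deletes nothing. The root parameter is a hypothetical convenience for testing against a scratch directory:

```python
from pathlib import Path

# The same glob patterns as the two rm commands, relative to the filesystem root.
CLEANUP_PATTERNS = ["var/cuda-repo-*", "etc/apt/sources.list.d/cuda*"]


def cleanup_targets(root: str = "/") -> list[Path]:
    """List (but do not delete) everything the cleanup commands would remove."""
    base = Path(root)
    targets = []
    for pattern in CLEANUP_PATTERNS:
        targets.extend(sorted(base.glob(pattern)))
    return targets


if __name__ == "__main__":
    for path in cleanup_targets():
        kind = "dir " if path.is_dir() else "file"
        print(f"{kind}  {path}")
```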

  4. To install the CUDA toolkit
    I followed the official steps but changed the last command to
    sudo apt-get -y install cuda-toolkit-11-7
    so that apt installs only the toolkit and leaves the working driver alone. The cuda metapackage depends on cuda-drivers, which pulls in nvidia-dkms, the package whose build fails in the original post (see also the post Problems loading nvidia drivers after cuda toolkit installation).
    Since I started over with the downloaded cuda-repo-xxx.deb file (mine is cuda-repo-ubuntu2004-11-7-local_11.7.1-515.65.01-1_amd64.deb), the commands are:

    sudo dpkg -i cuda-repo-ubuntu2004-11-7-local_11.7.1-515.65.01-1_amd64.deb
    sudo cp /var/cuda-repo-ubuntu2004-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cuda-toolkit-11-7
    

    Then, set the CUDA environment variables in ~/.bashrc. Open it in whatever text editor you like (I used gedit ~/.bashrc) and add the following lines:

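    # Prepend so this toolkit is found before any older one; ${VAR:+:${VAR}} avoids an empty (current-directory) entry
    export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}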
    

    Save the file and reload it with source ~/.bashrc.
    Now, check the CUDA version using

    nvcc --version
    

    This should print something like:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Wed_Jun__8_16:49:14_PDT_2022
    Cuda compilation tools, release 11.7, V11.7.99
    Build cuda_11.7.r11.7/compiler.31442593_0
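
The version numbers in step 4 have to agree with each other: the local repo file name encodes the CUDA version and the driver it was bundled with, and nvcc --version should report the same release once PATH is set. A Python sketch that cross-checks them, assuming the file-name layout of the example above (cuda-repo-<distro>-<X>-<Y>-local_<cuda>-<driver>-<rev>_<arch>.deb). It treats the bundled driver version as a safe minimum, which is conservative: newer drivers are backward compatible, as the 535 driver running CUDA 11.7 here shows. The function names are illustrative:

```python
import re

# Layout assumed from cuda-repo-ubuntu2004-11-7-local_11.7.1-515.65.01-1_amd64.deb
REPO_RE = re.compile(
    r"^cuda-repo-(?P<distro>[a-z0-9]+)-(?P<short>\d+-\d+)-local_"
    r"(?P<cuda>[\d.]+)-(?P<driver>[\d.]+)-\d+_(?P<arch>\w+)\.deb$"
)
# nvcc --version prints e.g. "Cuda compilation tools, release 11.7, V11.7.99".
NVCC_RE = re.compile(r"release (\d+\.\d+), V([\d.]+)")


def parse_repo_deb(filename: str) -> dict[str, str]:
    """Split a local-repo .deb name into distro, CUDA version, bundled driver, arch."""
    match = REPO_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    info = match.groupdict()
    # dpkg -i unpacks the local repo here; the keyring is copied from this directory.
    info["repo_dir"] = f"/var/cuda-repo-{info['distro']}-{info['short']}-local"
    return info


def version_tuple(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def check_install(deb_name: str, nvcc_output: str, driver_version: str) -> list[str]:
    """Return human-readable problems (an empty list if everything agrees)."""
    repo = parse_repo_deb(deb_name)
    problems = []
    nvcc = NVCC_RE.search(nvcc_output)
    if nvcc is None:
        problems.append("nvcc output not recognised (is /usr/local/cuda-X.Y/bin on PATH?)")
    elif not repo["cuda"].startswith(nvcc.group(1) + "."):
        problems.append(f"nvcc reports {nvcc.group(1)}, repo is CUDA {repo['cuda']}")
    if version_tuple(driver_version) < version_tuple(repo["driver"]):
        problems.append(f"driver {driver_version} is older than bundled {repo['driver']}")
    return problems


if __name__ == "__main__":
    deb = "cuda-repo-ubuntu2004-11-7-local_11.7.1-515.65.01-1_amd64.deb"
    nvcc_text = "Cuda compilation tools, release 11.7, V11.7.99"
    print(parse_repo_deb(deb)["repo_dir"])
    print(check_install(deb, nvcc_text, "535.171.04") or "versions are consistent")
```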

  5. (Additional) To install PyTorch that supports CUDA
    I'm adding this part because this is where my nightmare began. I have to say the official PyTorch instructions were quite unclear to me when I wanted to install a PyTorch build that supports an older CUDA version (here, 11.7).
    In my case, only PyTorch <= 1.13 worked with my CUDA 11.7! I finally found the commands on the official page Installing previous versions of PyTorch.
    Since I am using conda, I used:

    # first, don't forget to use `conda activate xxx` to activate your env
    conda install pytorch==1.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
    

    I didn't install torchvision and torchaudio because some posts said they could cause other errors, so I skipped them.
    Finally, to check if your PyTorch works with CUDA, simply run:

    python -c "import torch; print(torch.cuda.is_available())"
    

    which should print True.
    Here is another, "fancier" Python script to check PyTorch with CUDA:

    import torch
    print(f"PyTorch version: {torch.__version__}")  # Should print the installed PyTorch version
    print(f"CUDA available: {torch.cuda.is_available()}")  # Should print 'True'
    if torch.cuda.is_available():
        print(f"CUDA version: {torch.version.cuda}")  # Should print the CUDA version, e.g., '11.3' or '11.7'
        print(f"Device Name: {torch.cuda.get_device_name(0)}")  # Should print the name of your GPU, e.g., 'GeForce GTX 1050 Ti'
    else:
        print("CUDA is not available. Check your installation.")
    
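
One point that is easy to miss: the conda pytorch-cuda package brings its own CUDA runtime, so torch.version.cuda reports the CUDA version PyTorch was built with, not the toolkit under /usr/local. What has to be new enough is the driver. A conservative Python sketch of that check, comparing torch.version.cuda with the "CUDA Version" from nvidia-smi; the rule "PyTorch CUDA <= driver CUDA" is sufficient but stricter than NVIDIA's minor-version compatibility actually requires:

```python
from typing import Optional


def major_minor(version: str) -> tuple[int, int]:
    """'11.7' -> (11, 7); '12.2.0' -> (12, 2)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def pytorch_cuda_ok(torch_cuda: Optional[str], driver_cuda: str) -> bool:
    """Conservative check: PyTorch's CUDA runtime is not newer than what the driver supports."""
    if torch_cuda is None:  # torch.version.cuda is None for CPU-only builds
        return False
    return major_minor(torch_cuda) <= major_minor(driver_cuda)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        # "12.2" is the driver value quoted earlier in this thread; use your own nvidia-smi value.
        print(pytorch_cuda_ok(torch.version.cuda, "12.2"))
```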

@xtificant Lifesaver!

I initially had some issues with the keyrings, but apparently we can just ignore them and use "sudo apt-get -y install cuda-toolkit-11-7" as the last command, which solves the issue.