Installation error: CUDA 12.2 on Ubuntu 20.04

I just resolved this issue and would like to share my experience here. I finally have a matching NVIDIA driver, CUDA toolkit, and PyTorch installed, and everything works fine (at least for now lollll). Hope this helps!

  1. What I did before the issue occurred
    I didn’t know much about NVIDIA drivers, and I had been installing and removing drivers and CUDA toolkits more or less at random. I tried NVIDIA driver versions 515, 525, 535, and 550; CUDA toolkit versions 11.7, 11.8, 12.1, and 12.4; and PyTorch versions 1.13, 2.0, 2.1, 2.2, and 2.3 (so no wonder I finally messed up my system lollll).

  2. To install the NVIDIA driver
    Check the NVIDIA driver status with the command:

    nvidia-smi
    

    This should print a table that includes the NVIDIA driver version and the highest CUDA version that driver supports (not the version of any installed toolkit). For example, mine shows Driver Version: 535.171.04 and CUDA Version: 12.2 (on Ubuntu 20.04).
    If the command prints errors instead, try reinstalling the NVIDIA driver. The easiest way I found is through the Ubuntu GUI: Software & Updates → Additional Drivers → [choose an NVIDIA driver version and click Apply Changes].
    (This step can throw errors of its own, but there are plenty of posts on resolving them. Hope this is not an issue for you.)
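
The version check above can also be scripted. Here is a minimal sketch, assuming the nvidia-smi header has the same `Driver Version: ... CUDA Version: ...` layout as my output. The function name `parse_nvidia_smi_header` is made up for illustration, and other driver releases may format the header differently.

```python
import re
import shutil
import subprocess


def parse_nvidia_smi_header(text):
    """Return (driver_version, max_cuda_version) as strings, or None if not found."""
    # Assumed layout: "Driver Version: 535.171.04   CUDA Version: 12.2"
    match = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; the driver is probably not installed")
    else:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
        print(parse_nvidia_smi_header(result.stdout) or "could not parse nvidia-smi output")
```

If the function returns None, the driver is most likely not loaded, which is the case where reinstalling the driver applies.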

  3. To clear the error before installing CUDA
    CUDA 11.7 works with NVIDIA driver 535: the driver supports toolkits up to the CUDA version that nvidia-smi reports (12.2 for me).
    I first followed the steps on the official CUDA Toolkit 11.7 Update 1 Downloads page, but some of them need to be modified, and the changes do affect the result.
    First, I removed all the unpacked CUDA repo packages in the /var/ folder, along with their apt source entries.
    I used the following commands, with wildcards because I had several CUDA repos there:

    sudo rm -r /var/cuda-repo-*
    sudo rm /etc/apt/sources.list.d/cuda*
    

    THIS IS THE VERY STEP that cleared the error for me.
    Then, remove packages that are no longer needed and clean the apt cache using:

    sudo apt autoremove 
    sudo apt clean
    

    Now, reboot and check everything again:

    nvidia-smi
    sudo apt-get update
    sudo apt-get upgrade
    

    Hopefully, the NVIDIA driver is still working fine.
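
Before running the rm commands in this step, it can help to see what the wildcards will match. Here is a minimal dry-run sketch that only lists paths and deletes nothing. The function name `find_cuda_leftovers` and its `root` parameter are made up for illustration; the two patterns are the same ones used in the rm commands.

```python
import glob
import os


def find_cuda_leftovers(root="/"):
    """List local CUDA repo directories and apt source files under root, without deleting."""
    patterns = [
        os.path.join(root, "var", "cuda-repo-*"),
        os.path.join(root, "etc", "apt", "sources.list.d", "cuda*"),
    ]
    found = []
    for pattern in patterns:
        # Sort each group so the listing is stable between runs
        found.extend(sorted(glob.glob(pattern)))
    return found


if __name__ == "__main__":
    for path in find_cuda_leftovers():
        print(path)
```

An empty listing means there is nothing left for the rm commands to remove.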

  4. To install the CUDA toolkit
    I followed the official steps but changed the last command from sudo apt-get -y install cuda to
    sudo apt-get -y install cuda-toolkit-11-7
    (refer to this post Problems loading nvidia drivers after cuda toolkit installation). The cuda metapackage also installs a driver, which can replace the one that already works; cuda-toolkit-11-7 installs only the toolkit.
    Since I started over from the downloaded cuda-repo-xxx.deb file (mine is cuda-repo-ubuntu2004-11-7-local_11.7.1-515.65.01-1_amd64.deb), the full set of commands was:

    sudo dpkg -i cuda-repo-ubuntu2004-11-7-local_11.7.1-515.65.01-1_amd64.deb
    sudo cp /var/cuda-repo-ubuntu2004-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cuda-toolkit-11-7
    

    Then, add the CUDA paths to the environment variables in the ~/.bashrc file. Use whatever text editor you like (I used gedit ~/.bashrc), and add the following lines to it:

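    # Put CUDA 11.7 first so its nvcc wins over any other copy on PATH.
    # The ${VAR:+:${VAR}} form avoids a stray ':' when the variable is empty.
    export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}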
    

    Save the file and load it into the current shell with source ~/.bashrc.
    Now, check the CUDA version using:

    nvcc --version
    

    This should print something like:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Wed_Jun__8_16:49:14_PDT_2022
    Cuda compilation tools, release 11.7, V11.7.99
    Build cuda_11.7.r11.7/compiler.31442593_0
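
To tie this back to the driver check in step 2, here is a minimal sketch that pulls the release number out of the nvcc output and compares it with the CUDA version reported by nvidia-smi. The function names are made up for illustration. The comparison is a simple rule of thumb (the driver's maximum CUDA version should be at least the toolkit release), not a full statement of NVIDIA's compatibility rules.

```python
import re


def parse_nvcc_release(text):
    """Return the toolkit release as a (major, minor) tuple, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_supports(driver_max_cuda, toolkit_release):
    """True if the driver's maximum CUDA version is at least the toolkit release."""
    major, minor = (int(part) for part in driver_max_cuda.split(".")[:2])
    return (major, minor) >= toolkit_release


# Line taken from the nvcc output above; "12.2" is the value nvidia-smi showed for me
nvcc_output = "Cuda compilation tools, release 11.7, V11.7.99"
release = parse_nvcc_release(nvcc_output)
print(release, driver_supports("12.2", release))  # prints: (11, 7) True
```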

  5. (Additional) To install PyTorch that supports CUDA
    I am adding this part because it was where my nightmare began. The official PyTorch instructions were quite unclear to me when it came to installing a build for an older CUDA version (here, 11.7).
    In my case, PyTorch 1.13 is what worked with CUDA 11.7. The newer releases I had tried (2.1 to 2.3) do not ship CUDA 11.7 builds; 2.0.x does, but I stayed with 1.13. I eventually found the commands on the official page Installing previous versions of PyTorch.
    Since I am using conda, I used:

    # first, don't forget to use `conda activate xxx` to activate your env
    conda install pytorch==1.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
    

    I skipped torchvision and torchaudio because some posts said they could cause other errors.
    Finally, to check if your PyTorch works with CUDA, simply run:

    python -c "import torch; print(torch.cuda.is_available())"
    

    which should print True.
    Here is a “fancier” Python script that checks PyTorch with CUDA:

    import torch
    print(f"PyTorch version: {torch.__version__}")  # Should print the installed PyTorch version
    print(f"CUDA available: {torch.cuda.is_available()}")  # Should print 'True'
    if torch.cuda.is_available():
        print(f"CUDA version: {torch.version.cuda}")  # Should print the CUDA version, e.g., '11.3' or '11.7'
        print(f"Device Name: {torch.cuda.get_device_name(0)}")  # Should print the name of your GPU, e.g., 'GeForce GTX 1050 Ti'
    else:
        print("CUDA is not available. Check your installation.")
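
torch.cuda.is_available() only confirms that PyTorch can see the GPU, so as one more check here is a minimal sketch that runs a small matrix multiplication on the GPU and compares it with the CPU result. The function name `gpu_smoke_test`, the status strings, and the tolerance are my own choices for illustration, not part of PyTorch.

```python
import importlib.util


def gpu_smoke_test():
    """Return a short status string describing whether a CUDA matmul ran correctly."""
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch

    if not torch.cuda.is_available():
        return "cuda-unavailable"
    a = torch.rand(256, 256, device="cuda")
    b = torch.rand(256, 256, device="cuda")
    c = a @ b                      # runs on the GPU
    torch.cuda.synchronize()       # wait for the kernel to finish
    expected = a.cpu() @ b.cpu()   # same product computed on the CPU
    # Loose tolerance: GPU and CPU float32 results differ slightly
    ok = torch.allclose(c.cpu(), expected, rtol=1e-2, atol=1e-2)
    return "ok" if ok else "mismatch"


if __name__ == "__main__":
    print(gpu_smoke_test())
```

A result of "ok" means a kernel was actually launched and returned a sensible answer, which is a slightly stronger signal than is_available() alone.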
    