CUDA NVIDIA Drivers For Ubuntu 22.04

I am using Ubuntu 22.04.1

$ uname -a
Linux 6.8.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 11 15:25:05 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

$ lspci -v | grep -i NV
05:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express])
Kernel driver in use: nvme
Kernel modules: nvme
cc:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
Kernel modules: nvidiafb, nouveau
cd:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
Kernel modules: nvidiafb, nouveau

$ nvidia-smi
Command 'nvidia-smi' not found, but can be installed with:
sudo apt install nvidia-utils-390 # version 390.157-0ubuntu0.22.04.2, or
sudo apt install nvidia-utils-418-server # version 418.226.00-0ubuntu5~0.22.04.1
sudo apt install nvidia-utils-450-server # version 450.248.02-0ubuntu0.22.04.1
sudo apt install nvidia-utils-470 # version 470.256.02-0ubuntu0.22.04.1
sudo apt install nvidia-utils-470-server # version 470.256.02-0ubuntu0.22.04.1
sudo apt install nvidia-utils-535 # version 535.183.01-0ubuntu0.22.04.1
sudo apt install nvidia-utils-535-server # version 535.183.06-0ubuntu0.22.04.1
sudo apt install nvidia-utils-550 # version 550.107.02-0ubuntu0.22.04.1
sudo apt install nvidia-utils-550-server # version 550.90.07-0ubuntu0.22.04.1
sudo apt install nvidia-utils-510 # version 510.60.02-0ubuntu1
sudo apt install nvidia-utils-510-server # version 510.47.03-0ubuntu3
sudo apt install nvidia-utils-545 # version 545.29.06-0ubuntu0.22.04.2

Could you please guide me in installing the correct NVIDIA drivers? Installing nvidia-utils-470, 535, or 510 is not working.

Thanks & Regards
Emb3

The K80 has Compute Capability 3.7 and the last driver to support it is the 470 series. If you need to install the Cuda Toolkit, 11.8 is the last to support it.
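As a rough illustration of that rule, the sketch below compares the driver versions apt offered earlier against the 470 cutoff. The helper names `driver_branch` and `supports_k80` are hypothetical, and the cutoff is only the one stated in this thread, not something queried from NVIDIA.

```shell
# Hypothetical helpers, not part of any NVIDIA tool.
# Print the branch (leading number) of a driver version such as 470.256.02.
driver_branch() {
    printf '%s\n' "${1%%.*}"
}

# Succeed if the driver branch is not newer than 470.
# Assumption taken from this thread: 470 is the last branch that supports the K80.
supports_k80() {
    [ "$(driver_branch "$1")" -le 470 ]
}

# Check a few of the versions apt listed above.
for v in 470.256.02 535.183.01 550.107.02; do
    if supports_k80 "$v"; then
        echo "$v: not newer than the 470 branch"
    else
        echo "$v: newer than 470, so no K80 support"
    fi
done
```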

But which of the two below should I install?
sudo apt install nvidia-utils-470 # version 470.256.02-0ubuntu0.22.04.1
sudo apt install nvidia-utils-470-server # version 470.256.02-0ubuntu0.22.04.1

If I try to install with "sudo apt install nvidia-utils-470", I get the following error. Also, should I install the Toolkit first and then the drivers, or the drivers first and then the Toolkit?

Error
Error! Bad return status for module build on kernel: 6.8.0-45-generic (x86_64)
Consult /var/lib/dkms/nvidia/470.256.02/build/make.log for more information.
dpkg: error processing package nvidia-dkms-470 (--configure):
installed nvidia-dkms-470 package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of nvidia-driver-470:
nvidia-driver-470 depends on nvidia-dkms-470 (<= 470.256.02-1); however:
Package nvidia-dkms-470 is not configured yet.
nvidia-driver-470 depends on nvidia-dkms-470 (>= 470.256.02); however:
Package nvidia-dkms-470 is not configured yet.

dpkg: error processing package nvidia-driver-470 (--configure):
dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
Processing triggers for initramfs-tools (0.140ubuntu13.4) ...
update-initramfs: Generating /boot/initrd.img-6.8.0-45-generic
Errors were encountered while processing:
nvidia-dkms-470
nvidia-driver-470
E: Sub-process /usr/bin/dpkg returned an error code (1)

I have no experience using ubuntu packages. Your original post mentioned the nvidia-utils package, which I understand does not contain the driver and your machine appears to be currently using the nouveau driver, so I was suggesting which driver version you might like to install.

Looking at what you have just posted, it may be that you have attempted to install the driver and/or toolkit and perhaps have a broken installation, so may want to remove any nvidia packages and start again.

If you haven't already, and you want the toolkit, read this, in particular section 3. A full toolkit install includes a driver.
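Before removing anything, it can help to see which NVIDIA and CUDA packages are installed and which are half-configured. This is a minimal sketch: `nvidia_packages` is a hypothetical helper, and the name patterns it matches are an assumption that may need adjusting.

```shell
# List packages whose names suggest NVIDIA driver or CUDA content.
# Reads `dpkg -l` style output on stdin and prints "<status> <name>" per match.
# A status of "ii" means installed; "iF" or "iU" point at a package that
# failed to configure, such as nvidia-dkms-470 in the error above.
nvidia_packages() {
    awk '$2 ~ /^(nvidia-|libnvidia-|cuda-|libcuda)/ { print $1, $2 }'
}

# Usage:
#   dpkg -l | nvidia_packages
#
# Purging is then done with apt, along these lines (check the removal
# section of the installation guide for your version before running):
#   sudo apt-get --purge remove "*nvidia*"
#   sudo apt-get autoremove
```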

That's definitely the right document to read, but it's also a lot to read. I am also not an Ubuntu user, but if you're still having problems, perhaps this might help:

Read the section about pre-installation actions and follow it, including the part about installing the kernel headers and development packages.

Included in this would be the following (you need to specify the distro, version and architecture appropriate to your system; what these are is explained in the file):

sudo dpkg -i cuda-repo-.deb

then:
sudo apt-get install cuda-drivers-470

If it bleats about a conflict with other NVIDIA drivers, remove them as per the document and try again.

reboot.

you should be able to run nvidia-smi (forgive me if I am wrong but I believe it is installed with the driver)

Check that it can see the card and tell you something about it (the info it can provide varies by card type). Also, don't be confused by the "CUDA Version" shown at the top right. It does not mean what you have installed; it is the latest CUDA version supported by that driver.
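The post-reboot check can be sketched as a small function. `check_driver` is a hypothetical name; the `--query-gpu` and `--format` options are standard nvidia-smi options, but the fields available can vary by card.

```shell
# Report whether the NVIDIA driver appears to be installed and usable.
check_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the driver packages are probably not installed"
        return 1
    fi
    # Print only the GPU name and driver version, one CSV line per GPU.
    if ! nvidia-smi --query-gpu=name,driver_version --format=csv,noheader; then
        echo "nvidia-smi failed: the kernel module may not be loaded (is nouveau still in use?)"
        return 1
    fi
}

# Usage, after rebooting:
#   check_driver
```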

if all good,
sudo apt-get install cuda-toolkit

that will install everything without messing with the driver

AFAIK an installation done this way will survive:
sudo apt update
sudo apt full-upgrade

Sorry for the late reply. I tried multiple times, but I am unable to install the drivers.

sudo dpkg -i cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt install nvidia-driver-470 libnvidia-gl-470 libnvidia-compute-470 libnvidia-decode-470 libnvidia-encode-470 libnvidia-ifr1-470 libnvidia-fbc1-470
sudo dpkg --configure nvidia-dkms-470

Those are the commands I ran. I am also attaching the crash reports, in case you can tell anything from them.

Thank you for the help
Regards
Emb3
470_dkms_crash.txt (1.1 MB)
470_crash.txt (1.1 MB)

I am only guessing here, but both crash reports relate to a kernel header file.

If you look at the start of the install document I referred to above, the kernel versions it lists do not match the version you seem to have installed, which could well be the issue.
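A quick way to gather the facts for this comparison is sketched below. `kernel_series` and `headers_present` are hypothetical helpers; the `/lib/modules/<release>/build` location is where Ubuntu header packages normally provide the build tree that DKMS compiles against.

```shell
# Print the major.minor series of a kernel release, e.g. 6.8 for 6.8.0-45-generic.
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed if the build tree for the given kernel release exists.
# DKMS needs it to compile the nvidia module.
headers_present() {
    [ -e "/lib/modules/$1/build" ]
}

release=$(uname -r)
echo "running kernel: $release (series $(kernel_series "$release"))"
if headers_present "$release"; then
    echo "headers found for $release"
else
    echo "headers missing; on Ubuntu they come from linux-headers-$release"
fi

# The DKMS build log named in the error output shows which file failed:
#   tail -n 40 /var/lib/dkms/nvidia/470.256.02/build/make.log
```

Compare the printed series with the kernel versions listed in the installation guide for the CUDA release you are installing.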

Thanks for your help. I found that the kernel version and the NVIDIA driver version are not compatible, so I installed an older kernel version and was able to install the NVIDIA driver. Now, is there a way to find a GPU compatible with this Ubuntu and kernel version?

Thank you for the help
Regards
Emb3

I don't quite understand what you're saying here. If you're asking if the K80 is compatible with Ubuntu 22.04 and the kernel, that shouldn't be a problem. The important thing is that you have a 470 driver.

I apologize for any confusion in my previous communication. The issue I am facing is the inability to install a CUDA version that is compatible with the current NVIDIA driver, Ubuntu version, and kernel version. Given this challenge, I would like to request assistance in identifying an NVIDIA GPU that is fully compatible with Ubuntu 22.04 and Kernel 6.8.0.45-generic. The GPU should also support an NVIDIA driver version that is compatible with the system, as well as a corresponding CUDA version that works with both the driver and system versions.

Thank you very much for your support. I look forward to your guidance.

Best regards,
Emb3

So I take it from this that you now have the 470 driver running and can successfully use nvidia-smi?

If so, you should now be able to install Cuda 11.8 and it should work. I do wonder if something has become corrupted and you might be better off starting from a clean reinstall of everything.

If you do want to abandon the K80 and use another card, Ubuntu 22.04 appears to be supported from at least Cuda Toolkit 11.7 through to the current 12.6 on cards from what you have now, a Kepler 3.7 through to Hopper 9.0.

You can check here under the "GPU's Supported" section to confirm which Cuda Toolkit is required for which hardware version (Compute Capability).

Do make sure to check the installation documentation for the version you decide to install, to make sure you have the correct kernel and compiler versions installed.

Good luck, I'm not sure I can offer any more of use.

Yes, I have successfully installed the 470 driver with kernel 5.15.0.25-generic on Ubuntu 22.04.5. But when I try to install CUDA 11.8 or 11.4, the installation process uninstalls the 470 driver. That is why I am looking for a K80 alternative.

The purpose of the GPU card is to perform some operations on 10 Gbps data acquired over PCI Express from a Mellanox ConnectX-4 Lx.

So, can you suggest a GPU for the above work that is also fairly cheap and works with these Ubuntu and kernel versions, including the driver and CUDA installation?

Thank you for the help
Regards,
Emb3

I can't speak for package installs of the toolkit, but if you use a runfile version, you have the option of skipping the driver install.

If this does not appear during the interactive part of the installation, then run the installer with the "--silent --toolkit" options. See here.
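A minimal sketch of that non-interactive invocation follows. The runfile name is an assumption based on the 11.8.0 / 520.61.05 version strings seen earlier in this thread, and `install_toolkit_only` is a hypothetical wrapper; use the name of the file you actually downloaded.

```shell
# Assumed file name; replace it with the runfile you downloaded.
RUNFILE=${RUNFILE:-./cuda_11.8.0_520.61.05_linux.run}

# Install only the toolkit from a CUDA runfile, leaving the driver alone.
install_toolkit_only() {
    if [ ! -f "$1" ]; then
        echo "runfile not found: $1"
        return 1
    fi
    # --silent skips the interactive menu; --toolkit selects only the toolkit,
    # so the driver bundled in the runfile is not installed.
    sudo sh "$1" --silent --toolkit
}

# Usage:
#   install_toolkit_only "$RUNFILE"
```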

Sorry, I can't offer card advice.

As the driver seems to be working, if you want to try with your K80, then according to the 11.8 installation notes (quoted below), and from the one time I had to do this back then, what you want to install is:

cuda-toolkit-11-8

because that is meant to leave your driver alone, whereas installing cuda or cuda-11-8 WILL mess with your driver. Of course, if you want to buy a current card and use the latest cuda-toolkit, that's fine too. But like others, I would not be able to spec what card might be needed for your case, as it seems specialised.
If it were me, I would try with the K80 first. Then it will either do what you want, or at least give you a clue as to how much more performance is needed.
Best of luck.

Table 4. Meta Packages Available for CUDA 11.8

Meta Package             Purpose
cuda                     Installs all CUDA Toolkit and Driver packages. Handles upgrading to the next version of the cuda package when it's released.
cuda-11-8                Installs all CUDA Toolkit and Driver packages. Remains at version 11.8 until an additional version of CUDA is installed.
cuda-toolkit-11-8        Installs all CUDA Toolkit packages required to develop CUDA applications. Does not include the driver.
cuda-tools-11-8          Installs all CUDA command line and visual tools.
cuda-runtime-11-8        Installs all CUDA Toolkit packages required to run CUDA applications, as well as the Driver packages.
cuda-compiler-11-8       Installs all CUDA compiler packages.
cuda-libraries-11-8      Installs all runtime CUDA Library packages.
cuda-libraries-dev-11-8  Installs all development CUDA Library packages.
cuda-drivers             Installs all Driver packages. Handles upgrading to the next version of the Driver packages when they're released.

For Ubuntu Linux, can you tell me the installation procedure (commands/steps) to install just CUDA 11.8 or 11.4 without disturbing the NVIDIA driver?

Thank you for the help
Regards,
Emb3

I always use the package manager installation, as I find upgrades etc. a lot easier. This assumes you have set up a network repo etc. as per the installation guide. BTW, if you are to succeed with CUDA (any version, any GPU), you really, really need to spend time on the provided documentation.

sudo apt-get install cuda-toolkit-11-8
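Since an earlier attempt in this thread ended with the 470 driver being uninstalled, it may be worth simulating the install first and looking for planned driver changes. This is only a sketch: `driver_changes` is a hypothetical helper and the package name patterns are an assumption.

```shell
# Filter `apt-get --simulate` output (on stdin) for planned driver changes.
# Simulated actions are printed as lines starting with "Inst" (install or
# upgrade) or "Remv" (remove); only those touching driver packages are kept.
driver_changes() {
    grep -E '^(Inst|Remv) (nvidia-|libnvidia-|cuda-drivers)' || true
}

# Usage (a simulation changes nothing on the system):
#   apt-get install --simulate cuda-toolkit-11-8 | driver_changes
```

Empty output suggests the install leaves the driver packages alone; any "Remv nvidia-..." line is a reason to stop and look again before running the real install.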

for 11.8 the installation guide you want is here:

If the package manager doesn't work for you, you may want to try the runfile version from that document.

Best of luck. I don't think I can help further than this, though.

I have installed cuda-11.8 using the package manager, but it hasn't installed nvcc to run CUDA code. I am attaching the installation log.
cuda_11_8.log (41.0 KB)

Thank you for the help
Regards,
Emb3

Also, can you clarify whether, other than driver 470, there are any drivers compatible with the K80 and Ubuntu 22.04?

Regards,
Emb3

Ok so (forgive me if you have tried this already) when you type:
nvcc
at the command prompt do you get this:

nvcc: command not found...

if so what happens when you type:
/usr/local/cuda/bin/nvcc

do you get something like: (this is 12.6 nvcc I would expect 11.8 to say something similar)

nvcc fatal : No input files specified; use option --help for more information
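If you are not sure which directory the toolkit landed in, a small search like the one below can locate it. `find_nvcc` is a hypothetical helper, and `/usr/local` as the default root is an assumption that matches the paths used in this thread.

```shell
# Print every executable nvcc found in cuda* directories below a root
# directory (default: /usr/local).
find_nvcc() {
    root=${1:-/usr/local}
    for candidate in "$root"/cuda*/bin/nvcc; do
        # An unmatched glob stays literal, so confirm the file really exists.
        if [ -x "$candidate" ]; then
            echo "$candidate"
        fi
    done
}

find_nvcc
```

Each printed path can be run directly to confirm that it reports the "No input files specified" message.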

If it looks like this, then nvcc IS installed and should be working, and the way to fix this is in section 13 of the CUDA Linux installation guide for 11.8:

"13. Post-installation Actions

The post-installation actions must be manually performed. These actions are split into mandatory, recommended, and optional sections.
13.1. Mandatory Actions

Some actions must be taken after the installation before the CUDA Toolkit and Driver can be used.
13.1.1. Environment Setup

The PATH variable needs to include export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}. Nsight Compute has moved to /opt/nvidia/nsight-compute/ only in rpm/deb installation method. When using .run installer it is still located under /usr/local/cuda-11.8/.

To add this path to the PATH variable:

export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}

In addition, when using the runfile installation method, the LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-11.8/lib64 on a 64-bit system, or /usr/local/cuda-11.8/lib on a 32-bit system

To change the environment variables for 64-bit operating systems:

export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64\
                         ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

To change the environment variables for 32-bit operating systems:

export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib\
                         ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Note that the above paths change when using a custom install path with the runfile installation method."

If you follow the correct steps for your system and installation method, then typing nvcc should give you the "No input files specified" response and all should be well.
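The export from the guide only lasts for the current shell. One way to make it persistent is to put something like the following in ~/.bashrc. This is a sketch, not the guide's own method: `path_prepend` is a hypothetical helper that avoids adding the same directory twice when the file is sourced repeatedly.

```shell
# Prepend a directory to PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already present, nothing to do
        *) PATH="$1${PATH:+:$PATH}" ;;    # same form as the export in the guide
    esac
    export PATH
}

# Path for a default CUDA 11.8 install; it changes with a custom install path.
path_prepend /usr/local/cuda-11.8/bin
```

After opening a new shell, `command -v nvcc` should print the path under /usr/local/cuda-11.8/bin.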

If this doesn't work, then I have no idea how you can not have nvcc if the install didn't say it had failed. If that is where you are, then as someone else has said, your system would seem to be broken from the point of view of CUDA installation. If it is, you would probably need:

a clean system,
install the driver using the same method as worked for you before,
test using nvidia-smi,
If ok then install cuda-toolkit as before,

I am afraid I can't help further.

Thank you for the help, that resolved the issue. After installing cuda-11.8, I forgot about the post-installation instructions for exporting the PATH variable.

Regards,
Emb3