I’ve read through lots of similar posts and have installed the drivers multiple times and in multiple ways. Software & Updates shows “Using NVIDIA driver metapackage from nvidia-driver-535 (proprietary, tested)”. lspci shows the card:
lspci | grep -i vga
00:02.0 VGA compatible controller: Intel Corporation IvyBridge GT2 [HD Graphics 4000] (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2803 (rev a1)
as does lshw:
sudo lshw -c video
  *-display
       description: VGA compatible controller
       product: NVIDIA Corporation
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:16 memory:e6000000-e6ffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:e000(size=128) memory:e7000000-e707ffff
[IvyBridge graphics deleted]
I’ve blacklisted nouveau. I’ve installed the drivers with and without a window manager running. Knowing as little as I do about this stuff, there is probably something obvious in the output above that I’m missing. It’s also possible I just didn’t physically install the card correctly.
I took a look at dmesg, where the only nvidia-related errors I saw involved nvidia-drm:
[ +0.029114] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ +0.000089] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ +0.000195] [drm:nv_drm_probe_devices [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to register device
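For anyone comparing their own kernel log against the lines above, here is a minimal sketch in Python that keeps only the nvidia lines reporting an error. The function name `nvidia_errors` and the matching pattern are my own assumptions for illustration, not part of any NVIDIA tool; the sample text is copied from the dmesg excerpt above.

```python
import re

# Assumed pattern: a line that mentions nvidia and is flagged ERROR or "Failed".
NVIDIA_ERROR = re.compile(r"nvidia.*(ERROR|Failed)", re.IGNORECASE)


def nvidia_errors(log_text):
    """Return the kernel log lines that mention nvidia and report an error."""
    return [line for line in log_text.splitlines() if NVIDIA_ERROR.search(line)]


# Sample copied from the dmesg excerpt in this post.
SAMPLE = """\
[ +0.029114] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ +0.000089] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ +0.000195] [drm:nv_drm_probe_devices [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to register device
"""

if __name__ == "__main__":
    for line in nvidia_errors(SAMPLE):
        print(line)
```

To check a live system, the saved output of sudo dmesg could be passed to nvidia_errors in place of SAMPLE.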
I attempted to attach the nvidia-bug-report.log.gz but the processing upload just kept spinning.
Any suggestions are greatly appreciated.
Similar here. I’m running Linux Mint 22. My use case for the NVIDIA GPU is AI training and inference (not at a professional level); that’s why I use an xx60-series card. I also decided not to put the display load on the NVIDIA GPU, so I’m using the Ryzen’s built-in AMD graphics instead. As happened to jmiller2100, the device is detected, but there are errors while registering the device.
Any clue is welcome. TIA.
nvidia-bug-report.log.gz (209.2 KB)
I have the same issue on an AMD machine with a 5080 GPU running Ubuntu 24.04.
I tried several methods of installing the driver, including the one from the official documents and the one from the PPA, but still got “No devices were found”, with the same NvKmsKapiDevice error in the kernel log.
Any clue is welcome. Thanks!
nvidia-bug-report.log.gz (173.5 KB)
Apologies. I eventually got it to work and thought for quite a while that I should post an explanation of how I did it, as it was slightly tricky, but I didn’t.
I recall that I tried the version of the install fix that involved creating an alias for the C compiler (so that it would run a different version or with different flags), but that didn’t work. I also recall that the officially nvidia recommended version wouldn’t install, so I looked into the installer to see what version it was asking for and modified another installer (which one?) to use that version, and it worked.
Sorry. I’ll try to do better next time (and of course it might not even be relevant just as no one else’s solutions worked for me).
Don’t worry, @jmiller2100, there are a lot of people with similar problems, as you noticed. In fact, I replied in this thread because I was going to title mine just like yours (“Yet another nvidia-smi…”). :-)
When you mention “the officially nvidia recommended version”, do you mean the one that can be downloaded from the “official documents” linked by @terryxhx? I tried that using CUDA toolkit 12.8 and the nvidia-open drivers, but I’m retrying right now and noticed that some packages were upgraded and others uninstalled. I will report back if I get it running.
Well, reinstalling has fixed the issue! I followed instructions outlined in the official documents page linked by @terryxhx, just with some minor differences that shouldn’t matter too much:
- As I had gone through that page before and just checked that I have cuda-toolkit 12.8 installed, I jumped straight to nvidia drivers installation.
- I chose the nvidia-open packages. The process uninstalled version 535 and installed 570 instead. It also performed several tasks involving kernels and DKMS, and apt let me know that some packages built for i386 had been installed without being needed, so I removed them.
- Even though I had skipped cuda-toolkit installation, after installing the drivers I ran the cuda-toolkit install command, and it confirmed that everything was already installed.
- After rebooting, nvidia-smi reports my device (a 4060 Ti) correctly and I’ve been able to run a GPU CUDA-Test Python script (available on GitHub at ShanakaRG/GPU_CUDA-test; it tests a CUDA installation with the Python packages PyTorch, TensorFlow and Keras).
- In the Post-installation actions section of the Installation Guide for Linux 12.8 documentation, please note that the sample export PATH command refers to 12.6 instead of 12.8. It was silly of me not to notice it, I know, but I put the wrong version in my path. :-)
- From what I’ve read in other threads, Secure Boot should be disabled (perhaps that is no longer needed, but I can confirm I have it disabled and it has worked for me). @terryxhx, your nvidia-bug-report.log.gz says you have it disabled, too.
I haven’t been able to build the CUDA Samples repository, though; it seems to be some kind of problem with _Floatxx definitions. For now, I won’t spend time dealing with that.
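The PATH mix-up mentioned above (12.6 in the path while 12.8 is installed) is easy to check for. Below is a minimal sketch in Python, assuming the toolkit lives under a directory named like /usr/local/cuda-12.8; that layout and the helper names `cuda_versions_in_path` and `path_matches_toolkit` are assumptions of mine, so adjust them if your install differs.

```python
import os
import re


def cuda_versions_in_path(path_value):
    """Return the CUDA versions named in PATH entries such as /usr/local/cuda-12.8/bin."""
    versions = []
    for entry in path_value.split(":"):
        match = re.search(r"cuda-(\d+\.\d+)", entry)
        if match:
            versions.append(match.group(1))
    return versions


def path_matches_toolkit(path_value, installed_version):
    """True only if PATH names at least one CUDA version and all of them match."""
    versions = cuda_versions_in_path(path_value)
    return bool(versions) and all(v == installed_version for v in versions)


if __name__ == "__main__":
    # "12.8" is the toolkit version discussed in this post; change it to yours.
    current_path = os.environ.get("PATH", "")
    print("CUDA versions in PATH:", cuda_versions_in_path(current_path))
    print("PATH matches 12.8:", path_matches_toolkit(current_path, "12.8"))
```

A PATH that still contains a cuda-12.6 entry makes path_matches_toolkit return False for "12.8", which is the situation described in the list above.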
I want to share my solution:
- Fresh install of Ubuntu 24.04, all updates installed (sudo apt update triggered a GUI window prompting me to update various software components). This step is probably optional.
- Go into the BIOS and disable Secure Boot. This is a potential security issue, so be cautious. For me this step was essential in order to install the drivers! I think that if it is enabled, signing the drivers is required, which I tried previously, but it didn’t work. The automatic driver install doesn’t work with it either.
- Add the “Proprietary GPU Drivers” PPA (Proprietary GPU Drivers : “Graphics Drivers” team). It should now appear in the “Software & Updates” app under the “Other Software” tab.
- Go to the “Additional Drivers” tab, select the “NVIDIA driver (open kernel) metapackage from nvidia-driver-570-open (proprietary)” option and click Apply. The install takes a while; after it finished, I rebooted.
- nvidia-smi works and shows correct info!
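To verify the two points that mattered in the steps above (Secure Boot state and whether nvidia-smi sees the card), a minimal sketch in Python follows. It assumes that mokutil is installed and that mokutil --sb-state prints a line containing “SecureBoot enabled” or “SecureBoot disabled”; the helper names `secure_boot_enabled` and `run` are hypothetical and exist only for this example.

```python
import shutil
import subprocess


def secure_boot_enabled(sb_state_output):
    """Interpret `mokutil --sb-state` output; None means the state is unclear."""
    text = sb_state_output.lower()
    if "secureboot enabled" in text:
        return True
    if "secureboot disabled" in text:
        return False
    return None


def run(cmd):
    """Run a command and return stdout plus stderr, or None if it is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    sb = run(["mokutil", "--sb-state"])
    if sb is None:
        print("mokutil not found")
    else:
        print("Secure Boot enabled:", secure_boot_enabled(sb))

    smi = run(["nvidia-smi"])
    if smi is None:
        print("nvidia-smi not found")
    elif "No devices were found" in smi:
        # The message reported earlier in this thread.
        print("Driver is installed, but no device was registered")
    else:
        print(smi)
```

If Secure Boot shows as enabled and nvidia-smi reports “No devices were found”, that matches the failure described earlier in the thread.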
Thank you @tom.nowak97 and welcome to the NVIDIA developer forums!