Nvidia-smi shows no devices were found although driver is installed


Hello forum,
I have installed the NVIDIA drivers and dpkg shows the version, but nvidia-smi reports that no devices were found. Is there a problem? How can I resolve this?

Attached image of the same.

Any suggestions?

Hi again,

Please run sudo nvidia-bug-report.sh from a shell and attach the resulting nvidia-bug-report.log.gz file to this thread.

Thanks.

data (863.9 KB)
Hi,

This is the file.
Thanks for your help.

You used the open kernel module installation option. This does not work on a GeForce GPU as is.

May 22 10:33:57 ryzen kernel: [    6.397361] NVRM: Open nvidia.ko is only ready for use on Data Center GPUs.
May 22 10:33:57 ryzen kernel: [    6.397363] NVRM: To force use of Open nvidia.ko on other GPUs, see the
May 22 10:33:57 ryzen kernel: [    6.397364] NVRM: 'OpenRmEnableUnsupportedGpus' kernel module parameter described
May 22 10:33:57 ryzen kernel: [    6.397364] NVRM: in the README.
May 22 10:33:57 ryzen kernel: [    6.757101] NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (0x62:0x0:1849)
May 22 10:33:57 ryzen kernel: [    6.757565] NVRM: GPU 0000:0b:00.0: rm_init_adapter failed, device minor number 0
May 22 10:33:57 ryzen kernel: [    6.757635] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to allocate NvKmsKapiDevice
May 22 10:33:57 ryzen kernel: [    6.757756] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to register device

Using the 'OpenRmEnableUnsupportedGpus' workaround is not recommended. Please start over with the proprietary driver.
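For anyone who wants to check their own log for the same symptom, here is a minimal sketch in Python. It only searches the kernel log text for the two NVRM messages quoted above. The function name `diagnose_kernel_log` and the way the log text is obtained are assumptions for illustration and are not part of any NVIDIA tool.

```python
import re

# Message the open kernel module prints when it refuses a non-Data-Center GPU
# (copied from the kernel log quoted in this thread).
OPEN_MODULE_MARKER = "Open nvidia.ko is only ready for use on Data Center GPUs"

# Matches lines such as "NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (...)"
# and captures the PCI address of the GPU that failed to initialize.
INIT_FAILURE = re.compile(r"NVRM: GPU ([0-9a-fA-F:.]+): RmInitAdapter failed")


def diagnose_kernel_log(log_text):
    """Return (open_module_rejected, failed_pci_addresses) for kernel log text.

    Hypothetical helper: it does plain text matching only and knows nothing
    about the driver beyond the two messages shown in this thread.
    """
    open_module_rejected = OPEN_MODULE_MARKER in log_text
    failed_pci_addresses = INIT_FAILURE.findall(log_text)
    return open_module_rejected, failed_pci_addresses


# Two lines taken from the log above, used as sample input.
SAMPLE_LOG = (
    "[    6.397361] NVRM: Open nvidia.ko is only ready for use on Data Center GPUs.\n"
    "[    6.757101] NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (0x62:0x0:1849)\n"
)

rejected, failed = diagnose_kernel_log(SAMPLE_LOG)
print("open module rejected this GPU:", rejected)
print("GPUs that failed to initialize:", failed)
```

The text passed in could come from a saved kernel log or from the log inside the bug report archive; how you collect it is up to you.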



I have attached two images here. One shows my simulation requirements and the other lists all the drivers available for my system. Can you please suggest which one I should install for my requirements?

Do I need to run purge again to uninstall all NVIDIA drivers?

Thank you so much for your help.

Please use the 525 - distro non-free driver.

And yes, to be safe, please purge the 530 driver first, without the GUI running, to make sure the Open Source kernel module is removed.
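To double-check that the purge really removed everything, one option is to look at the `dpkg -l` listing afterwards. Below is a minimal Python sketch that parses such a listing and reports NVIDIA packages that are still present, including ones in the `rc` state (removed, config files remaining). The helper names and the sample rows are made up for illustration; the package names and versions in the sample are placeholders, not a statement about what your system contains.

```python
def leftover_nvidia_packages(dpkg_listing):
    """Return (status, name) pairs for NVIDIA packages in `dpkg -l` output.

    Hypothetical helper: expects the plain text that `dpkg -l` prints.
    """
    leftovers = []
    for line in dpkg_listing.splitlines():
        fields = line.split()
        # Package rows carry at least: status, name, version.
        if len(fields) < 3:
            continue
        status = fields[0]
        # Drop an architecture qualifier such as ":amd64" from the name.
        name = fields[1].split(":")[0]
        # Header rows start with "|", "||/" or "+++-", never with letters.
        if not status.isalpha():
            continue
        if "nvidia" in name:
            leftovers.append((status, name))
    return leftovers


def open_module_packages(leftovers):
    """Pick out packages whose name ends in "-open" (open kernel module flavour)."""
    return [name for _status, name in leftovers if name.endswith("-open")]


# Illustrative listing only; versions and descriptions are placeholders.
SAMPLE_LISTING = """\
||/ Name                    Version  Architecture Description
+++-=======================-========-============-===========
ii  nvidia-driver-530-open  0.0      amd64        sample row
rc  nvidia-dkms-530-open    0.0      amd64        sample row
ii  libc6:amd64             0.0      amd64        sample row
"""

found = leftover_nvidia_packages(SAMPLE_LISTING)
print("NVIDIA packages still listed:", found)
print("open-flavour packages:", open_module_packages(found))
```

An empty result after the purge would suggest nothing from the 530 open driver is left behind before installing the 525 non-free driver.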

Hi,
Pardon me for asking, but when you say "without the GUI running", do you mean I should install the driver from the command line instead of using "Software & Updates",
OR
should I go to the advanced recovery options and install the driver as the root user?

Thanks once again, and apologies if this sounds like a silly question.

I have done it.

Thank you for your help.

There are no silly questions. Especially not when talking about Linux :-)

"Without GUI" means without a window manager, so that you can be sure all parts of the NVIDIA driver are unloaded before you remove or replace them.
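One way to confirm that the driver really is unloaded is to look at `/proc/modules`, which lists the kernel modules currently loaded. Here is a minimal Python sketch that picks the NVIDIA modules out of that text. The function name is hypothetical and the sample rows contain made-up sizes and use counts.

```python
def loaded_nvidia_modules(proc_modules_text):
    """Map each loaded NVIDIA kernel module name to its use count.

    Hypothetical helper: expects text in the /proc/modules format, where
    each row is "name size use_count dependencies state address".
    """
    modules = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        name, use_count = fields[0], fields[2]
        # Match "nvidia" itself and companions such as nvidia_drm or
        # nvidia_modeset, but not unrelated names that merely share a prefix.
        if name == "nvidia" or name.startswith("nvidia_"):
            # The use count can be "-" on kernels built without module unloading.
            modules[name] = int(use_count) if use_count.isdigit() else None
    return modules


# Illustrative rows only; sizes, counts and addresses are made up.
SAMPLE_MODULES = """\
nvidia_drm 77824 4 - Live 0x0000000000000000
nvidia 56696832 225 nvidia_modeset, Live 0x0000000000000000
snd_hda_intel 53248 3 - Live 0x0000000000000000
"""

print(loaded_nvidia_modules(SAMPLE_MODULES))
```

On a real system the input would be the contents of `/proc/modules`; an empty dictionary means no NVIDIA module is loaded, which is the state you want before removing or replacing the driver.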

But it seems you managed to resolve your issue now?

Great news!

nvidia-bug-report.log.gz (1.1 MB)

Hi Markus, I think I'm having similar issues, and I figured I might get a quick answer by jumping into this thread.

I'm currently following the README very closely
http://us.download.nvidia.com/XFree86/Linux-x86_64/535.113.01/README/minimumrequirements.html
for a manual installation.

Although the 535-server-open package is recommended when I run ubuntu-drivers devices, I keep having issues, mainly that I cannot launch any additional displays. So far I have been using the "Additional Drivers" GUI to install packages.

Still going to try the 535 (non-server, non-open) package. I would like to be able to run dGPU for Tensorflow eventually, which requires an "-open" configuration, according to the CUDA docs. However, as I have a weird hardware combination (AMD Ryzen R7 5800H CPU, RTX 3070 laptop) on my HP Omen 15, there may be some xorg.conf "manual tweaking" needed, re: section 2.8.

Thanks for your time and attention,

My next steps: try to use the GUI to install 535-open, 535, and 535-server, in that order, then retry via sudo install ubuntu-drivers. I have dkms installed, and it looks like the 535 should work alright, but it appears that manual installation is eventually required for Tensorflow.

I'm skeptical of .run files because they seem more complicated to undo, and I'm trying as many different paths as I can.

Hi there @mikethechang,

On your setup you should not install the open version of the drivers. They are not officially supported on GeForce consumer GPUs and will cause you setup trouble.

To run CUDA with Tensorflow you also do not need to install open kernel modules. I am not sure where you read that. If you find it again please share and I will try to have that fixed or explained.

Even better than handling the driver installation yourself, you can actually follow CUDA installation instructions and have the CUDA installation also install the driver in the same process.

Also sudo install ubuntu-drivers is not recommended, especially with DKMS it is known to cause issues.

Lastly about .run files, they are much easier than you think. If you consistently only use them with NVIDIA drivers, then the uninstall option will actually work fine.

Hi Markus,

Thanks for your response. I am certainly running into dkms errors. As updates come in, my nvidia-smi now returns "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.", but the 535 proprietary driver is clearly installed via 'Additional Drivers' from the 20.04 menu.

So now I need to reinstall, again, because the external monitor management stopped working. I may try the .run files if I cannot figure out the best way to install with dkms. There are other packages I have, e.g. librealsense, which benefit from dkms.

If I understand you correctly:

… there is NO reason I need to actually install the nvidia-535 driver of any sort?? If so, that's great news, and I can stop redundant (or worse, counterproductive) installation steps.

re: dGPU, maybe I actually meant GPUDirect:

e.g. here it says: "Starting with CUDA toolkit 12.2.2, GDS kernel driver package nvidia-gds version 12.2.2-1 (provided by nvidia-fs-dkms 2.17.5-1) and above is only supported with the NVIDIA open kernel driver." Based on my difficulties with installation, I'm not trying that now.

tl;dr: I'll try the CUDA installation instructions, then .run files; then, if I hear back about dkms-friendly installation paths, I'll do that.

The most essential thing is to install the driver, CUDA 11.8, cuDNN, and Tensorflow in a way that's stable/robust to other updates. To that end, maybe you can tell me what the update procedure is if I /do/ happen to use the runfiles. I can't seem to find those.

nvidia-bug-report.log.gz (156.9 KB)

The .run file actually worked great, thank you, and it has remained stable after multiple sudo apt updates.

Bonus: found and cleaned up a bunch of conflicting headers from dkms and the sudo install ubuntu-drivers in the process. Thank you!

Great to hear that!

Thanks for the update.

Same issue. Using ESXi 8 with an RTX 3060 Ti in passthrough, Ubuntu 22.04.
nvidia-bug-report.log.gz (102.9 KB)