I just exchanged my ‘GeForce GT 720’ for a KFA2 ‘GeForce GTX 1650 SUPER’ in my five-year-old AMD PC, but the new GPU is only visible in lspci. nvidia-smi reports ‘No devices were found’, and dmesg shows these messages:
[ 8595.773293] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[ 8595.773873] caller _nv000705rm+0x1af/0x200 [nvidia] mapping multiple BARs
[ 8603.948675] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0xffff:1290)
[ 8603.948812] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 8604.007154] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[ 8604.007769] caller _nv000705rm+0x1af/0x200 [nvidia] mapping multiple BARs
[ 8612.181198] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0xffff:1290)
[ 8612.181335] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
For details, see the attached nvidia-bug-report.log.gz (533.2 KB).
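To make the failures easier to spot, this is roughly how I filtered the kernel log (just a sketch; the grep pattern is mine, matching the messages quoted above, and `sample` stands in for real `dmesg` output):

```shell
# Pattern covering the NVIDIA init failures and the resource sanity checks.
pattern='NVRM|resource sanity check|mapping multiple BARs'

# Sample lines copied from the kernel log above; on the live machine
# you would pipe `dmesg` through the same grep instead.
sample='[ 8603.948675] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0xffff:1290)
[ 8603.948812] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 8604.007154] resource sanity check: requesting [mem 0x000c0000-0x000fffff]'

# Count how many log lines match; all three sample lines do.
printf '%s\n' "$sample" | grep -cE "$pattern"
# On the live machine: dmesg | grep -E "$pattern"
```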
Please upgrade your BIOS to the latest version first; it’s 6 years old. If that still doesn’t work, check the PCIe generation settings in the BIOS and try lowering or raising them. Also, please try reseating the card in its slot.
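To check which BIOS version is currently installed before flashing, something like this should work (a sketch; the sysfs DMI paths are standard on x86 Linux and need no root, unlike `dmidecode`, but they are not taken from your logs):

```shell
# Print BIOS vendor, version and release date from the DMI sysfs entries.
# Files may be absent on systems without DMI/SMBIOS, hence the fallback.
for f in bios_vendor bios_version bios_date; do
    p="/sys/class/dmi/id/$f"
    if [ -r "$p" ]; then
        printf '%s: %s\n' "$f" "$(cat "$p")"
    else
        printf '%s: (not available)\n' "$f"
    fi
done
```

Compare the reported version and date against the latest download on the board vendor’s support page.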
wow! It is visible …
[root@otto ~]# nvidia-smi
Mon Dec 28 22:36:28 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 165...  Off  | 00000000:01:00.0 Off |                  N/A |
| 40%   28C    P8     8W / 100W |      1MiB /  3911MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[root@otto ~]# uname -a
Linux otto 5.8.7-200.fc32.x86_64 #1 SMP Mon Sep 7 15:26:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
This is my ‘old’ system disk from September 2020, based on Fedora 32, from before I switched to CentOS 8.3.
What is your conclusion? What changed in the Linux kernel to make this work?
I switched back to my CentOS 8 disk, installed the latest kernel-ml from ELRepo … and it worked.
Can you please track down, or at least guess, the root cause of this behaviour?
Then Red Hat could backport the relevant changes between kernel 4.18 and 5.8, to keep their distribution viable as a platform for AI workloads with NVIDIA GPUs.
[root@otto ~]# uname -a
Linux otto 5.10.3-1.el8.elrepo.x86_64 #1 SMP Wed Dec 23 13:25:00 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@otto ~]# nvidia-smi
Tue Dec 29 10:48:32 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 165...  Off  | 00000000:01:00.0 Off |                  N/A |
(…)
I hardly see switching to an out-of-distro kernel as a solution; at best it is a temporary workaround, I’d say. I have a similar forum posting about RHEL and was kindly guided here.
Did you ever find the real root cause and a fix for this? It sounds to me as if the kernel gained improved protection against over-allocating PCI address space, and therefore won’t let the driver proceed. Could that be the reason?
I agree, switching to a 5.10.3 ELRepo kernel is a temporary workaround, not a solution.
Unfortunately, finding the real root cause is beyond my capabilities, and I have not yet found the time to open a bug report against RHEL/CentOS 8.3. But this week I will try the combination of kernel 4.18.0-240.1.1 and NVIDIA driver 460.32.03.
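To document each kernel/driver combination I test, I plan to record the pair like this (just a sketch; `modinfo -F version nvidia` reads the module’s version field and only works once the driver package is installed):

```shell
# Record the exact kernel / NVIDIA-driver pair under test, e.g. for a bug report.
printf 'kernel: %s\n' "$(uname -r)"

# Query the nvidia module's version field if the module is available;
# fall back to a note when it is not installed on this system.
if command -v modinfo >/dev/null 2>&1 && modinfo nvidia >/dev/null 2>&1; then
    printf 'driver: %s\n' "$(modinfo -F version nvidia)"
else
    printf 'driver: nvidia module not installed\n'
fi
```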
The “resource sanity check” message is just a symptom, and a very common one. From observation, it is always displayed when the nvidia driver resets the GPU.
Here the driver cannot communicate with the GPU (RmInitAdapter failed) and resets it (the resource message), in a loop.