No devices were found

Hello, I have a problem with the program "nvidia-smi". When I run nvidia-smi, it reports "No devices were found".
Sometimes it reports information for only one of my GPUs.
My environment:
OS: Ubuntu 18.04 LTS Server
Kernel: 4.15.0-163-generic
GPU: RTX3080 * 2

This is my log:
nvidia-bug-report.log.gz (468.4 KB)

Hi @user76356, thanks for bringing this up.

Checking your report log, I can see this message repeated:
[ 2143.889863] NVRM: GPU 0000:65:00.0: rm_init_adapter failed, device minor number 0
[ 2154.838560] NVRM: GPU 0000:65:00.0: RmInitAdapter failed! (0x23:0xffff:1204)

Looking through the Linux forums (where I have also moved this topic), I can see a lot of threads showing different possible causes for these initialization failures.

One possible solution is of course to update to the latest driver 470.86, following the installation instructions very closely.

And one of the more common possible solutions is to enable the nvidia persistence daemon:

If neither of those helps, I am sure we can find additional suggestions for you.

Since this only happens after about 40 minutes and the GPUs are built into a server running headless, please start by properly setting up nvidia-persistenced to start on boot and make sure it's continuously running.
If the issue still occurs, I suspect an airflow issue that is causing the GPUs to overheat. Please monitor temperatures.
Since these are consumer-type cards, the GPUs are likely blocking each other's airflow and heating each other up if they sit in neighbouring slots. Please check.
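
To keep an eye on both points at once (persistence mode and temperature), something along these lines could be left running in a terminal. It is only a rough sketch: it assumes nvidia-smi is on the PATH and Python 3.7 or newer, the function names are made up for this post, and the 80 C warning threshold is an arbitrary assumption, not an official limit for these cards.

```python
import subprocess
import time

# Fields accepted by `nvidia-smi --query-gpu` (see `nvidia-smi --help-query-gpu`).
QUERY_FIELDS = "index,persistence_mode,temperature.gpu"


def parse_gpu_status(csv_text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [field.strip() for field in line.split(",")]
        if len(parts) != 3:
            # Skip anything that is not a data row, for example an error message.
            continue
        index, persistence, temp = parts
        rows.append({
            "index": int(index),
            "persistence": persistence == "Enabled",
            # A GPU in a bad state may report a non-numeric value here.
            "temp_c": int(temp) if temp.isdigit() else None,
        })
    return rows


def read_gpu_status():
    """Run nvidia-smi once and return the parsed status of every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_status(out)


def watch(interval_s=10, warn_at_c=80):
    """Print one line per GPU every interval_s seconds until interrupted."""
    while True:
        for gpu in read_gpu_status():
            hot = gpu["temp_c"] is not None and gpu["temp_c"] >= warn_at_c
            print(time.strftime("%H:%M:%S"), gpu, "<-- hot" if hot else "")
        time.sleep(interval_s)
```

Calling `watch()` and letting it run past the 40 minute mark should show whether a card drops out of the list, loses persistence mode, or climbs in temperature shortly before the failure.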


Thanks, I tried to control the temperature, and things got better. BTW, is there a tool that can help me read "nvidia-bug-report.log.gz"?

The bug report is simply a text log file that is compressed with gzip to make it smaller and easier to share.

You can decompress it with a tool like 7-Zip on Windows, or with gzip or gunzip on Linux.
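
Because the report is just compressed text, it can also be searched without unpacking it first. Here is a small sketch using only the Python standard library. It counts the per-GPU NVRM kernel messages of the kind quoted earlier in this thread. The function name and the regular expression are my own assumptions based on those two sample lines, so other NVRM message formats may not match.

```python
import gzip
import re
from collections import Counter

# Matches kernel lines such as:
# [ 2143.889863] NVRM: GPU 0000:65:00.0: rm_init_adapter failed, device minor number 0
NVRM_GPU_LINE = re.compile(r"NVRM: GPU ([0-9a-fA-F:.]+): (.+)")


def nvrm_messages(path):
    """Count NVRM messages per (PCI bus id, message) in a gzipped log."""
    counts = Counter()
    # "rt" decompresses and decodes on the fly; errors="replace" keeps
    # going if the log contains bytes that are not valid UTF-8.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as log:
        for line in log:
            match = NVRM_GPU_LINE.search(line)
            if match:
                counts[(match.group(1), match.group(2).strip())] += 1
    return counts
```

For example, `nvrm_messages("nvidia-bug-report.log.gz").most_common(5)` lists the five most frequent messages along with the PCI address of the GPU that logged them.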