I bought three RTX 2080 Ti cards, but the GPUs sometimes disappear from the system.

Please help. Why does this happen?




nvidia-bug-report.log.gz (1.56 MB)

Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

@generix I have uploaded the nvidia-bug-report.log.gz. Please help me find out why the GPUs sometimes disappear. Thank you very much!

Currently, the GPUs continuously initialize and deinitialize again, and the initialization sometimes fails:

[78597.110274] NVRM: RmInitAdapter failed! (0x25:0x51:1103)
[78597.110308] NVRM: rm_init_adapter failed for device bearing minor number 2
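
To see which GPU is affected and how often, kernel log lines like the two above can be scanned for the reported minor number (minor number N corresponds to /dev/nvidiaN). The following is a minimal Python sketch, not an NVIDIA tool; the function name `failed_minors` is made up for illustration, and the sample text is just the two lines quoted above. In practice you would feed it the output of `dmesg`.

```python
import re

# Matches kernel log lines such as:
# [78597.110308] NVRM: rm_init_adapter failed for device bearing minor number 2
FAILED_MINOR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: rm_init_adapter failed "
    r"for device bearing minor number (?P<minor>\d+)"
)


def failed_minors(kernel_log: str) -> dict[int, list[float]]:
    """Map each GPU minor number to the kernel timestamps of its init failures."""
    failures: dict[int, list[float]] = {}
    for line in kernel_log.splitlines():
        match = FAILED_MINOR.search(line)
        if match:
            minor = int(match["minor"])
            failures.setdefault(minor, []).append(float(match["ts"]))
    return failures


# The two lines quoted in this thread.
sample = """\
[78597.110274] NVRM: RmInitAdapter failed! (0x25:0x51:1103)
[78597.110308] NVRM: rm_init_adapter failed for device bearing minor number 2
"""

print(failed_minors(sample))
```

If the same minor number shows up repeatedly, that points at one specific card or slot rather than all three GPUs.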

Please enable nvidia-persistenced to start on boot and check whether that resolves the issue.

@generix I can't restart the computer right now, so I tried to start nvidia-persistenced by hand. Did I succeed? Here is the relevant information.

These are the nvidia-persistenced entries in nvidia-bug-report.log:

Apr 5 14:32:14 m2 nvidia-persistenced: Failed to create directory /var/run/nvidia-persistenced: Permission denied
Apr 5 14:32:14 m2 nvidia-persistenced: Unable to access /var/run/nvidia-persistenced: No such file or directory
Apr 5 14:32:14 m2 nvidia-persistenced: Shutdown (91114)
Apr 5 14:32:31 m2 nvidia-persistenced: Started (91309)
Apr 5 14:33:33 m2 nvidia-persistenced: Verbose syslog connection opened
Apr 5 14:33:33 m2 nvidia-persistenced: Directory /var/run/nvidia-persistenced will not be removed on exit
Apr 5 14:33:33 m2 nvidia-persistenced: Failed to lock PID file: Resource temporarily unavailable
Apr 5 14:33:33 m2 nvidia-persistenced: Shutdown (92317)
Apr 5 14:37:46 m2 nvidia-persistenced: Verbose syslog connection opened
Apr 5 14:37:46 m2 nvidia-persistenced: Directory /var/run/nvidia-persistenced will not be removed on exit
Apr 5 14:37:46 m2 nvidia-persistenced: Failed to lock PID file: Resource temporarily unavailable
Apr 5 14:37:46 m2 nvidia-persistenced: Shutdown (94745)

And this is the nvidia-persistenced process as shown by ps:

[root@m2 hs]# ps -aux | grep nvidia-persistenced
root 91309 0.1 0.0 8672 808 ? Ss 14:32 0:16 nvidia-persistenced
root 130267 0.0 0.0 112708 980 pts/0 S+ 17:43 0:00 grep --color=auto nvidia-persistenced
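
The syslog entries and the ps output above can be cross-checked: an instance that logged "Started" and never logged "Shutdown" should be the one still running. The sketch below does that in Python. The helper names `surviving_pids` and `lock_failures` are made up for illustration, and reading "Failed to lock PID file: Resource temporarily unavailable" as "another instance already holds the lock" is an assumption based on how PID-file locking usually behaves, not something the log states.

```python
import re

# Matches the start/stop entries, e.g. "nvidia-persistenced: Started (91309)".
EVENT = re.compile(r"nvidia-persistenced: (Started|Shutdown) \((\d+)\)")


def surviving_pids(syslog: str) -> set[int]:
    """Return PIDs that logged 'Started' without a later 'Shutdown'."""
    alive: set[int] = set()
    for line in syslog.splitlines():
        match = EVENT.search(line)
        if not match:
            continue
        pid = int(match.group(2))
        if match.group(1) == "Started":
            alive.add(pid)
        else:
            alive.discard(pid)  # no-op for instances that never logged "Started"
    return alive


def lock_failures(syslog: str) -> int:
    """Count attempts that gave up because the PID file was already locked."""
    return sum("Failed to lock PID file" in line for line in syslog.splitlines())


# The syslog lines quoted in this thread.
sample = """\
Apr 5 14:32:14 m2 nvidia-persistenced: Failed to create directory /var/run/nvidia-persistenced: Permission denied
Apr 5 14:32:14 m2 nvidia-persistenced: Unable to access /var/run/nvidia-persistenced: No such file or directory
Apr 5 14:32:14 m2 nvidia-persistenced: Shutdown (91114)
Apr 5 14:32:31 m2 nvidia-persistenced: Started (91309)
Apr 5 14:33:33 m2 nvidia-persistenced: Verbose syslog connection opened
Apr 5 14:33:33 m2 nvidia-persistenced: Directory /var/run/nvidia-persistenced will not be removed on exit
Apr 5 14:33:33 m2 nvidia-persistenced: Failed to lock PID file: Resource temporarily unavailable
Apr 5 14:33:33 m2 nvidia-persistenced: Shutdown (92317)
Apr 5 14:37:46 m2 nvidia-persistenced: Verbose syslog connection opened
Apr 5 14:37:46 m2 nvidia-persistenced: Directory /var/run/nvidia-persistenced will not be removed on exit
Apr 5 14:37:46 m2 nvidia-persistenced: Failed to lock PID file: Resource temporarily unavailable
Apr 5 14:37:46 m2 nvidia-persistenced: Shutdown (94745)
"""

print(surviving_pids(sample))
print(lock_failures(sample))
```

For the log in this thread the only surviving PID is 91309, which matches the process listed by ps, and the two later attempts are the ones that failed on the PID file lock.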

I forgot to mention that the GPUs otherwise look normal: the power draw of all three GPUs is around 20 W.

Looks like it's running now, but you won't be able to tell whether it's running properly and resolving the issue until after a reboot.

What is causing this abnormal behavior?

Hard to tell at the moment, since running multiple NVIDIA GPUs headless without nvidia-persistenced is unsupported and can lead to all kinds of adverse effects.
So you first need a well-defined system, with nvidia-persistenced running after a reboot, and only then start looking at what's happening.
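
One way to confirm that state after the reboot is to ask nvidia-smi for the per-GPU persistence mode. The Python sketch below parses the output of `nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader`. The helper names are made up for illustration, the sample output is hypothetical (it is not taken from this machine), and it assumes that a working nvidia-persistenced is reported by nvidia-smi as persistence mode "Enabled".

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,persistence_mode", "--format=csv,noheader"]


def parse_persistence(csv_text: str) -> dict[int, bool]:
    """Turn lines like '0, Enabled' into {0: True}."""
    modes: dict[int, bool] = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        index, mode = (field.strip() for field in line.split(",", 1))
        modes[int(index)] = mode == "Enabled"
    return modes


def query_persistence() -> dict[int, bool]:
    """Run nvidia-smi and parse its output (requires the NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_persistence(result.stdout)


# Hypothetical output for a three-GPU machine, for illustration only.
sample = "0, Enabled\n1, Enabled\n2, Disabled\n"

print(parse_persistence(sample))
```

On the real machine you would call `query_persistence()` instead of parsing the sample. Any GPU reported as not enabled, or missing from the result entirely, is the one to look at first.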

@generix Thanks for your suggestion! I will keep monitoring the GPUs, and if anything abnormal happens I will report it in this topic.