There are two RTX 2080 Ti GPUs in my computer. The last status I saw in nvidia-smi was 78 °C with the fan at 72%. When I came back and ran nvidia-smi to check whether the job had finished, the output showed ERR for the fan and ERR for the voltage. I didn't know how to handle it, so I just rebooted. Now nvidia-smi lists only one GPU.
$ dmesg | grep NVRM
[ 1.442992] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 410.79 Thu Nov 15 10:41:04 CST 2018 (using threaded interrupts)
[ 10.870078] NVRM: RmInitAdapter failed! (0x26:0x65:1127)
[ 10.870155] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 20.266093] NVRM: RmInitAdapter failed! (0x26:0x65:1127)
[ 20.266111] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 71.727210] NVRM: RmInitAdapter failed! (0x26:0x65:1127)
[ 71.727275] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 98.399103] NVRM: RmInitAdapter failed! (0x26:0x65:1127)
[ 98.399156] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 171.055798] NVRM: RmInitAdapter failed! (0x26:0x65:1127)
[ 171.055817] NVRM: rm_init_adapter failed for device bearing minor number 0
The card looks broken. Try reseating it, then check it in another system. If it still fails there, RMA it.
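If it helps when comparing before and after a reseat, here is a rough sketch for summarizing the log. The helper name `nvrm_failed_minors` is made up for illustration. It only counts the `rm_init_adapter failed` lines per device minor number from whatever kernel log text is piped in, and it assumes the message format matches what was posted above.

```shell
# Hypothetical helper: summarize NVRM init failures from kernel log text on stdin.
# Prints one line per affected device minor number: "<minor> <count>".
nvrm_failed_minors() {
    grep 'rm_init_adapter failed for device bearing minor number' \
        | sed 's/.*minor number \([0-9][0-9]*\).*/\1/' \
        | sort -n | uniq -c | awk '{print $2, $1}'
}

# Usage on a live system (reading the kernel log may require root):
#   dmesg | nvrm_failed_minors
# To see whether both cards still show up on the PCI bus at all
# (lspci comes from the pciutils package):
#   lspci | grep -i nvidia
```

Run against the excerpt in the question, this should print `0 5`, meaning minor number 0 failed to initialize five times. If the count stays above zero after reseating, or the same card fails in another machine, that points toward the RMA.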