Unable to determine the device handle for GPU 0000:89:00.0: Unknown Error

ningtiannan · May 19, 2022, 10:35am

I was using a GPU in Docker when it suddenly crashed. My Docker container became unhealthy, and stopping it has no effect.
Please help.

Unable to determine the device handle for GPU 0000:89:00.0: Unknown Error
root@agent-192-168-1-65:~# lspci | grep NV | grep VGA
1a:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
1b:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
3d:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
3e:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
88:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
89:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev ff)
b1:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
b2:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
root@agent-192-168-1-65:~# nvidia-smi -i 0000:89:00.0 -pm 0
Unable to determine the device handle for GPU 0000:89:00.0: Unknown Error
root@agent-192-168-1-65:~# nvidia-smi drain -p 0000:89:00.0 -m 1
Successfully set GPU 00000000:89:00.0 drain state to: draining.
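In the lspci listing above, 89:00.0 is the only device reporting (rev ff) instead of (rev a1). A revision of ff usually means reads from the device's PCI configuration space are returning all ones, i.e. the card is no longer responding on the bus. Here is a minimal sketch for flagging such devices automatically. The function name unresponsive_devices is hypothetical, and it assumes the default lspci line format shown in this post:

```python
import re
from typing import List

# Matches default lspci lines such as:
# "89:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev ff)"
LSPCI_LINE = re.compile(
    r"^(?P<addr>[0-9a-fA-F:.]+)\s+.*\(rev (?P<rev>[0-9a-fA-F]{2})\)\s*$"
)


def unresponsive_devices(lspci_output: str) -> List[str]:
    """Return the PCI addresses whose revision reads back as ff."""
    flagged = []
    for line in lspci_output.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match and match.group("rev").lower() == "ff":
            flagged.append(match.group("addr"))
    return flagged


# Three lines copied from the listing in this post.
lspci_sample = """\
88:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
89:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev ff)
b1:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
"""

print(unresponsive_devices(lspci_sample))  # ['89:00.0']
```

This only reads text, so it can be fed the saved output of lspci from any host. It does not tell you why the device stopped responding.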

nvidia-bug-report.log.gz (3.7 MB)

ningtiannan · May 19, 2022, 10:47am

Found Xid 79 in nvidia-bug-report.log

[2406993.511530] NVRM: GPU at PCI:0000:89:00: GPU-e2b5c237-4750-e614-2c55-cb411e38d637
[2406993.511533] NVRM: Xid (PCI:0000:89:00): 79, pid=2051, GPU has fallen off the bus.
[2406993.511535] NVRM: GPU 0000:89:00.0: GPU has fallen off the bus.
[2406993.511560] NVRM: Xid (PCI:0000:89:00): 79, pid=2054, GPU has fallen off the bus.
[2406993.511562] NVRM: GPU 0000:89:00.0: GPU has fallen off the bus.
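For anyone who wants to scan a kernel log for the same thing, here is a rough sketch that pulls Xid events out of log text. The regular expression is based only on the NVRM lines quoted above, so other driver versions may format them differently, and find_xid_events is a hypothetical helper name:

```python
import re
from typing import List, Tuple

# Based on the lines quoted in this post, e.g.:
# "[2406993.511533] NVRM: Xid (PCI:0000:89:00): 79, pid=2051, GPU has fallen off the bus."
XID_LINE = re.compile(
    r"NVRM: Xid \(PCI:(?P<bus>[0-9a-fA-F:]+)\): (?P<xid>\d+),\s*(?P<detail>.*)$"
)


def find_xid_events(log_text: str) -> List[Tuple[str, int, str]]:
    """Return (pci_bus, xid_code, detail) for every Xid line in the log text."""
    events = []
    for line in log_text.splitlines():
        match = XID_LINE.search(line)
        if match:
            events.append(
                (match.group("bus"), int(match.group("xid")), match.group("detail").strip())
            )
    return events


# Two lines copied from the log excerpt in this post.
xid_sample = """\
[2406993.511533] NVRM: Xid (PCI:0000:89:00): 79, pid=2051, GPU has fallen off the bus.
[2406993.511535] NVRM: GPU 0000:89:00.0: GPU has fallen off the bus.
"""

for bus, xid, detail in find_xid_events(xid_sample):
    print(bus, xid, detail)
```

Only the first sample line matches, because the second is a plain NVRM message without an Xid code.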

generix · May 21, 2022, 10:55am

This might be overheating or insufficient power. Please check airflow and monitor temperatures, and check or swap the power cables. Since it’s always the same GPU, the card itself might also be faulty.
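One possible way to monitor temperatures and power draw is to poll nvidia-smi and flag hot cards. This is a sketch under assumptions: the function names are hypothetical, and the 85 °C default threshold is an arbitrary illustrative value, not an NVIDIA limit. Note that once a GPU has fallen off the bus, nvidia-smi itself may fail, so this is mainly useful for watching the card before it drops out.

```python
import subprocess
from typing import Dict, List, Optional

# Query fields passed to nvidia-smi --query-gpu.
QUERY_FIELDS = "index,pci.bus_id,temperature.gpu,power.draw,power.limit"


def _to_float(value: str) -> Optional[float]:
    """Convert a CSV field to float; fields like "[N/A]" become None."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gpu_csv(csv_text: str) -> List[Dict[str, object]]:
    """Parse output of --format=csv,noheader,nounits into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 5 or not parts[0].isdigit():
            continue  # skip anything that is not a GPU data row
        index, bus_id, temp, draw, limit = parts
        rows.append({
            "index": int(index),
            "bus_id": bus_id,
            "temp_c": _to_float(temp),
            "power_draw_w": _to_float(draw),
            "power_limit_w": _to_float(limit),
        })
    return rows


def hot_gpus(rows: List[Dict[str, object]], threshold_c: float = 85.0) -> List[Dict[str, object]]:
    """Return rows whose temperature is at or above threshold_c (assumed value)."""
    return [r for r in rows if r["temp_c"] is not None and r["temp_c"] >= threshold_c]


def query_gpus() -> List[Dict[str, object]]:
    """Run nvidia-smi once and parse the result; raises if nvidia-smi fails."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling hot_gpus(query_gpus()) from a cron job or a loop with a sleep would give a simple temperature log to compare against the time the Xid 79 appears.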
