Unable to determine the device handle for GPU 0000:02:00.0: Unknown Error

When I ran nvidia-smi, it returned "Unable to determine the device handle for GPU 0000:02:00.0: Unknown Error".

I then ran nvidia-debugdump --list; here is the result:

Found 2 NVIDIA devices
        Device ID:              0
        Device name:            NVIDIA TITAN X (Pascal)   (*PrimaryCard)
        GPU internal ID:        0324416077500

Detailed information is in the bug report:
nvidia-bug-report.log (2.2 MB)

I don’t know how to approach this problem, so I am asking for help.

OS: Linux version 4.15.0-142-generic
GPU: 2*NVIDIA TITAN X

nvidia-bug-report.log (3.5 MB)
Hello, I have the same problem. My NVIDIA A2000 is not working with what I believe is the latest driver (520.56.06). I am running Linux kernel 5.15.0-53 with the generic headers…

On my side, nvidia-debugdump --list reports the following:

~$ nvidia-debugdump --list
Found 1 NVIDIA devices
Error: nvmlDeviceGetHandleByIndex(): Not Found
FAILED to get details on GPU (0x0): Not Found

Also, this is the output of nvidia-smi -L:

~$ nvidia-smi -L
Unable to determine the device handle for gpu 0000:01:00.0: Not Found

Hello again, I kept searching the forums and I think this thread is worth checking: Nvidia-smi outputs “No devices were found” on Ubuntu 22.04 + driver 520 - #2 by generix

On my side, I switched to a driver that is not the “open kernel” version and restarted my machine. nvidia-smi works again and I can use tools such as gpustat.

Hope this helps!

NVRM: Xid (PCI:0000:02:00): 79, pid=1160, GPU has fallen off the bus.

[15028104.848929] pcieport 0000:00:02.0: AER: Multiple Corrected error received: id=0010
[15028104.848952] pcieport 0000:00:02.0: can't find device of ID0010
[15028104.848955] pcieport 0000:00:02.0: AER: Corrected error received: id=0010
[15028104.848961] pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0010(Receiver ID)
[15028104.848966] pcieport 0000:00:02.0:   device [8086:2f04] error status/mask=00000040/00002000
[15028104.848972] pcieport 0000:00:02.0:    [ 6] Bad TLP       

Please reboot. If the GPU still doesn’t show up, it’s probably broken; check whether it works in another system.

I tried that, as shown below:
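
Before swapping hardware, it can help to pull the Xid events out of the kernel log and see how often, and on which PCI address, they occur. This is only a minimal sketch, assuming Xid lines have the same shape as the one quoted above; parse_xid and find_xids are hypothetical helper names, not part of any NVIDIA tool.

```python
import re

# Assumed shape of an Xid line, taken from the message quoted in this thread:
#   NVRM: Xid (PCI:0000:02:00): 79, pid=1160, GPU has fallen off the bus.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<bus>[0-9A-Fa-f:.]+)\): (?P<code>\d+),\s*(?P<rest>.*)"
)


def parse_xid(line):
    """Return a dict describing one Xid event, or None if the line has none."""
    match = XID_RE.search(line)
    if match is None:
        return None
    return {
        "bus": match.group("bus"),
        "code": int(match.group("code")),
        "message": match.group("rest").strip(),
    }


def find_xids(log_text):
    """Collect every Xid event found in a block of kernel log text."""
    events = (parse_xid(line) for line in log_text.splitlines())
    return [event for event in events if event is not None]


# Two lines copied from the log excerpt in this thread.
SAMPLE = """\
NVRM: Xid (PCI:0000:02:00): 79, pid=1160, GPU has fallen off the bus.
[15028104.848955] pcieport 0000:00:02.0: AER: Corrected error received: id=0010
"""

for event in find_xids(SAMPLE):
    print(event["bus"], event["code"], event["message"])
```

To use it on a real machine, replace SAMPLE with the text of the kernel log (for example, the saved output of sudo dmesg). Repeated Xid 79 events on the same PCI address after a reboot would be consistent with the hardware or slot problem suggested above.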

$ sudo apt purge -y --allow-change-held-packages nvidia-driver-550-server-open
$ sudo apt autoremove -y
$ sudo apt update
$ sudo apt install -y nvidia-driver-550-server
$ sudo reboot

For now, the error is resolved.
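
After a package swap like this, it may be worth confirming which kernel module flavor is actually loaded, since an installed package and the running module can differ until the reboot completes. A rough sketch, under the assumption that /proc/driver/nvidia/version has an "NVRM version:" line and that the open flavor includes the words "Open Kernel Module" on that line; module_flavor and read_flavor are hypothetical names used only for illustration.

```python
from pathlib import Path


def module_flavor(version_text):
    """Classify the loaded NVIDIA kernel module from its version text.

    Assumption: the open flavor reports "Open Kernel Module" on the
    "NVRM version:" line, while the proprietary flavor does not.
    """
    for line in version_text.splitlines():
        if line.startswith("NVRM version:"):
            return "open" if "Open Kernel Module" in line else "proprietary"
    return "unknown"


def read_flavor(path="/proc/driver/nvidia/version"):
    """Read the version file if present; report "not loaded" otherwise."""
    version_file = Path(path)
    try:
        if not version_file.exists():
            return "not loaded"
        return module_flavor(version_file.read_text())
    except OSError:
        # The file can vanish or be unreadable if the module is unloading.
        return "unknown"


print("NVIDIA kernel module flavor:", read_flavor())
```

If this still prints "open" after installing nvidia-driver-550-server, the old module is probably still loaded and a reboot (or checking for leftover -open packages with dpkg -l) would be the next thing to try.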

@theobtime13

I installed the non-open GPU driver, but I got Unknown Error again…

$ dpkg -l | grep nvidia-driver
ii  nvidia-driver-550-server                  550.163.01-0ubuntu0.22.04.1             amd64        NVIDIA Server Driver metapackage
$ nvidia-smi
Unable to determine the device handle for GPU0000:C3:00.0: Unknown Error

Hello, it’s been a loooong time since this problem occurred and honestly I don’t remember exactly what solved my problem.

I personally have nvidia-driver-520 installed, which is tagged “transitional package”. I’m not sure how relevant this is, but that’s the main difference between your install and mine.

$ dpkg -l | grep nvidia-driver
ii  nvidia-driver-520                                       525.147.05-0ubuntu2.22.04.1                          amd64        Transitional package for nvidia-driver-535
ii  nvidia-driver-535                                       535.230.02-0ubuntu0.22.04.1                          amd64        NVIDIA driver metapackage
$ nvidia-smi -L
GPU 0: NVIDIA RTX A2000 8GB Laptop GPU (UUID: GPU-34ee1d81-c06b-59b3-686d-5542686314cc)