Nvidia: No devices were found

Hi There
nvidia-bug-report.log.gz (133.2 KB)

I am getting the error “No devices were found” even after installing the drivers.
When I run “cat /proc/driver/nvidia/version”, I get:
“NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 560.35.03 Release Build (dvs-builder@U16-I1-N07-12-3) Fri Aug 16 21:42:42 UTC 2024
GCC version: gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)”

Here is my bug report.

Thanks for the help!

Jeff

Hello @jeffrey.khng, welcome to the NVIDIA developer forums.

This is the critical part in your log:

  /var/log/dmesg:
[    3.745427] kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[    6.655750] kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  560.35.03  Release Build  (dvs-builder@U16-I1-N07-12-3)  Fri Aug 16 21:42:42 UTC 2024
[    6.688745] kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  560.35.03  Release Build  (dvs-builder@U16-I1-N07-12-3)  Fri Aug 16 21:22:33 UTC 2024
[    6.696942] kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    7.116923] kernel: NVRM: gpuWaitForGfwBootComplete_TU102: failed to wait for GFW_BOOT: (progress 0x1)
[    7.116948] kernel: NVRM: kgspWaitForGfwBootOk_TU102: failed to wait for GFW boot complete: 0x55 VBIOS version 94.06.32.00.22
[    7.116949] kernel: NVRM: kgspWaitForGfwBootOk_TU102: (the GPU may be in a bad state and may need to be reset)
[    7.116954] kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Generic Error: Not ready [NV_ERR_NOT_READY] (0x00000055) returned from kgspWaitForGfwBootOk_HAL(pGpu, pKernelGsp) @ kernel_gsp.c:3419
[    7.117017] kernel: NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
[    7.118996] kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x55:1851)
[    7.120308] kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[    7.121154] kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[    7.121537] kernel: [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[    7.225993] kernel: nvidia-uvm: Loaded the UVM driver, major device number 235.

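The GPU firmware (GFW) never finishes booting, so the driver cannot initialize the card. This might be caused by an incorrect physical installation of the GPU or by an actual hardware failure.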

But you can try installing the proprietary (non-open-source) driver for this GPU, since you currently have the open-source kernel module installed, which might cause issues as well.
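
If you want to check whether the same failure shows up again after switching drivers, a small script along these lines could pull the relevant lines out of the kernel log. This is only a rough sketch: the function name `find_nvrm_failures` and the pattern list are my own assumptions based on the messages quoted above, not an official NVIDIA tool, and the list is certainly not exhaustive.

```python
import re

# Substrings taken from the dmesg excerpt above.
# Assumption: these are enough to spot this particular failure; other
# driver versions may word the messages differently.
FAILURE_PATTERNS = [
    r"failed to wait for GFW_BOOT",
    r"Cannot initialize GSP firmware RM",
    r"RmInitAdapter failed",
    r"rm_init_adapter failed",
]

# PCI address as printed in lines such as "NVRM: GPU 0000:01:00.0: ..."
PCI_ADDR = re.compile(
    r"GPU ([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
)


def find_nvrm_failures(log_text):
    """Return (matching_lines, pci_addresses) for NVRM init failures."""
    combined = re.compile("|".join(FAILURE_PATTERNS))
    hits = []
    addrs = set()
    for line in log_text.splitlines():
        # Only look at messages from the NVIDIA resource manager.
        if "NVRM" not in line:
            continue
        if combined.search(line):
            hits.append(line.strip())
            match = PCI_ADDR.search(line)
            if match:
                addrs.add(match.group(1))
    return hits, sorted(addrs)
```

You would pass it the text of the kernel log, for example the saved output of `dmesg`. An empty result after reinstalling only means these specific messages are gone, not that the GPU is healthy.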

Hi Markus
Let me try out your suggestion!

However, I am not optimistic, as I also have some doubts about whether this GPU is working properly.

Thanks so much for your help though!

Jeff