RTX 4090 on Fedora 41 - Unable to determine the device handle for GPU0000:01:00.0: Unknown Error

Hi team, I am having an issue with an RTX 4090 on Fedora 41. It had been working fine since I set it up, until it failed during an embedding-model workload for document inference with the errors below. Fan speed was quite high, but the temperature never exceeded 70 °C:

root@soundwave:~# nvidia-smi
Unable to determine the device handle for GPU0000:01:00.0: Unknown Error
root@soundwave:~# nvidia-debugdump --dumpall
ERROR: GetCaptureBufferSize failed, GPU is lost, bufSize: 0x0
ERROR: internal_getDumpBuffer failed, return code: 0xf
ERROR: internal_dumpSystemComponent() failed, return code: 0xf
ERROR: GetCaptureBufferSize failed, GPU is lost, bufSize: 0x0
ERROR: internal_getDumpBuffer failed, return code: 0xf
ERROR: internal_dumpSystemComponent() failed, return code: 0xf
ERROR: internal_dumpGpuComponent() failed, return code: 0x3e7
ERROR: internal_dumpGpuComponent() failed, return code: 0x3e7
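
Since `nvidia-debugdump` reports that the GPU is lost, the kernel log around the time of the failure is a natural next place to look. The sketch below pulls NVIDIA Xid codes out of kernel-log text read on stdin. It assumes the driver logs lines of the form `NVRM: Xid (PCI:0000:01:00): <code>, ...`, and the helper name `xid_codes` is hypothetical. For reference, NVIDIA documents Xid 79 as "GPU has fallen off the bus", which would be consistent with the "GPU is lost" message, but only the actual log can confirm which code (if any) was raised.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Xid code from every NVRM Xid line on stdin.
# Assumes kernel-log lines shaped like:
#   NVRM: Xid (PCI:0000:01:00): 79, pid=..., GPU has fallen off the bus.
# Any prefix (timestamp, hostname, "kernel:") before "NVRM:" is ignored.
xid_codes() {
  sed -nE 's/.*NVRM: Xid \([^)]*\): ([0-9]+).*/\1/p'
}

# Usage (kernel log of the current boot; use -b -1 for the previous boot):
#   journalctl -k -b | xid_codes | sort -n | uniq -c
```

Counting with `sort -n | uniq -c` makes it easy to see whether a single Xid preceded the failure or whether errors were accumulating during the workload.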

/etc/modprobe.d# cat nvidia.conf
options nvidia NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1 fbdev=1
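
To check that the options in `nvidia.conf` are actually in effect for the loaded modules (and not only present on disk), something like the following can be used. This is a sketch: the helper name `check_nvidia_opts` is hypothetical, it assumes the driver exposes `PreserveVideoMemoryAllocations` in `/proc/driver/nvidia/params` and the `nvidia_drm` parameters under `/sys/module/nvidia_drm/parameters/`, and reading those files may require root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the modprobe.d options are active.
# Assumes /proc/driver/nvidia/params contains "PreserveVideoMemoryAllocations: <n>"
# and that nvidia_drm exposes modeset/fbdev under /sys/module/nvidia_drm/parameters.
# Both paths can be overridden (e.g. to point at saved copies of those files).
check_nvidia_opts() {
  local params="${1:-/proc/driver/nvidia/params}"
  local drm_dir="${2:-/sys/module/nvidia_drm/parameters}"
  local pvma modeset fbdev

  # Split "Key: value" lines on the colon and any following spaces.
  pvma="$(awk -F': *' '$1 == "PreserveVideoMemoryAllocations" {print $2}' "$params" 2>/dev/null)"
  modeset="$(cat "$drm_dir/modeset" 2>/dev/null)"
  fbdev="$(cat "$drm_dir/fbdev" 2>/dev/null)"

  # Print "unknown" when a file is missing or unreadable.
  echo "PreserveVideoMemoryAllocations=${pvma:-unknown}"
  echo "nvidia_drm.modeset=${modeset:-unknown}"
  echo "nvidia_drm.fbdev=${fbdev:-unknown}"
}
```

Running `check_nvidia_opts` with no arguments reads the live paths; the optional arguments exist only so the function can be pointed at other copies of those files.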
Full bug report attached: nvidia-bug-report.log.gz (608.3 KB)

rteixeira@soundwave:~$ nvidia-ctk cdi generate --device-name-strategy=uuid --output cdi-spec.yaml
INFO[0000] Using /usr/lib64/libnvidia-ml.so.565.77
INFO[0000] Using /usr/lib64/libnvidia-sandboxutils.so.565.77
INFO[0000] Auto-detected mode as 'nvml'
INFO[0000] Using driver version 565.77
WARN[0000] Could not locate /dev/nvidia-modeset: pattern /dev/nvidia-modeset not found
INFO[0000] Selecting /dev/nvidia-uvm-tools as /dev/nvidia-uvm-tools
INFO[0000] Selecting /dev/nvidia-uvm as /dev/nvidia-uvm
INFO[0000] Selecting /dev/nvidiactl as /dev/nvidiactl
INFO[0000] Selecting /usr/lib64/libnvidia-egl-gbm.so.1.1.2 as /usr/lib64/libnvidia-egl-gbm.so.1.1.2
INFO[0000] Selecting /usr/lib64/libnvidia-egl-wayland.so.1.1.17 as /usr/lib64/libnvidia-egl-wayland.so.1.1.17
INFO[0000] Selecting /usr/lib64/libnvidia-allocator.so.565.77 as /usr/lib64/libnvidia-allocator.so.565.77
WARN[0000] Could not locate libnvidia-vulkan-producer.so.565.77: pattern libnvidia-vulkan-producer.so.565.77 not found
libnvidia-vulkan-producer.so.565.77: not found
INFO[0000] Selecting /usr/lib64/xorg/modules/drivers/nvidia_drv.so as /usr/lib64/xorg/modules/drivers/nvidia_drv.so
INFO[0000] Selecting /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so.565.77 as /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so.565.77
INFO[0000] Selecting /usr/share/glvnd/egl_vendor.d/10_nvidia.json as /usr/share/glvnd/egl_vendor.d/10_nvidia.json
INFO[0000] Selecting /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json as /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
INFO[0000] Selecting /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json as /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json
INFO[0000] Selecting /usr/share/nvidia/nvoptix.bin as /usr/share/nvidia/nvoptix.bin
WARN[0000] Could not locate X11/xorg.conf.d/10-nvidia.conf: pattern X11/xorg.conf.d/10-nvidia.conf not found
INFO[0000] Selecting /usr/share/X11/xorg.conf.d/nvidia-drm-outputclass.conf as /usr/share/X11/xorg.conf.d/nvidia-drm-outputclass.conf
INFO[0000] Selecting /etc/vulkan/icd.d/nvidia_icd.json as /etc/vulkan/icd.d/nvidia_icd.json
WARN[0000] Could not locate vulkan/icd.d/nvidia_layers.json: pattern vulkan/icd.d/nvidia_layers.json not found
pattern vulkan/icd.d/nvidia_layers.json not found
INFO[0000] Selecting /etc/vulkan/implicit_layer.d/nvidia_layers.json as /etc/vulkan/implicit_layer.d/nvidia_layers.json
INFO[0000] Selecting /usr/lib64/libEGL_nvidia.so.565.77 as /usr/lib64/libEGL_nvidia.so.565.77
INFO[0000] Selecting /usr/lib64/libGLESv1_CM_nvidia.so.565.77 as /usr/lib64/libGLESv1_CM_nvidia.so.565.77
INFO[0000] Selecting /usr/lib64/libGLESv2_nvidia.so.565.77 as /usr/lib64/libGLESv2_nvidia.so.565.77
INFO[0000] Selecting /usr/lib64/libGLX_nvidia.so.565.77 as /usr/lib64/libGLX_nvidia.so.565.77
INFO[0000] Selecting /usr/lib64/libcuda.so.565.77 as /usr/lib64/libcuda.so.565.77
INFO[0000] Selecting /usr/lib64/libcudadebugger.so.565.77 as /usr/lib64/libcudadebugger.so.565.77
INFO[0000] Selecting /usr/lib64/libnvcuvid.so.565.77 as /usr/lib64/libnvcuvid.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-allocator.so.565.77 as /usr/lib64/libnvidia-allocator.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-cfg.so.565.77 as /usr/lib64/libnvidia-cfg.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-eglcore.so.565.77 as /usr/lib64/libnvidia-eglcore.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-encode.so.565.77 as /usr/lib64/libnvidia-encode.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-fbc.so.565.77 as /usr/lib64/libnvidia-fbc.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-glcore.so.565.77 as /usr/lib64/libnvidia-glcore.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-glsi.so.565.77 as /usr/lib64/libnvidia-glsi.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-glvkspirv.so.565.77 as /usr/lib64/libnvidia-glvkspirv.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-gpucomp.so.565.77 as /usr/lib64/libnvidia-gpucomp.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-gtk2.so.565.77 as /usr/lib64/libnvidia-gtk2.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-gtk3.so.565.77 as /usr/lib64/libnvidia-gtk3.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-ml.so.565.77 as /usr/lib64/libnvidia-ml.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-ngx.so.565.77 as /usr/lib64/libnvidia-ngx.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-nvvm.so.565.77 as /usr/lib64/libnvidia-nvvm.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-opencl.so.565.77 as /usr/lib64/libnvidia-opencl.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-opticalflow.so.565.77 as /usr/lib64/libnvidia-opticalflow.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-pkcs11-openssl3.so.565.77 as /usr/lib64/libnvidia-pkcs11-openssl3.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-pkcs11.so.565.77 as /usr/lib64/libnvidia-pkcs11.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-ptxjitcompiler.so.565.77 as /usr/lib64/libnvidia-ptxjitcompiler.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-rtcore.so.565.77 as /usr/lib64/libnvidia-rtcore.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-sandboxutils.so.565.77 as /usr/lib64/libnvidia-sandboxutils.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-tls.so.565.77 as /usr/lib64/libnvidia-tls.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-vksc-core.so.565.77 as /usr/lib64/libnvidia-vksc-core.so.565.77
INFO[0000] Selecting /usr/lib64/libnvidia-wayland-client.so.565.77 as /usr/lib64/libnvidia-wayland-client.so.565.77
INFO[0000] Selecting /usr/lib64/libnvoptix.so.565.77 as /usr/lib64/libnvoptix.so.565.77
INFO[0000] Selecting /usr/lib64/vdpau/libvdpau_nvidia.so.565.77 as /usr/lib64/vdpau/libvdpau_nvidia.so.565.77
WARN[0000] Could not locate /nvidia-persistenced/socket: pattern /nvidia-persistenced/socket not found
WARN[0000] Could not locate /nvidia-fabricmanager/socket: pattern /nvidia-fabricmanager/socket not found
WARN[0000] Could not locate /tmp/nvidia-mps: pattern /tmp/nvidia-mps not found
INFO[0000] Selecting /lib/firmware/nvidia/565.77/gsp_ga10x.bin as /lib/firmware/nvidia/565.77/gsp_ga10x.bin
INFO[0000] Selecting /lib/firmware/nvidia/565.77/gsp_tu10x.bin as /lib/firmware/nvidia/565.77/gsp_tu10x.bin
INFO[0000] Selecting /usr/bin/nvidia-smi as /usr/bin/nvidia-smi
INFO[0000] Selecting /usr/bin/nvidia-debugdump as /usr/bin/nvidia-debugdump
INFO[0000] Selecting /usr/bin/nvidia-persistenced as /usr/bin/nvidia-persistenced
INFO[0000] Selecting /usr/bin/nvidia-cuda-mps-control as /usr/bin/nvidia-cuda-mps-control
INFO[0000] Selecting /usr/bin/nvidia-cuda-mps-server as /usr/bin/nvidia-cuda-mps-server
INFO[0000] Generated CDI spec with version 0.8.0