I have been facing problems with getting my GPU running for quite some time now. Previously, everything was working well, but I suspect an update broke something.
I am on the 525.147.05 drivers, installed from the Debian repositories via the nvidia-driver package. At first, nvidia-smi used to say "No devices found", and dmesg showed a "failed to allocate NvKmsKapi" message along with an RmInitAdapter error.
Recently, however, I get the following:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
Looking at the status of nvidia-persistenced, it says:
Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 115 has read and write permis…
Checking /dev, it indeed doesn’t have those files.
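For anyone who wants to repeat that check, here is a minimal sketch. The function name list_nvidia_nodes and its optional directory argument are made up for this example; it only lists whatever matches nvidia* under the given directory (by default /dev) and says so when nothing is there.

```shell
#!/bin/sh
# Sketch: list NVIDIA device nodes (e.g. /dev/nvidia0, /dev/nvidiactl).
# list_nvidia_nodes and DEVDIR are hypothetical names for illustration.

# list_nvidia_nodes [DEVDIR]   (DEVDIR defaults to /dev)
list_nvidia_nodes() {
    devdir=${1:-/dev}
    found=0
    for node in "$devdir"/nvidia*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$node" ] || continue
        echo "$node"
        found=1
    done
    [ "$found" -eq 1 ] || echo "no NVIDIA device nodes under $devdir"
}

list_nvidia_nodes
```

If nothing is listed, running grep '^nvidia' /proc/modules is a simple way to see whether any NVIDIA kernel module is loaded at all.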
I have tried older drivers, tried 535 from the backports repository, and tried a clean reinstall of the 525 drivers, all to no avail.
I am coming to this forum as sort of my last hope, since no one who has been kind enough to help me so far has been able to pinpoint what’s going on.
It seems that the NVIDIA module is not loaded, which makes me suspect that it didn’t build. Have you checked for the presence of the module(s) under /lib/modules/$YOUR_KERNEL_VERSION/updates/dkms?
If they’re not there, the most common cause is that your kernel headers are not installed.
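A quick way to check both points is sketched below. It assumes the usual Debian layout, where the kernel headers are reachable through /lib/modules/<kernel>/build and DKMS puts the built modules under updates/dkms. The function name check_nvidia_build and its two parameters are made up for this example and are not part of any NVIDIA or Debian tool.

```shell
#!/bin/sh
# Sketch: check whether headers and a DKMS-built NVIDIA module exist
# for a given kernel. Paths are assumptions based on the usual Debian layout.

# check_nvidia_build KVER [ROOT]
# KVER is the kernel version; ROOT is an optional filesystem prefix
# (empty by default, i.e. the real filesystem).
check_nvidia_build() {
    kver=$1
    root=${2:-}
    moddir="$root/lib/modules/$kver"

    # Kernel headers: the "build" entry should resolve to a directory.
    if [ -d "$moddir/build" ]; then
        echo "headers: present"
    else
        echo "headers: missing"
    fi

    # Built modules: any nvidia*.ko file, possibly compressed (.ko.xz etc.).
    if ls "$moddir"/updates/dkms/nvidia*.ko* >/dev/null 2>&1; then
        echo "module: present"
    else
        echo "module: missing"
    fi
}

check_nvidia_build "$(uname -r)"
```

If the headers show up as missing, installing the linux-headers package that matches the output of uname -r and letting DKMS rebuild would be the next thing to try. Running dkms status should also tell you whether the module was built for the running kernel.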
I have disabled secure boot, which has changed the error I am now getting.
When running nvidia-smi, it says No devices found.
Running sudo dmesg | grep -i nvidia shows the following
Sorry, I totally forgot to update this thread!
So over the last week I installed Windows, and then found out that my GPU might really be dead. Device Manager shows that the Nvidia GPU did not start because of errors it reported. Reinstalling the latest drivers from Nvidia’s website didn’t help either.
The exact code reported was code 43, with error code shown as 0000002B. Not sure if those numbers mean much to anyone here, but yeah, that’s what I found.
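For what it's worth, the two numbers look like the same code written two ways: 0000002B is hexadecimal for 43. A small sketch to confirm the conversion from a shell (hex_to_dec is a made-up helper name, used only for this illustration):

```shell
#!/bin/sh
# Sketch: convert the hexadecimal value shown by Device Manager to decimal.
# hex_to_dec is a hypothetical helper; it expects hex digits without "0x".
hex_to_dec() {
    printf '%d\n' "0x$1"
}

hex_to_dec 0000002B   # prints 43
```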
I tried running nvflash, but it complains about no EEPROM being found or supported. Do I need the nouveau module installed and loaded during this process?