NVIDIA-SMI communication error in Kali Linux on kernel 6.5.0-kali3-amd64

I got a new system (Lenovo IdeaPad Gaming 3) and installed Kali Linux on it.

Below are some command outputs

Kernel

└─$ uname -a
Linux Kali 6.5.0-kali3-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.6-1kali1 (2023-10-09) x86_64 GNU/Linux

Current OS Version

└─$ lsb_release -a
No LSB modules are available.
Distributor ID: Kali
Description:    Kali GNU/Linux Rolling
Release:        2023.4
Codename:       kali-rolling

Graphics card

└─$ lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation AD107M [GeForce RTX 4050 Max-Q / Mobile] (rev a1)
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] (rev 0a)

Below is the output of nvidia-detect

└─$ nvidia-detect 
Detected NVIDIA GPUs:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD107M [GeForce RTX 4050 Max-Q / Mobile] [10de:28a1] (rev a1)

Checking card:  NVIDIA Corporation AD107M [GeForce RTX 4050 Max-Q / Mobile] (rev a1)
Uh oh. Failed to identify your Debian suite.

Below is the output of nvidia-smi

└─$ nvidia-smi 
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

FYI, Secure Boot is disabled

└─$ sudo mokutil --sb-state 
SecureBoot disabled

I also tried reinstalling the drivers multiple times, but it still didn’t work. Below is the command I used to install them

sudo apt install -y nvidia-driver nvidia-cuda-toolkit

nvidia-persistenced.service Status

└─$ sudo systemctl status nvidia-persistenced.service
○ nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; disabled; preset: disabled)
     Active: inactive (dead)

Output when I tried to start nvidia-persistenced.service

└─$ sudo systemctl start nvidia-persistenced.service
Job for nvidia-persistenced.service failed because the control process exited with error code.
See "systemctl status nvidia-persistenced.service" and "journalctl -xeu nvidia-persistenced.service" for details.

Below are a few more logs related to this

└─$ systemctl status nvidia-persistenced.service
× nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; disabled; preset: disabled)
     Active: failed (Result: exit-code) since Tue 2024-01-30 20:09:38 IST; 49s ago
    Process: 5373 ExecStart=/usr/bin/nvidia-persistenced --user nvpd (code=exited, status=1/FAILURE)
    Process: 5380 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced (code=exited, status=0/SUCCESS)
        CPU: 30ms

Jan 30 20:09:38 Kali systemd[1]: Starting nvidia-persistenced.service - NVIDIA Persistence Daemon...
Jan 30 20:09:38 Kali nvidia-persistenced[5374]: Started (5374)
Jan 30 20:09:38 Kali nvidia-persistenced[5374]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 133>
Jan 30 20:09:38 Kali nvidia-persistenced[5374]: Shutdown (5374)
Jan 30 20:09:38 Kali nvidia-persistenced[5373]: nvidia-persistenced failed to initialize. Check syslog for more details.
Jan 30 20:09:38 Kali systemd[1]: nvidia-persistenced.service: Control process exited, code=exited, status=1/FAILURE
Jan 30 20:09:38 Kali systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
Jan 30 20:09:38 Kali systemd[1]: Failed to start nvidia-persistenced.service - NVIDIA Persistence Daemon.
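
The persistence daemon complains about missing /dev/nvidia* device files, which usually means the kernel module is not loaded at all. A minimal sketch of that check follows. The helper name module_loaded and its optional second argument are hypothetical (the argument only exists so the function can be tried against a sample file), and it assumes the loaded module would be listed under the name nvidia.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel module name appears in a modules list.
# $1: module name to look for
# $2: optional path to the list (defaults to /proc/modules, which lsmod also reads)
module_loaded() {
    # Each line of /proc/modules starts with the module name followed by a space.
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Example use on a live system (assumes the module is listed as "nvidia"):
#   module_loaded nvidia && echo "nvidia module is loaded" || echo "nvidia module is NOT loaded"
```

If this reports that the module is not loaded, nvidia-smi and nvidia-persistenced have nothing to talk to, which would match both errors above.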

GCC version, in case it is needed

└─$ gcc --version
gcc (Debian 13.2.0-7) 13.2.0

Attaching nvidia bug report file
nvidia-bug-report.log.gz (123.5 KB)

Please run
sudo modprobe nvidia
and post any errors displayed.
Also post the output of
dkms status

Thanks for the quick response.

modprobe output

└─$ sudo modprobe nvidia
modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/6.5.0-kali3-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:1084 command_do() Error running install command 'modprobe -i nvidia-current ' for module nvidia: retcode 1
modprobe: ERROR: could not insert 'nvidia': Invalid argument

dkms Output

└─$ dkms status
nvidia-current/525.147.05: added

Please reinstall kernel headers
sudo apt install --reinstall linux-headers-$(uname -r)
and post any errors and the output of
dkms status
afterwards.
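
For anyone reading along: on Debian-based systems the headers package normally provides /lib/modules/<kernel release>/build, and DKMS needs that directory to compile the module. A minimal sketch of a check, assuming that layout. The function name headers_present and its optional second argument are hypothetical, added only so it can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel build directory exists for a release.
# $1: kernel release, e.g. the output of `uname -r`
# $2: optional modules root (defaults to /lib/modules)
headers_present() {
    # -d follows symlinks, so a dangling "build" link counts as missing.
    [ -d "${2:-/lib/modules}/$1/build" ]
}

# Example use on a live system:
#   headers_present "$(uname -r)" || echo "headers for $(uname -r) look missing"
```

This only tells you whether the build directory is there. It does not prove the headers match the running kernel or that the compile will succeed.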


dkms Status

└─$ dkms status 
nvidia-current/525.147.05, 6.5.0-kali3-amd64, x86_64: installed

I am now able to see output from nvidia-smi

└─$ nvidia-smi
Tue Jan 30 22:51:19 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   43C    P4    11W /  40W |      0MiB /  6141MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |

How were you able to come up with that step, exactly? Can I read about these troubleshooting steps, or all the prerequisites for running the driver properly, anywhere? That way I can try all the steps myself before bothering you or other community members.

Modprobing tells you whether the driver is missing, blacklisted, or incompatible.
dkms status is either empty, meaning no driver is installed, or shows “added”, meaning the driver source is there but has not been compiled. The most common reason for that is missing kernel headers. If the headers are not the problem, reinstalling them still triggers a driver compile, which then gives you an error message and a make.log for further analysis.
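
The reading of dkms status described above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name classify_dkms_status and the one-word labels it prints are hypothetical, and it only distinguishes the cases seen in this thread (empty, “added”, and “installed” for a given kernel).

```shell
#!/bin/sh
# Hypothetical helper: interpret the text printed by `dkms status`.
# $1: the dkms status output
# $2: kernel release to check for, e.g. the output of `uname -r`
classify_dkms_status() {
    status_text=$1
    kernel=$2
    if [ -z "$status_text" ]; then
        # Nothing registered with DKMS: the driver is probably not installed.
        echo "missing"
    elif printf '%s\n' "$status_text" | grep -F -- "$kernel" | grep -q ': installed'; then
        # Module was compiled and installed for this kernel.
        echo "installed"
    elif printf '%s\n' "$status_text" | grep -q ': added'; then
        # Source is registered but not compiled: check the kernel headers first.
        echo "added"
    else
        # Any other state (for example built for a different kernel).
        echo "other"
    fi
}

# Example use on a live system:
#   classify_dkms_status "$(dkms status)" "$(uname -r)"
```

Fed the two dkms status outputs posted in this thread, the function prints added for the first and installed for the second.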

Thanks @generix, that’s great info. I will read about them in detail.
