NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Can I manually install kernel 5.8?
How can I avoid creating an xorg.conf file?

You can also revert everything you did:

  • uninstall the runfile driver
  • delete /etc/X11/xorg.conf
  • remove all kernels you manually installed
    and then update Ubuntu. You should end up at kernel 5.8 automatically.

try:

sudo apt install --install-recommends linux-generic-hwe-20.04 

Hello,
I have the same problem…

“NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

I use Ubuntu 20.04 with a GeForce RTX™ 3060 Laptop GPU.

$ uname -a
Linux xxx 5.8.0-44-generic #50~20.04.1-Ubuntu SMP Wed Feb 10 21:07:30 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I installed nvidia-driver-460.

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ ubuntu-drivers devices
$ sudo apt install nvidia-driver-460

After reboot,

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Here is nvidia-bug-report.log.gz (194.7 KB).

Thank you in advance.

Please set the kernel parameter
pci=realloc

I edited GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc"

and

sudo update-grub
sudo reboot

After that, nvidia-smi worked fine.
Thank you very much!!
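
For anyone who would rather script the edit above than open the file by hand, here is a minimal sketch. `add_kernel_param` is a hypothetical helper name (not a GRUB or Ubuntu tool), and it assumes the GRUB_CMDLINE_LINUX_DEFAULT line uses double quotes, as in the post, and that the parameter contains no `/` or regex metacharacters.

```shell
#!/bin/sh
# Sketch only: add_kernel_param is a hypothetical helper, not a GRUB or Ubuntu tool.
# It appends a parameter to GRUB_CMDLINE_LINUX_DEFAULT unless it is already there.
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}

    # Already present as a whole word inside the quoted value? Then do nothing.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi

    # Keep a backup (file.bak) and append the parameter before the closing quote.
    sed -i.bak "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"/" "$file"
}
```

After calling `add_kernel_param pci=realloc` as root, `sudo update-grub` and a reboot are still needed, exactly as in the post above.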

I am atggreg012. It did work perfectly fine. But after 2 or 3 reboots, the problem has returned. I am unable to boot again. If I boot by removing quiet splash from the kernel parameters and set gfxmode to text, it displays “failed to start load/save screen backlight brightness of acpi_video0” and then continues, and after all the messages it gets stuck on a black screen with a non-blinking underscore in the top left corner (Ctrl+Alt+F1 does nothing). Should I share my /var/log files and nvidia-bug-report.log.gz?
Also, thanks for the fast reply.

Please open a new thread so that you don’t run into the reply limit, and attach a new nvidia-bug-report.log.

I did the following as you said:

  • uninstall the runfile driver
  • delete /etc/X11/xorg.conf
  • remove all kernels you manually installed
    and then update ubuntu. You should end up at kernel 5.8 automatically.

It did work perfectly fine. But after 2 or 3 reboots, I am unable to boot again. If I boot by removing quiet splash from the kernel parameters and set gfxmode to text, it displays “failed to start load/save screen backlight brightness of acpi_video0” and then continues, and after all the messages it gets stuck on a black screen with a non-blinking underscore in the top left corner (Ctrl+Alt+F1 does nothing).

nvidia-bug-report.log.gz (265.9 KB)

On this boot you were running kernel 5.11 with the ‘nomodeset’ kernel parameter set (possibly because of recovery mode). Before that, you booted into the 5.4 kernel, which obviously doesn’t work. Please make sure you boot into the 5.11 kernel and don’t have ‘nomodeset’ set.
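
To confirm which parameters the running kernel was actually booted with, something like the following can help. It is only a sketch: `has_boot_param` is a hypothetical helper name, while `/proc/cmdline` is the kernel's own record of its boot command line. It matches whole words only, so pass the full token (for example `nomodeset` or `pci=realloc`).

```shell
#!/bin/sh
# Sketch only: has_boot_param is a hypothetical helper.
# Returns 0 if the exact token appears on the kernel command line, 1 otherwise.
has_boot_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    # Put each space-separated token on its own line, then look for an exact,
    # fixed-string, whole-line match.
    tr -s ' ' '\n' < "$cmdline_file" | grep -qxF "$param"
}
```

Running `uname -r` shows the kernel version, and `has_boot_param nomodeset && echo "nomodeset is set"` shows whether the current boot has the parameter.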

I was in recovery mode to create nvidia-bug-report.log.gz. I do not have the nomodeset kernel parameter set otherwise. It normally launches with the Ubuntu spinning circle appearing for a moment, and then the screen freezes on the Ubuntu and ASUS logo.
My issue is that kernels 5.11 and 5.8 no longer boot (both were working perfectly until about 3 reboots ago). The other kernels I have are 5.4 and 5.7, which boot fine but have issues with nvidia-smi and nvidia-settings.

After this post I probably won’t be able to reply because of the limit. How shall I reply? Can I somehow continue from here if I create a new thread?

Here I am posting the nvidia-bug-report.log file created under kernel 5.4 in normal mode. nvidia-bug-report.log.gz (132.8 KB)

Please delete /etc/X11/xorg.conf and boot into kernel 5.11.

I have the same issue that was originally reported in this thread: “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

Ubuntu release: 20.10
Kernel: 5.8.0-44-generic
Nvidia driver installed: 460

Computer: ASUS ROG Zephyrus with NVIDIA 1660 Ti

The problem is that everything worked perfectly and I could switch between the AMD GPU and the NVIDIA GPU. Suddenly it stopped working, and now I can’t see the NVIDIA card when listing PCI devices using lspci.

  1. I have tried booting the computer with Ubuntu 20.04 and that works as normal. The NVIDIA card is listed in lspci.
  2. I have tried setting pci=realloc.

It seems to me that somehow the system fails to detect the NVIDIA card. Any suggestions, or is a full reinstall of Linux the only remedy? By the way, I am pretty new to Linux and I’m trying to learn, so please go easy on me.
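
One way to double-check what lspci shows is to read the vendor IDs straight from sysfs. This is just a sketch: `list_nvidia_pci` is a hypothetical helper name, and it relies on each entry under `/sys/bus/pci/devices` exposing a `vendor` file; 0x10de is NVIDIA's PCI vendor ID.

```shell
#!/bin/sh
# Sketch only: list_nvidia_pci is a hypothetical helper.
# Prints the PCI address of every device whose vendor ID is 0x10de (NVIDIA).
# Returns 0 if at least one such device is visible to the kernel, 1 otherwise.
list_nvidia_pci() {
    sysfs_root=${1:-/sys/bus/pci/devices}
    found=1
    for dev in "$sysfs_root"/*; do
        # Skip anything that does not look like a PCI device entry.
        [ -r "$dev/vendor" ] || continue
        if [ "$(cat "$dev/vendor")" = "0x10de" ]; then
            basename "$dev"
            found=0
        fi
    done
    return $found
}
```

If `list_nvidia_pci` prints nothing, the kernel currently has no NVIDIA device on the PCI bus at all, which would match an empty lspci listing rather than a driver problem.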

Please check for a udev rule that removes the NVIDIA card:
grep 10de /lib/udev/rules.d/*
and remove it.
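
A slightly narrower version of that grep, as a rough sketch: it only reports rule lines that mention vendor 10de together with the word "remove". `find_nvidia_remove_rules` is a hypothetical helper name, and both the "remove" keyword and the extra `/etc/udev/rules.d` directory are assumptions about where and how such a rule might be written, so the plain grep above remains the safer first check.

```shell
#!/bin/sh
# Sketch only: find_nvidia_remove_rules is a hypothetical helper.
# Prints file:line:text for udev rule lines that mention both "10de" and "remove".
find_nvidia_remove_rules() {
    # Default to the usual rule directories when none are given (assumption).
    [ $# -gt 0 ] || set -- /lib/udev/rules.d /etc/udev/rules.d
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        grep -HnEi '10de.*remove|remove.*10de' "$dir"/*.rules 2>/dev/null
    done
}
```

Any file it reports is worth reading before deleting, since not every rule that mentions 10de removes the card.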

I found a .rules file for nvidia that referred to 10de. I removed that file and rebooted. The rule file is still gone after the reboot, but the problem persists, I’m afraid.

You might have to update the initrd:
sudo update-initramfs -u
and reboot.
If that still doesn’t help, please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

Hi,

That didn’t seem to do the trick, I’m afraid. I have uploaded the full bug report.

nvidia-bug-report.log.gz (113.9 KB)

Thanks!


Update:

This is the reply to the request below (sorry, I’m only allowed 3 responses, it seems):

Output of dpkg -l |grep ubuntu-drivers-common:
ii ubuntu-drivers-common 1:0.8.6.3~0.20.10.1 amd64 Detect and install additional Ubuntu driver packages


Update:

Output of dpkg -l |grep nvidia-prime:
ii nvidia-prime 0.8.16~0.20.10.1 all Tools to enable NVIDIA's Prime

So the version should be high enough for that fix, it seems. I also updated the initrd just in case, but unfortunately the problem persists.

The udev rules to remove the NVIDIA GPU get recreated by Ubuntu’s gpu-manager:

[    6.747140] pci 0000:01:00.2: Removing from iommu group 8
[    6.747919] pci 0000:01:00.0: Removing from iommu group 8
[    6.748202] pci 0000:01:00.3: Removing from iommu group 8
[    6.748629] pci 0000:01:00.1: Removing from iommu group 8

This was a bug in gpu-manager that should have been fixed; most likely you just need to update your system to get the fixed version. Please post the output of
dpkg -l |grep ubuntu-drivers-common
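
To look for the same "Removing from iommu group" lines on another machine, a small filter over the kernel log is enough. A minimal sketch: `removed_pci_devices` is a hypothetical helper name, and it assumes the log lines have the same shape as the ones quoted above. Such a line only shows that a device was removed, not why.

```shell
#!/bin/sh
# Sketch only: removed_pci_devices is a hypothetical helper.
# Reads kernel log text on stdin and prints the unique PCI addresses that
# appear in "Removing from iommu group" messages.
removed_pci_devices() {
    sed -n 's/.*pci \([0-9a-f:.]*\): Removing from iommu group.*/\1/p' | sort -u
}
```

Typical use would be `sudo dmesg | removed_pci_devices`; addresses such as 0000:01:00.0 can then be compared with the slot the GPU normally occupies.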

I’ve checked the package details, and the pm rules belong to the package nvidia-prime. It was fixed in version 8.15.3; the current version for 20.10 should be 8.16. Please check
dpkg -l |grep nvidia-prime
and update if it’s a lower version. Also, updating the initrd for the running kernel might be necessary in case you’re not running the latest:
sudo update-initramfs -u -k $(uname -r)
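
Comparing package versions by eye is error-prone, so here is a minimal sketch of doing it with dpkg's own comparison. `version_ge` is a hypothetical helper name. The threshold string is an assumption: dpkg prints versions such as 0.8.16~0.20.10.1, so "8.15.3" above is taken here to mean 0.8.15.3.

```shell
#!/bin/sh
# Sketch only: version_ge is a hypothetical helper.
# Returns 0 if version $1 is greater than or equal to version $2.
version_ge() {
    if command -v dpkg >/dev/null 2>&1; then
        # dpkg implements the Debian version ordering (including "~").
        dpkg --compare-versions "$1" ge "$2"
    else
        # Fallback approximation with GNU sort's version ordering:
        # $1 >= $2 exactly when $2 sorts first (or both are equal).
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    fi
}

# Example use (the threshold 0.8.15.3 is an assumption, see above):
#   installed=$(dpkg-query -W -f='${Version}' nvidia-prime)
#   version_ge "$installed" 0.8.15.3 && echo "nvidia-prime should include the fix"
```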