I have a “gaming” PC with an MSI RTX 3080 purchased 3 months ago. It runs Ubuntu 20 with Lambda Stack installed (Lambda Stack: an AI software stack that's always up-to-date). Everything was working fine until yesterday, when a PyTorch process running on the GPU froze. After a reboot, nvidia-smi returns:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Here are some other commands I ran to help with the diagnosis:
$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Device 9bc5 (rev 05)
$ nvidia-detector
None
$ sudo lshw -C video
*-display
description: VGA compatible controller
product: Intel Corporation
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 05
width: 64 bits
clock: 33MHz
capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:160 memory:a0000000-a0ffffff memory:90000000-9fffffff ioport:3000(size=64) memory:c0000-dffff
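The `grep VGA` filter above only matches devices labelled as VGA controllers, so a broader check is to search the PCI bus for NVIDIA's vendor ID, 10de, regardless of device class. Below is a minimal sketch of that check. The helper name `has_nvidia_pci` is made up for illustration and is not part of any NVIDIA or Lambda Stack tool, and it assumes `lspci -nn` prints IDs in the usual `[vendor:device]` form.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this post): reads `lspci -nn` style
# output on stdin and succeeds if any device carries NVIDIA's PCI vendor ID 10de.
has_nvidia_pci() {
    grep -qiE '\[10de:[0-9a-f]{4}\]'
}

# Query the live system only when lspci is actually available.
if command -v lspci >/dev/null 2>&1; then
    if lspci -nn | has_nvidia_pci; then
        echo "An NVIDIA device is visible on the PCI bus"
    else
        echo "No NVIDIA device is visible on the PCI bus"
    fi
fi
```

If this reports no NVIDIA device, the card is not being enumerated on the bus at all, which would be consistent with nvidia-detector returning None and with driver reinstalls having no effect.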
I tried reinstalling the drivers a couple of times, but the OS still does not seem to see the GPU.
I’m also attaching the nvidia-bug-report.sh output.