PopOS 22.04 NVIDIA Driver Occasionally Works

Most of the time when I run nvidia-smi I get the error message “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.” However, every once in a while after a reboot it will detect my GPU. I dual boot Windows and Linux, and the GPU always works flawlessly on Windows. This is an ASUS Zephyrus G15 with an RTX 3070.

I’ve tried reinstalling the NVIDIA driver, using an older version of the driver, using an older kernel, using a newer kernel, rebooting between Windows and Linux, switching between system76-driver-nvidia and nvidia-driver-515, etc. Safe mode is disabled. Windows fast startup is disabled.

Running system76-power graphics nvidia fails with: daemon returned an error message: "does not have switchable graphics".

Output of dpkg -l | grep nvidia:

ii  libnvidia-cfg1-515:amd64                         515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-515                             515.65.01-1pop0~1666367711~22.04~b8c0232                        all          Shared files used by the NVIDIA libraries
ii  libnvidia-compute-515:amd64                      515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVIDIA libcompute package
ii  libnvidia-compute-515:i386                       515.65.01-1pop0~1666367711~22.04~b8c0232                        i386         NVIDIA libcompute package
ii  libnvidia-decode-515:amd64                       515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-515:i386                        515.65.01-1pop0~1666367711~22.04~b8c0232                        i386         NVIDIA Video Decoding runtime libraries
ii  libnvidia-egl-wayland1:amd64                     1:1.1.9-1.1                                                     amd64        Wayland EGL External Platform library -- shared library
ii  libnvidia-encode-515:amd64                       515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-515:i386                        515.65.01-1pop0~1666367711~22.04~b8c0232                        i386         NVENC Video Encoding runtime library
ii  libnvidia-extra-515:amd64                        515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-515:amd64                         515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-515:i386                          515.65.01-1pop0~1666367711~22.04~b8c0232                        i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-515:amd64                           515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-515:i386                            515.65.01-1pop0~1666367711~22.04~b8c0232                        i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  nvidia-compute-utils-515                         515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVIDIA compute utilities
ii  nvidia-dkms-515                                  515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVIDIA DKMS package
ii  nvidia-driver-515                                515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVIDIA driver metapackage
ii  nvidia-kernel-common-515                         515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-515                         515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVIDIA kernel source package
ii  nvidia-settings                                  510.47.03-0ubuntu1                                              amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-utils-515                                 515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVIDIA driver support binaries
ii  screen-resolution-extra                          0.18.2                                                          all          Extension for the nvidia-settings control panel
ii  xserver-xorg-video-nvidia-515                    515.65.01-1pop0~1666367711~22.04~b8c0232                        amd64        NVIDIA binary Xorg driver

dmesg is filled with:

[   21.972836] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[   21.972852] NVRM: No NVIDIA GPU found.
[   21.974095] nvidia-nvlink: Unregistered Nvlink Core, major device number 508

This bug report is with 6.0.2-76060002-generic, but I had the same issue with 5.19.16-76051916-generic:
nvidia-bug-report.log.gz (202.1 KB)

Thanks for the help!

Here is a bug report for a time when the GPU worked and was detected properly:
nvidia-bug-report.log.gz (517.6 KB)

When the GPU was not detected, it wasn’t even handed over by the BIOS, so this is a low-level issue rather than a driver problem. Usually, I’d suspect the GPU is beginning to fail in such a case. Since you said Windows detects it reliably, you might want to check for a BIOS update first.
When the GPU is not detected, does a PCI rescan make it appear?
echo 1 | sudo tee /sys/bus/pci/devices/0000:00:01.1/rescan
To check whether the GPU appeared:
sudo lspci -nn |grep 10de
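
The two commands above can be wrapped into one check so the result is unambiguous. This is only a rough sketch: the function names filter_nvidia and rescan_and_check are made up for illustration, and the bridge address 0000:00:01.1 is the one quoted above, which may be different on other machines. The filter matches the [10de:xxxx] vendor:device field instead of any line that contains 10de, so it is slightly stricter than the plain grep.

```shell
#!/usr/bin/env bash

# Hypothetical helper: filter `lspci -nn` output (read from stdin) down to
# devices whose vendor ID is 10de (NVIDIA). Prints nothing if none match.
filter_nvidia() {
    grep -iE '\[10de:[0-9a-f]{4}\]' || true
}

# Hypothetical helper: rescan below a PCI bridge, then report whether an
# NVIDIA device is visible. The default bridge address is an assumption
# taken from the command above; pass another address as the first argument.
rescan_and_check() {
    local bridge="${1:-0000:00:01.1}"
    local node="/sys/bus/pci/devices/${bridge}/rescan"

    if [ ! -e "$node" ]; then
        echo "no rescan node at $node" >&2
        return 2
    fi

    echo 1 | sudo tee "$node" > /dev/null
    sleep 1   # give the kernel a moment to enumerate devices

    local found
    found="$(sudo lspci -nn | filter_nvidia)"
    if [ -n "$found" ]; then
        echo "NVIDIA device(s) visible on the PCI bus:"
        echo "$found"
    else
        echo "No NVIDIA device on the PCI bus after rescan."
        return 1
    fi
}
```

After sourcing the file, run rescan_and_check (optionally with a different bridge address). A return code of 1 means the rescan completed but the GPU is still absent from the bus, which would point at firmware or hardware rather than the driver.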

Performing a rescan doesn’t solve the issue. Additionally, my BIOS is up to date. Any other ideas?