Hello everyone,
Is this the right area for reporting this issue?
Back in September, I was still using the 450.57 driver with the 5.8 Linux kernel and Ubuntu 20.04.
On the 5.10 Linux kernel with the 460.39 driver, the workaround I mentioned does not work anymore.
Additionally, the “NVreg_DynamicPowerManagement=0x02” option is now the only way to suspend the NVIDIA GPU when it is not in use. The “NVreg_DynamicPowerManagement=0x01” option suspends it only if no applications are running on it. But on 460.39, Xorg creates something like a persistent glxserver for NVIDIA. That counts as an application, so the driver never suspends the GPU even when it is not being used.
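For anyone who wants to try the two modes, here is a minimal sketch of how the option could be set through modprobe. The helper name `write_pm_option` and the default file name `/etc/modprobe.d/nvidia-pm.conf` are my own placeholders, not anything the driver requires, and the helper only accepts the two values discussed in this post.

```shell
# Write a modprobe options line selecting a dynamic power management mode.
# $1: mode, limited here to the two values discussed in this post (0x01, 0x02)
# $2: target file; the default name is a hypothetical placeholder
write_pm_option() {
    mode="$1"
    conf="${2:-/etc/modprobe.d/nvidia-pm.conf}"
    case "$mode" in
        0x01|0x02) ;;
        *) echo "unsupported mode: $mode" >&2; return 1 ;;
    esac
    # Standard modprobe.d syntax: options <module> <parameter>=<value>
    printf 'options nvidia NVreg_DynamicPowerManagement=%s\n' "$mode" > "$conf"
}
```

Running `write_pm_option 0x02` as root writes the file. The new value should only take effect the next time the `nvidia` module is loaded.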
For the driver development team, I think I have stumbled upon easier reproduction steps that trigger this bug while the system is still running (making data collection possible):
- Reboot with `nvidia`, `nvidia_drm` and `nvidia_modeset` blacklisted. Make sure that these modules are not loaded but can still be loaded manually.
- Make sure that `/sys/bus/pci/devices/0000:01:00.0/power/control` is `on`, not `auto`.
- Make sure that `/sys/bus/pci/devices/0000:01:00.0/power/runtime_status` reads `active`.
- Run `nvidia-smi` or `nvidia-bug-report.sh`, which should eventually load the `nvidia` kernel module.
- You get a `Killed` message for `nvidia-smi`, or nothing for `nvidia-bug-report.sh`. Nevertheless, the bug should have been triggered, the GPU will not be usable, and the system is on the brink of crashing.
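The checks in the steps above can be sketched as a small script. This is only an illustration: the function names are made up, and `DEV` defaults to the PCI address from this post, which may differ on other machines.

```shell
# Pre-flight checks for the reproduction steps (a sketch, not an official tool).
# DEV defaults to the PCI address used in this post; override it if yours differs.
DEV="${DEV:-/sys/bus/pci/devices/0000:01:00.0}"

# Succeeds if the sysfs file $1 is readable and contains exactly the value $2.
check_attr() {
    file="$1"
    expected="$2"
    [ -r "$file" ] || { echo "missing: $file"; return 2; }
    actual=$(cat "$file")
    if [ "$actual" = "$expected" ]; then
        echo "ok: $file is $actual"
    else
        echo "unexpected: $file is $actual, wanted $expected"
        return 1
    fi
}

# Succeeds if none of the three blacklisted modules appear in a
# /proc/modules-style listing (module name is the first field of each line).
modules_unloaded() {
    ! grep -Eq '^(nvidia|nvidia_drm|nvidia_modeset) ' "${1:-/proc/modules}"
}

# Runs all three checks; stops at the first one that fails.
preflight() {
    modules_unloaded &&
    check_attr "$DEV/power/control" on &&
    check_attr "$DEV/power/runtime_status" active
}
```

With these defined, `preflight && nvidia-smi` runs the trigger only when the preconditions hold, and `dmesg` can be captured right afterwards.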
I attached the output and the dmesg logs for two cases:
- Just running nvidia-bug-report.sh (I also captured the dmesg log after it): nvidia-bug-report.tar.gz (694.3 KB)
- Running nvidia-smi then nvidia-bug-report.sh. The nvidia-bug-report.sh hangs: nvidia-smi.tar.gz (742.9 KB)
If more information is needed, please reply and I will try my best to provide it.