We are running RHEL 7.9 (Linux kernel 3.10.0-1160.31.1.el7.x86_64) and are seeing problems with an NVIDIA Quadro K420 card on driver version 450.80.02.
The OS logs messages like the following:
Jul 28 20:54:15 kern.warning: host kernel:[155134.670118] NVRM: GPU at PCI:0000:09:00: GPU-a002baed-c13a-3fe5-bc9d-f37afe819bc7
Jul 28 20:54:15 kern.warning: r0mlsder366 kernel:[155134.670121] NVRM: GPU Board Serial Number: 0425016031088
Jul 28 20:54:15 kern.err: host kernel:[155134.670124] NVRM: Xid (PCI:0000:09:00): 62, pid=17673, 0c83(17dc) 00000000 00000000
A short time later, the OS reports that RT throttling was activated:
Jul 28 20:54:17 kern.warning: host kernel:[155137.415348] sched: RT throttling activated
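To check whether the RT throttling always follows an Xid, the kernel log can be scanned for both messages and their timestamps compared. The following is only a minimal sketch: the function names (`scan_log`, `seconds_to_throttle`) are hypothetical, and the regular expressions are assumptions based solely on the log lines quoted in this post, so they may need adjusting for other syslog formats.

```python
import re

# Assumed patterns, derived only from the log lines quoted in this post.
XID_RE = re.compile(
    r"kernel:\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid \((?P<pci>[^)]+)\): (?P<xid>\d+)"
)
RT_RE = re.compile(
    r"kernel:\[\s*(?P<ts>\d+\.\d+)\]\s+sched: RT throttling activated"
)


def scan_log(text):
    """Return (xid_events, throttle_times) found anywhere in `text`.

    xid_events is a list of (kernel_timestamp, pci_address, xid_number).
    throttle_times is a list of kernel timestamps.
    """
    xids = [
        (float(m.group("ts")), m.group("pci"), int(m.group("xid")))
        for m in XID_RE.finditer(text)
    ]
    throttles = [float(m.group("ts")) for m in RT_RE.finditer(text)]
    return xids, throttles


def seconds_to_throttle(text):
    """For each Xid event, report seconds until the next RT throttling message.

    Returns a list of (xid_number, pci_address, delay_or_None).
    """
    xids, throttles = scan_log(text)
    results = []
    for ts, pci, xid in xids:
        delays = [t - ts for t in throttles if t >= ts]
        results.append((xid, pci, min(delays) if delays else None))
    return results
```

Passing the contents of the system log to `seconds_to_throttle` gives one entry per Xid event. For the lines quoted above, the kernel timestamps put the throttling message roughly 2.7 seconds after the Xid 62.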
After that point, things start to go awry. It is almost as if the NVIDIA cards are spinning on the CPU. Are there any known issues with this combination? What data would be needed to help debug the problem? Are there any known workarounds or fixes?
Attached: nvidia-bug-report.log.gz (1.7 MB)