TLDR: Is it possible that my board is fried even though it works with nouveau?
I recently purchased an ex-display RTX 2080 Ti from a reputable vendor. Using the nouveau driver everything works as well as expected (4K output, no acceleration), but when I use the nvidia driver the monitor switches off immediately after showing the GRUB menu.
I’ve been able to SSH into the box and collect a bug report: nvidia-bug-report.log.gz (882.6 KB)
Looking through the logs, I see this:
Dec 25 10:43:15 hjk-desktop kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Dec 25 10:49:36 hjk-desktop kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c57e:0 2:0:392:380
Dec 25 10:49:41 hjk-desktop kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c57e:0 2:0:392:380
<last line repeated many times>
and, elsewhere
Dec 24 15:08:20 hjk-MS-7C80 kernel: [ 11.546230] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Dec 24 15:12:11 hjk-MS-7C80 kernel: [ 242.666293] INFO: task nvidia-modeset/:639 blocked for more than 120 seconds.
Dec 24 15:12:11 hjk-MS-7C80 kernel: [ 242.666296] nvidia-modeset/ D 0 639 2 0x80004000
Dec 24 15:14:12 hjk-MS-7C80 kernel: [ 363.482418] INFO: task nvidia-modeset/:639 blocked for more than 241 seconds.
Dec 24 15:14:12 hjk-MS-7C80 kernel: [ 363.482421] nvidia-modeset/ D 0 639 2 0x80004000
and the XServer is using 100% CPU, even though there doesn’t seem to be anything suspicious in the X.org logs.
Over the last few days it has also sometimes got as far as showing a blinking cursor, allowing me to switch to a VT, where I saw "NVRM: RmInitAdapter failed!" in the log, but this doesn’t seem to happen consistently and may depend on the driver version.
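To get a feel for how often each of these messages turns up, the kernel log can be tallied with a short script. This is only a rough sketch: the `count_nvidia_messages` helper and the pattern labels are names I made up, and the patterns cover only the messages quoted above, not everything the driver can emit.

```python
import re
from collections import Counter

# Patterns taken from the log lines quoted above.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = {
    "lost_display": re.compile(
        r"nvidia-modeset: WARNING: GPU:\d+: Lost display notification"),
    "gpu_progress": re.compile(
        r"nvidia-modeset: ERROR: GPU:\d+: Error while waiting for GPU progress"),
    "hung_task": re.compile(
        r"INFO: task nvidia-modeset\S* blocked for more than \d+ seconds"),
    "init_failed": re.compile(r"NVRM: RmInitAdapter failed"),
}


def count_nvidia_messages(lines):
    """Count how many lines match each of the known message patterns."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

The function accepts any iterable of lines, so it could be fed an open log file or the saved output of a kernel log dump. A "gpu_progress" count that keeps growing while the "lost_display" count stays at one per boot would match what I see here.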
I’m pretty sure that my driver is installed and configured correctly, since if I remove my RTX card and install my old GTX 660 Ti everything works perfectly.
Hardware:
- TU102 [GeForce RTX 2080 Ti Rev. A]
- AOC U2879G6 4K monitor
- MAG Z490 Tomahawk motherboard with latest firmware, CSM boot
- Intel® Core™ i9-10900F CPU @ 2.80GHz
- EVGA Supernova 750W (with two independent 8-pin PCIE cables powering the card)
Software:
- Ubuntu 20.04 (clean install)
- Nvidia driver version 450.80.02 (also tried 390, 418, 430, 435, 440, 455)
- XServer 2:1.20.8-2ubuntu2.6
I’ve tried a whole bunch of things from lurking through these forums which have had no effect:
- Installing the nvidia driver via the latest *.run installer
- connecting to the monitor via either HDMI or Display Port
- connecting to a lower resolution monitor (my old LG W2242S [1680x1050]; had to use USB-C and a converter since it only supports VGA) since I saw people having issues with some 4K devices
- every nvidia driver version available with Ubuntu 20.04 (390/418/430/435/440/450/455)
- various kernel arguments (nomodeset, nvidia-drm.modeset=1, mem_encrypt=off)
- updating motherboard firmware to the latest
- completely clean OS install
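When trying kernel arguments, it is worth confirming that they actually reached the running kernel, which on Linux can be read back from /proc/cmdline. A minimal sketch of that check follows; `parse_cmdline` is a hypothetical helper of my own, and it splits on whitespace only, so quoted values containing spaces are not handled.

```python
def parse_cmdline(text):
    """Split a kernel command line into a dict.

    Bare flags such as "nomodeset" map to True; "key=value" pairs map
    to the value string. Quoted values with spaces are not supported.
    """
    args = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else True
    return args


def read_running_cmdline(path="/proc/cmdline"):
    """Parse the command line of the currently running kernel (Linux only)."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

For example, `read_running_cmdline().get("nvidia-drm.modeset")` should return "1" if that argument took effect, and `None` if it never made it onto the command line.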
My best guess at this point is that the RTX board is faulty in some non-obvious way. This conclusion appears to be supported by the fact that my old GTX card works perfectly with an otherwise identical hardware and software config, but doesn’t explain how the nouveau driver is able to get the RTX card to function, though I guess it’s using far less of the board’s circuitry.
Any help or suggestions greatly appreciated - at present I plan on returning it once things begin to re-open in January.