Geforce 680 on Fedora: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0

ada · May 16, 2019, 9:12am

Ok, CUDA installs the 418 driver at /usr/lib64/xorg/modules/drivers/
I did generate a bug report with verbose output from X. But it appears I can’t attach it any more…

BTW: I have now installed the 390 driver from the repo (no CUDA!), and I’m back to the old problem:
dmesg:
[ 326.182519] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[ 552.024437] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0
[ 588.025475] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0
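For anyone comparing runs, here is a rough sketch (not from the original post) of how the nvidia-modeset lines above could be pulled out of saved dmesg output. The function name `modeset_events` and the regular expression are illustrative assumptions based only on the three lines shown here; other kernels or driver versions may format these messages differently.

```python
import re

# Matches lines such as:
#   [ 552.024437] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: ...
# The pattern is an assumption derived from the dmesg excerpt in this post.
MODESET_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nvidia-modeset:\s+(?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)


def modeset_events(dmesg_text):
    """Return (timestamp, level, message) tuples for nvidia-modeset lines."""
    events = []
    for line in dmesg_text.splitlines():
        match = MODESET_RE.match(line.strip())
        if match:
            events.append(
                (float(match.group("ts")), match.group("level"), match.group("msg"))
            )
    return events
```

Feeding it the text of `dmesg` saved to a file gives a list that is easy to filter by level, for example to see how long after the "Lost display notification" warning the first timeout appears.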

Xorg.0.log:
[ 322.640] (==) ModulePath set to "/usr/lib64/xorg/modules"
[ 322.640] (II) The server relies on udev to provide the list of input devices.
If no devices become available, reconfigure udev or disable AutoAddDevices.
[ 322.640] (II) Loader magic: 0x558210699e00
[ 322.640] (II) Module ABI versions:
[ 322.640] X.Org ANSI C Emulation: 0.4
[ 322.640] X.Org Video Driver: 24.0
[ 322.640] X.Org XInput driver : 24.1
[ 322.640] X.Org Server Extension : 10.0
[ 322.641] (--) using VT number 2

[ 322.641] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[ 322.642] (II) xfree86: Adding drm device (/dev/dri/card0)
[ 322.729] (**) OutputClass "nvidia" ModulePath extended to "/usr/lib64/nvidia-390xx/xorg,/usr/lib64/xorg/modules"
[ 322.729] (**) OutputClass "nvidia" setting /dev/dri/card0 as PrimaryGPU
[ 322.734] (--) PCI:*(5@0:0:0) 10de:1180:3842:0969 rev 161, Mem @ 0x8a000000/16777216, 0x80000000/134217728, 0x88000000/33554432, I/O @ 0x00003000/128, BIOS @ 0x???/524288
[ 322.734] (II) LoadModule: "glx"
[ 322.734] (II) Loading /usr/lib64/nvidia-390xx/xorg/libglx.so
[ 323.018] (II) Module glx: vendor="NVIDIA Corporation"
[ 323.018] compiled for 4.0.2, module version = 1.0.0
[ 323.018] Module class: X.Org Server Extension
[ 323.024] (II) NVIDIA GLX Module 390.116 Sun Jan 27 06:24:32 PST 2019
[ 323.040] (II) Applying OutputClass "nvidia" to /dev/dri/card0
[ 323.040] loading driver: nvidia
[ 323.407] (==) Matched nvidia as autoconfigured driver 0
[ 323.407] (==) Matched nouveau as autoconfigured driver 1
[ 323.407] (==) Matched nv as autoconfigured driver 2
[ 323.407] (==) Matched modesetting as autoconfigured driver 3
[ 323.407] (==) Matched fbdev as autoconfigured driver 4
[ 323.407] (==) Matched vesa as autoconfigured driver 5
[ 323.407] (==) Assigned the driver to the xf86ConfigLayout
[ 323.407] (II) LoadModule: "nvidia"
[ 323.407] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[ 323.453] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 323.453] compiled for 4.0.2, module version = 1.0.0
[ 323.453] Module class: X.Org Video Driver

--
[ 323.602] (II) NVIDIA(0): NVIDIA GPU GeForce GTX 680 (GK104) at PCI:5:0:0 (GPU-0)
[ 323.602] (--) NVIDIA(0): Memory: 2097152 kBytes
[ 323.602] (--) NVIDIA(0): VideoBIOS: 80.04.87.00.09
[ 323.602] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[ 323.605] (--) NVIDIA(GPU-0): CRT-0: disconnected
[ 323.605] (--) NVIDIA(GPU-0): CRT-0: 400.0 MHz maximum pixel clock
[ 323.605] (--) NVIDIA(GPU-0):
[ 323.608] (--) NVIDIA(GPU-0): DFP-0: disconnected
[ 323.608] (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
[ 323.608] (--) NVIDIA(GPU-0): DFP-0: 330.0 MHz maximum pixel clock
[ 323.608] (--) NVIDIA(GPU-0):
[ 323.614] (--) NVIDIA(GPU-0): DELL 1907FP (DFP-1): connected
[ 323.614] (--) NVIDIA(GPU-0): DELL 1907FP (DFP-1): Internal TMDS
[ 323.614] (--) NVIDIA(GPU-0): DELL 1907FP (DFP-1): 165.0 MHz maximum pixel clock
[ 323.614] (--) NVIDIA(GPU-0):
[ 323.644] (--) NVIDIA(GPU-0): Lenovo Group Limited X1 (DFP-2): connected
[ 323.644] (--) NVIDIA(GPU-0): Lenovo Group Limited X1 (DFP-2): Internal TMDS
[ 323.644] (--) NVIDIA(GPU-0): Lenovo Group Limited X1 (DFP-2): 340.0 MHz maximum pixel clock
nvidia-bug-report.log.gz (1.14 MB)

ada · May 16, 2019, 9:39am

btw.: Found the link again on the known issue with the 340 driver
https://devtalk.nvidia.com/default/topic/1031067/linux/-linux416-nvidia-390-48-nvidia_stack_cache-rip-0010-usercopy_warn-0x7e-0xa0/
Will attach bug report from the 390 repo driver that caused the usual problem.

So, again my conclusion:
GeForce GTX 680 EFI will not work with any driver >340. Nvidia silently stopped supporting it, refusing any public comment.
Driver 340 will not work on later kernels because of the following issue:
“The nvidia driver uses an alternate stack, so it’s expected for it to do usercopies to and from that. I think we just need to allocate the alternate stacks with kmem_cache_create_usercopy() rather than kmem_cache_create().”

I have now spent months waiting for drivers, trying to work around things, etc. I never got a single useful answer from Nvidia directly. Time to move on and give up on Nvidia, I guess.

@generix: Many thanks for your patience and help! Any last ideas? Nvidia is not listening anyway.
nvidia-bug-report.log.gz (64.8 KB)
nvidia-bug-report.log-340.gz (229 KB)

generix · May 16, 2019, 9:57am

The .run installers for 340 won’t work; those have not been updated for ages. Always rely on the repo drivers, as those are patched.
The only other method would be reflashing the card with a standard vbios to see if that makes it work with current drivers. Of course, back up the current vbios beforehand. You’ll lose the boot-up screen and the device/OS selection, as you know.

ada · May 16, 2019, 12:14pm

Yes, the .run installers are not very impressive. Need to see if I find a supported 340 driver in the repo. The repo seems to have some serious flaws as well, considering that the 390 CUDA installs the 418 driver.

I’m not so keen on flashing my genuine Mac Edition with another ROM. This Mac Pro is triple boot (Mojave, Windows and Linux) as a test machine. Losing the boot screen would be very annoying.

I have another 670 I made EFI myself, will try that one first. If it fails, I have a modded 780 as well in another Mac Pro.
But I guess this machine will be going AMD, and my NVIDIA stuff will see the (e)bay piece by piece. NVIDIA has caused us Mac users so much support trouble lately that trust has gone down, especially considering how Nvidia has gone completely silent on our issues.

ada · May 16, 2019, 12:34pm

Just for completeness, the repo 340 driver has the same issue, not compatible with later kernels:
[ 40.999153] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'nvidia_stack_t' (offset 11864, size 3)!
[ 40.999163] WARNING: CPU: 1 PID: 2485 at mm/usercopy.c:78 usercopy_warn+0x7d/0xa0
[ 40.999163] Modules linked in: rfcomm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc nls_utf8 hfsplus intel_powerclamp coretemp kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio crct10dif_pclmul btusb snd_hda_intel btrtl btbcm btintel snd_hda_codec bluetooth snd_hda_core crc32_pclmul snd_hwdep ecdh_generic snd_seq rfkill iTCO_wdt ghash_clmulni_intel iTCO_vendor_support nvidia(POE) snd_seq_device snd_pcm intel_cstate intel_uncore snd_timer snd applesmc input_polldev soundcore drm ioatdma lpc_ich i7core_edac dca pcc_cpufreq i5500_temp i2c_i801 acpi_cpufreq xfs libcrc32c firewire_ohci crc32c_intel e1000e firewire_core crc_itu_t
[ 40.999186] CPU: 1 PID: 2485 Comm: Xorg Tainted: P IOE 5.0.16-300.fc30.x86_64 #1
[ 40.999186] Hardware name: Apple Inc. MacPro5,1/Mac-F221BEC8, BIOS 144.0.0.0.0 04/12/2019
[ 40.999188] RIP: 0010:usercopy_warn+0x7d/0xa0
[ 40.999188] Code: 0d 83 41 51 49 89 c0 49 c7 c2 81 3b 0c 83 49 89 f1 48 89 f9 4c 0f 45 d2 48 c7 c7 90 78 0d 83 4c 89 da 4c 89 d6 e8 12 10 e0 ff <0f> 0b 48 83 c4 18 c3 48 c7 c6 ad 08 0f 83 49 89 f1 48 89 f0 eb 96
[ 40.999189] RSP: 0018:ffffb59a08fe7bb0 EFLAGS: 00010282
[ 40.999190] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000006
[ 40.999190] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff92c07f6568c0
[ 40.999190] RBP: ffff92c051a32e5b R08: 0000000000000001 R09: 000000000000045a
[ 40.999191] R10: 0000000000018578 R11: 0000000000000003 R12: ffff92c051a32e58
[ 40.999191] R13: 0000000000000001 R14: ffff92c051a32e58 R15: ffff92c051a32ea0
[ 40.999192] FS: 00007f253c920a80(0000) GS:ffff92c07f640000(0000) knlGS:0000000000000000
[ 40.999193] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 40.999193] CR2: 00007f2537f0fa00 CR3: 00000010118b6005 CR4: 00000000000206e0
[ 40.999194] Call Trace:
[ 40.999198] __check_object_size+0x131/0x15d
[ 40.999281] os_memcpy_to_user+0x23/0x50 [nvidia]
[ 40.999347] _nv001372rm+0xa5/0x260 [nvidia]
[ 40.999415] ? _nv004784rm+0x4eba/0x5500 [nvidia]
[ 40.999483] ? _nv004331rm+0xec/0xf0 [nvidia]
[ 40.999550] ? _nv004326rm+0xca/0x650 [nvidia]
[ 40.999613] ? _nv015126rm+0x576/0x5c0 [nvidia]
[ 40.999677] ? _nv000694rm+0x2e/0x60 [nvidia]
[ 40.999735] ? _nv000789rm+0x5f5/0x8b0 [nvidia]
[ 40.999792] ? rm_ioctl+0x73/0x100 [nvidia]
[ 40.999795] ? iomap_zero_range_actor+0x180/0x1c0
[ 40.999851] ? nvidia_ioctl+0x155/0x480 [nvidia]
[ 40.999908] ? nvidia_frontend_ioctl+0x32/0x50 [nvidia]
[ 40.999964] ? nvidia_frontend_unlocked_ioctl+0x19/0x20 [nvidia]
[ 40.999966] ? do_vfs_ioctl+0x405/0x660
[ 40.999967] ? ksys_ioctl+0x5e/0x90
[ 40.999968] ? ksys_write+0x57/0xd0
[ 40.999969] ? __x64_sys_ioctl+0x16/0x20
[ 40.999971] ? do_syscall_64+0x5b/0x170
[ 40.999973] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 40.999974] ---[ end trace a94ac60fc4d5177d ]---
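In case it helps others check whether they are hitting the same thing, below is a minimal sketch (not part of the original post) that scans saved dmesg text for the hardened-usercopy warning shown above and reports which slab cache triggered it. The function name `find_usercopy_warnings` is made up for illustration, and the pattern is an assumption modelled on the single warning line in this trace; it also tolerates the curly quotes that forum software tends to introduce.

```python
import re

# Assumed shape of the warning, modelled on the line in the trace above:
#   Bad or missing usercopy whitelist? Kernel memory exposure attempt detected
#   from SLUB object 'nvidia_stack_t' (offset 11864, size 3)!
USERCOPY_RE = re.compile(
    r"Bad or missing usercopy whitelist\? Kernel memory "
    r"(?P<kind>exposure|overwrite) attempt detected (?:from|to) "
    r"(?:SLUB|SLAB|SLOB) object ['‘](?P<cache>[^'’]+)['’] "
    r"\(offset (?P<offset>\d+), size (?P<size>\d+)\)"
)


def find_usercopy_warnings(dmesg_text):
    """Return one dict per usercopy warning found in the given dmesg text."""
    hits = []
    for match in USERCOPY_RE.finditer(dmesg_text):
        hits.append(
            {
                "kind": match.group("kind"),
                "cache": match.group("cache"),
                "offset": int(match.group("offset")),
                "size": int(match.group("size")),
            }
        )
    return hits
```

If the reported cache is `nvidia_stack_t`, that matches the issue quoted earlier in the thread about the driver's alternate stack not being allocated with `kmem_cache_create_usercopy()`.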

generix · May 16, 2019, 12:55pm

Yes, but that’s only a warning, though I’m surprised that the repo driver doesn’t include the (IIRC rather simple) patch for it.
On OS X drivers:
https://devtalk.nvidia.com/default/topic/1042279/cuda-setup-and-installation/cuda-10-and-macos-10-14/
The problem is that those macOS-ready Nvidia cards are customized versions commissioned by Apple, so Apple is Nvidia’s customer, and Apple doesn’t want to pay for support anymore. You’re Apple’s customer, but in effect the end user is just stuck in between and loses out.
You can experience the same with notebooks: the manufacturer builds crap and orders a driver workaround for, say, two years. Then it breaks. That’s the way companies work, I guess.

generix · May 16, 2019, 1:19pm

Speaking of broken notebooks reminds me of this:
https://devtalk.nvidia.com/default/topic/1020418/linux/lenovo-y550-gt-m240-vs-ubuntu-17-07-16-04-14-04/post/5198178/#5198178
It was sold with a broken vbios which couldn’t be reflashed, as it was embedded in the system BIOS. The trick was to load a sanitized vbios before the driver was loaded. If you’re really bored, you could try that method to avoid having to reflash the vbios, instead loading a matching, standard vbios into RAM.
Don’t think that will work on EFI, though.

ada · May 16, 2019, 2:21pm

Well, I guess I’ll move on to AMD then. I recently replaced my 1080 Ti with a Radeon VII. Vega 64 cards are getting close to 250-300€ on eBay now. So, it’s best to sell off the rest of my Nvidia gear and move on (and forget CUDA).

BTW: Does NVIDIA support usually respond in these threads as well? I have seen them in some other threads, but I guess this one is just an unpopular one for them. They could at least provide clarity on the fact that this card will never work the way I was hoping…

generix · May 16, 2019, 2:57pm

My guess is, this special case is both unpopular and the nvidia linux devs responding here are in no position to know or say what’s going on.

ada · May 16, 2019, 3:12pm

I can’t believe it! I just cleaned up everything and reverted back to the nouveau driver…
Then I thought I’ll give it one last try, updated the system to the very latest version, and installed the 340xx repo driver again, blacklisted the nouveau driver, rebuilt the initramfs, rebooted.

Guess what: IT WORKS NOW!
I did NOT dare to install CUDA yet though, as I fear this might screw it up again.
I’ll freeze my system, check if CUDA works as well.

Verdict: Install the 340xx repo driver from the beginning, making sure not a bit from the .run stuff from NVIDIA is left over.
It remains to be seen if the CUDA .repo package is overwriting the driver as well or not. We will see.
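A small sanity check along these lines could confirm after the reboot that the blacklist took effect. This is only a sketch and not something from the thread: the function name `driver_state` is hypothetical, and it assumes the standard `/proc/modules` layout where each line starts with the module name.

```python
def driver_state(proc_modules_text):
    """Summarise which GPU kernel modules appear in /proc/modules-style text."""
    # The first whitespace-separated field of each line is the module name.
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return {
        "nvidia_loaded": "nvidia" in loaded,
        "nouveau_loaded": "nouveau" in loaded,
    }
```

On a running system, pass it the contents of `/proc/modules`, for example `driver_state(open("/proc/modules").read())`. The result hoped for here is `nvidia_loaded` true and `nouveau_loaded` false. It says nothing about which driver version is installed, so it does not detect a CUDA package replacing the driver.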

Will mark this thread as answered for now. Many thanks for your patience and help!!!

generix · May 16, 2019, 3:15pm

But: driver 340 means max CUDA 6.5.

ada · May 16, 2019, 3:36pm

Yes :-( this sucks! But at least I have it half-way up and running… Let’s see how far I’m getting.
Still very disappointed about NVIDIA support in general.

generix · June 22, 2019, 3:26pm

See this:
https://devtalk.nvidia.com/default/topic/1042691/linux/black-screen-with-mac-version-of-gtx-680/post/5354150/#5354150
