Geforce 680 on Fedora: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0

Ok, CUDA installs the 418 driver at /usr/lib64/xorg/modules/drivers/
I did generate a bug report with verbose output from X. But it appears I can’t attach it any more…

btw.: Installed the 390 from the repo now (no CUDA!), and back to the old problem:
dmesg:
[ 326.182519] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[ 552.024437] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0
[ 588.025475] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0
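
For anyone who wants to compare driver versions on how often this happens, here is a rough Python sketch that counts the nvidia-modeset warnings and errors in saved dmesg output. The function name `summarize_modeset` and the regular expression are my own assumptions, based only on the three lines pasted above; other message formats will not be matched.

```python
import re
from collections import Counter

# Matches lines shaped like the ones pasted above, e.g.
# [ 552.024437] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: ...
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nvidia-modeset:\s+"
    r"(?P<level>ERROR|WARNING):\s+(?P<msg>.*)$"
)


def summarize_modeset(dmesg_text):
    """Return (Counter of levels, list of (timestamp, level, message))."""
    counts = Counter()
    events = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            counts[m["level"]] += 1
            events.append((float(m["ts"]), m["level"], m["msg"]))
    return counts, events


# The three dmesg lines from this post, used as sample input.
SAMPLE = """\
[ 326.182519] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[ 552.024437] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0
[ 588.025475] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0
"""

if __name__ == "__main__":
    counts, events = summarize_modeset(SAMPLE)
    print(dict(counts))
    for ts, level, msg in events:
        print(f"{ts:10.3f}  {level:7s}  {msg}")
```

To run it against a live system, the output of `dmesg` could be saved to a file and its contents passed to `summarize_modeset` instead of `SAMPLE`.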

Xorg.0.log:
[ 322.640] (==) ModulePath set to “/usr/lib64/xorg/modules”
[ 322.640] (II) The server relies on udev to provide the list of input devices.
If no devices become available, reconfigure udev or disable AutoAddDevices.
[ 322.640] (II) Loader magic: 0x558210699e00
[ 322.640] (II) Module ABI versions:
[ 322.640] X.Org ANSI C Emulation: 0.4
[ 322.640] X.Org Video Driver: 24.0
[ 322.640] X.Org XInput driver : 24.1
[ 322.640] X.Org Server Extension : 10.0
[ 322.641] (--) using VT number 2

[ 322.641] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[ 322.642] (II) xfree86: Adding drm device (/dev/dri/card0)
[ 322.729] (**) OutputClass “nvidia” ModulePath extended to “/usr/lib64/nvidia-390xx/xorg,/usr/lib64/xorg/modules”
[ 322.729] (**) OutputClass “nvidia” setting /dev/dri/card0 as PrimaryGPU
[ 322.734] (--) PCI:*(5@0:0:0) 10de:1180:3842:0969 rev 161, Mem @ 0x8a000000/16777216, 0x80000000/134217728, 0x88000000/33554432, I/O @ 0x00003000/128, BIOS @ 0x???/524288
[ 322.734] (II) LoadModule: “glx”
[ 322.734] (II) Loading /usr/lib64/nvidia-390xx/xorg/libglx.so
[ 323.018] (II) Module glx: vendor=“NVIDIA Corporation”
[ 323.018] compiled for 4.0.2, module version = 1.0.0
[ 323.018] Module class: X.Org Server Extension
[ 323.024] (II) NVIDIA GLX Module 390.116 Sun Jan 27 06:24:32 PST 2019
[ 323.040] (II) Applying OutputClass “nvidia” to /dev/dri/card0
[ 323.040] loading driver: nvidia
[ 323.407] (==) Matched nvidia as autoconfigured driver 0
[ 323.407] (==) Matched nouveau as autoconfigured driver 1
[ 323.407] (==) Matched nv as autoconfigured driver 2
[ 323.407] (==) Matched modesetting as autoconfigured driver 3
[ 323.407] (==) Matched fbdev as autoconfigured driver 4
[ 323.407] (==) Matched vesa as autoconfigured driver 5
[ 323.407] (==) Assigned the driver to the xf86ConfigLayout
[ 323.407] (II) LoadModule: “nvidia”
[ 323.407] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[ 323.453] (II) Module nvidia: vendor=“NVIDIA Corporation”
[ 323.453] compiled for 4.0.2, module version = 1.0.0
[ 323.453] Module class: X.Org Video Driver


[ 323.602] (II) NVIDIA(0): NVIDIA GPU GeForce GTX 680 (GK104) at PCI:5:0:0 (GPU-0)
[ 323.602] (--) NVIDIA(0): Memory: 2097152 kBytes
[ 323.602] (--) NVIDIA(0): VideoBIOS: 80.04.87.00.09
[ 323.602] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[ 323.605] (--) NVIDIA(GPU-0): CRT-0: disconnected
[ 323.605] (--) NVIDIA(GPU-0): CRT-0: 400.0 MHz maximum pixel clock
[ 323.605] (--) NVIDIA(GPU-0):
[ 323.608] (--) NVIDIA(GPU-0): DFP-0: disconnected
[ 323.608] (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
[ 323.608] (--) NVIDIA(GPU-0): DFP-0: 330.0 MHz maximum pixel clock
[ 323.608] (--) NVIDIA(GPU-0):
[ 323.614] (--) NVIDIA(GPU-0): DELL 1907FP (DFP-1): connected
[ 323.614] (--) NVIDIA(GPU-0): DELL 1907FP (DFP-1): Internal TMDS
[ 323.614] (--) NVIDIA(GPU-0): DELL 1907FP (DFP-1): 165.0 MHz maximum pixel clock
[ 323.614] (--) NVIDIA(GPU-0):
[ 323.644] (--) NVIDIA(GPU-0): Lenovo Group Limited X1 (DFP-2): connected
[ 323.644] (--) NVIDIA(GPU-0): Lenovo Group Limited X1 (DFP-2): Internal TMDS
[ 323.644] (--) NVIDIA(GPU-0): Lenovo Group Limited X1 (DFP-2): 340.0 MHz maximum pixel clock
nvidia-bug-report.log.gz (1.14 MB)

btw.: Found the link again on the known issue with the 340 driver
https://devtalk.nvidia.com/default/topic/1031067/linux/-linux416-nvidia-390-48-nvidia_stack_cache-rip-0010-usercopy_warn-0x7e-0xa0/
Will attach bug report from the 390 repo driver that caused the usual problem.

So, again my conclusion:
GeForce GTX 680 EFI will not work with any driver >340. Nvidia silently stopped supporting it, refusing any public comment.
Driver 340 will not work on later kernels because of the following issue:
“The nvidia driver uses an alternate stack, so it’s expected for it to do usercopies to and from that. I think we just need to allocate the alternate stacks with kmem_cache_create_usercopy() rather than kmem_cache_create().”
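
To make the quoted explanation a bit more concrete, here is a toy Python model of the whitelist idea as I understand it. It is not kernel code: the two functions only borrow the names `kmem_cache_create` and `kmem_cache_create_usercopy`, their argument lists are simplified, and the object size is a made-up value, since the real size of `nvidia_stack_t` is not given anywhere in this thread. The offset and size are the ones from the usercopy warning posted further down (offset 11864, size 3).

```python
from dataclasses import dataclass


@dataclass
class SlabCache:
    name: str
    object_size: int
    useroffset: int = 0  # start of the region allowed for user copies
    usersize: int = 0    # length of that region; 0 means nothing is whitelisted


def kmem_cache_create(name, size):
    # Model of the plain call: the cache gets no usercopy whitelist.
    return SlabCache(name, size)


def kmem_cache_create_usercopy(name, size, useroffset, usersize):
    # Model of the usercopy variant: an explicit whitelisted window.
    return SlabCache(name, size, useroffset, usersize)


def usercopy_allowed(cache, offset, n):
    """True if the n bytes starting at offset lie inside the whitelisted window."""
    start = cache.useroffset
    end = cache.useroffset + cache.usersize
    return start <= offset and offset + n <= end


# Hypothetical object size, chosen only so that offset 11864 fits inside it.
STACK_SIZE = 16384

plain = kmem_cache_create("nvidia_stack_t", STACK_SIZE)
fixed = kmem_cache_create_usercopy("nvidia_stack_t", STACK_SIZE, 0, STACK_SIZE)

print(usercopy_allowed(plain, 11864, 3))  # False: nothing is whitelisted
print(usercopy_allowed(fixed, 11864, 3))  # True: whole object is whitelisted
```

In this model the only difference between the two caches is the whitelisted window, which is the change the quote proposes for the driver's alternate stacks.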

I have now spent months waiting for drivers, trying to work around things, etc. I never got a single useful answer from Nvidia directly. Time to move on and give up on Nvidia, I guess.

@generix: Many thanks for your patience and help! Any last ideas? Nvidia is not listening anyway.
nvidia-bug-report.log.gz (64.8 KB)
nvidia-bug-report.log-340.gz (229 KB)

The .run installers for 340 won’t work; they have not been updated for ages. Always rely on the repo drivers, as those are patched.
The only other method would be reflashing the card with a standard vbios to see if that makes it work with current drivers. Of course, back up the current vbios beforehand. As you know, you’ll lose the boot-up screen and the device/OS selection.

Yes, the .run installers are not very impressive. I need to see whether I can find a supported 340 driver in the repo. The repo seems to have some serious flaws as well, considering that the 390 CUDA installs the 418 driver.

I’m not keen on flashing my genuine Mac Edition with another ROM. This Mac Pro triple-boots Mojave, Windows and Linux as a test machine. Losing the boot screen would be very annoying.

I have another 670 I made EFI myself, will try that one first. If it fails, I have a modded 780 as well in another Mac Pro.
But I guess this machine will be going AMD, and my NVIDIA stuff will see the (e)bay piece by piece. NVIDIA caused so much support trouble to us Mac users lately, trust has gone down… Especially considering how Nvidia has gone completely silent on our issues lately.

Just for completeness, the repo 340 driver has the same issue, not compatible with later kernels:
[ 40.999153] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object ‘nvidia_stack_t’ (offset 11864, size 3)!
[ 40.999163] WARNING: CPU: 1 PID: 2485 at mm/usercopy.c:78 usercopy_warn+0x7d/0xa0
[ 40.999163] Modules linked in: rfcomm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc nls_utf8 hfsplus intel_powerclamp coretemp kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio crct10dif_pclmul btusb snd_hda_intel btrtl btbcm btintel snd_hda_codec bluetooth snd_hda_core crc32_pclmul snd_hwdep ecdh_generic snd_seq rfkill iTCO_wdt ghash_clmulni_intel iTCO_vendor_support nvidia(POE) snd_seq_device snd_pcm intel_cstate intel_uncore snd_timer snd applesmc input_polldev soundcore drm ioatdma lpc_ich i7core_edac dca pcc_cpufreq i5500_temp i2c_i801 acpi_cpufreq xfs libcrc32c firewire_ohci crc32c_intel e1000e firewire_core crc_itu_t
[ 40.999186] CPU: 1 PID: 2485 Comm: Xorg Tainted: P IOE 5.0.16-300.fc30.x86_64 #1
[ 40.999186] Hardware name: Apple Inc. MacPro5,1/Mac-F221BEC8, BIOS 144.0.0.0.0 04/12/2019
[ 40.999188] RIP: 0010:usercopy_warn+0x7d/0xa0
[ 40.999188] Code: 0d 83 41 51 49 89 c0 49 c7 c2 81 3b 0c 83 49 89 f1 48 89 f9 4c 0f 45 d2 48 c7 c7 90 78 0d 83 4c 89 da 4c 89 d6 e8 12 10 e0 ff <0f> 0b 48 83 c4 18 c3 48 c7 c6 ad 08 0f 83 49 89 f1 48 89 f0 eb 96
[ 40.999189] RSP: 0018:ffffb59a08fe7bb0 EFLAGS: 00010282
[ 40.999190] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000006
[ 40.999190] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff92c07f6568c0
[ 40.999190] RBP: ffff92c051a32e5b R08: 0000000000000001 R09: 000000000000045a
[ 40.999191] R10: 0000000000018578 R11: 0000000000000003 R12: ffff92c051a32e58
[ 40.999191] R13: 0000000000000001 R14: ffff92c051a32e58 R15: ffff92c051a32ea0
[ 40.999192] FS: 00007f253c920a80(0000) GS:ffff92c07f640000(0000) knlGS:0000000000000000
[ 40.999193] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 40.999193] CR2: 00007f2537f0fa00 CR3: 00000010118b6005 CR4: 00000000000206e0
[ 40.999194] Call Trace:
[ 40.999198] __check_object_size+0x131/0x15d
[ 40.999281] os_memcpy_to_user+0x23/0x50 [nvidia]
[ 40.999347] _nv001372rm+0xa5/0x260 [nvidia]
[ 40.999415] ? _nv004784rm+0x4eba/0x5500 [nvidia]
[ 40.999483] ? _nv004331rm+0xec/0xf0 [nvidia]
[ 40.999550] ? _nv004326rm+0xca/0x650 [nvidia]
[ 40.999613] ? _nv015126rm+0x576/0x5c0 [nvidia]
[ 40.999677] ? _nv000694rm+0x2e/0x60 [nvidia]
[ 40.999735] ? _nv000789rm+0x5f5/0x8b0 [nvidia]
[ 40.999792] ? rm_ioctl+0x73/0x100 [nvidia]
[ 40.999795] ? iomap_zero_range_actor+0x180/0x1c0
[ 40.999851] ? nvidia_ioctl+0x155/0x480 [nvidia]
[ 40.999908] ? nvidia_frontend_ioctl+0x32/0x50 [nvidia]
[ 40.999964] ? nvidia_frontend_unlocked_ioctl+0x19/0x20 [nvidia]
[ 40.999966] ? do_vfs_ioctl+0x405/0x660
[ 40.999967] ? ksys_ioctl+0x5e/0x90
[ 40.999968] ? ksys_write+0x57/0xd0
[ 40.999969] ? __x64_sys_ioctl+0x16/0x20
[ 40.999971] ? do_syscall_64+0x5b/0x170
[ 40.999973] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 40.999974] ---[ end trace a94ac60fc4d5177d ]---

Yes, but that’s only a warning, though I’m surprised that the repo driver doesn’t include the (IIRC rather simple) patch for it.
On OS X drivers:
https://devtalk.nvidia.com/default/topic/1042279/cuda-setup-and-installation/cuda-10-and-macos-10-14/
The problem is that those macOS-ready Nvidia cards are customized versions commissioned by Apple, so Apple is Nvidia’s customer, and Apple doesn’t want to pay for support anymore. You’re Apple’s customer, but in effect the end user is just stuck in between, out of luck.
You can see the same with notebooks: the manufacturer builds crap and orders a driver workaround for, say, two years. Then it breaks. That’s the way companies work, I guess.

Speaking of broken notebooks reminds me of this:
https://devtalk.nvidia.com/default/topic/1020418/linux/lenovo-y550-gt-m240-vs-ubuntu-17-07-16-04-14-04/post/5198178/#5198178
It was sold with a broken vbios which couldn’t be reflashed, as it was embedded in the system BIOS. The trick was to load a sanitized vbios before the driver was loaded. If you’re really bored, you could try that method to avoid reflashing the vbios, loading a matching, standard vbios into RAM instead.
Don’t think that will work on EFI, though.

Well, I guess I’ll move on to AMD then. I already replaced my 1080 Ti with a Radeon VII lately. Vega 64 cards are getting close to 250-300€ on eBay now. So, it’s best to sell off the rest of my Nvidia gear and move on (and forget CUDA).

btw.: Does NVIDIA support usually respond in these threads as well? I have seen them in some other threads, but I guess this one is just an unpopular one for them. They could at least provide clarity on the fact that this card will never work the way I was hoping…

My guess is that this special case is unpopular, and that the NVIDIA Linux devs responding here are in no position to know or say what’s going on.

I can’t believe it! I just cleaned up everything and reverted back to the nouveau driver…
Then I thought I’d give it one last try: I updated the system to the very latest version, installed the 340xx repo driver again, blacklisted the nouveau driver, rebuilt the initramfs, and rebooted.

Guess what: IT WORKS NOW!
I did NOT dare to install CUDA yet though, as I fear this might screw it up again.
I’ll freeze my system, check if CUDA works as well.

Verdict: Install the 340xx repo driver from the beginning, making sure not a bit from the .run stuff from NVIDIA is left over.
It remains to be seen if the CUDA .repo package is overwriting the driver as well or not. We will see.
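
In case it helps anyone repeating this, a small Python sketch for checking after the reboot that the blacklist actually took effect, i.e. that `nvidia` is loaded and `nouveau` is not. It only looks at the first column of `/proc/modules`; the function names are my own, and it says nothing about leftovers from the .run installer.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def driver_state(proc_modules_text):
    """Report whether the nvidia and nouveau kernel modules are loaded."""
    mods = loaded_modules(proc_modules_text)
    return {"nvidia": "nvidia" in mods, "nouveau": "nouveau" in mods}


if __name__ == "__main__":
    try:
        with open("/proc/modules") as fh:
            print(driver_state(fh.read()))
    except OSError:
        print("could not read /proc/modules")
```

On a correctly set up system I would expect `nvidia` to be reported as loaded and `nouveau` as not loaded; if `nouveau` still shows up, the blacklist or the rebuilt initramfs probably did not apply.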

Will mark this thread as answered for now. Many thanks for your patience and help!!!

But: driver 340 means CUDA 6.5 at most.

Yes :-( this sucks! But at least I have it half-way up and running… Let’s see how far I’m getting.
Still very disappointed about NVIDIA support in general.
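
For reference, the driver-branch limit mentioned above can be written down as a small lookup in Python. Only the 340 entry comes from this thread; the 390 and 418 entries are from my recollection of the CUDA release notes and should be treated as assumptions to verify there. The names `MAX_CUDA_BY_BRANCH`, `driver_branch` and `max_cuda` are just for illustration.

```python
# Highest CUDA toolkit version per driver branch.
# 340 -> 6.5 is stated in this thread; the other two entries are assumptions
# recalled from the CUDA release notes and should be double-checked there.
MAX_CUDA_BY_BRANCH = {
    340: "6.5",
    390: "9.1",
    418: "10.1",
}


def driver_branch(version):
    """Return the major branch of a driver version string like '390.116'."""
    return int(version.split(".")[0])


def max_cuda(version):
    """Return the highest known CUDA version for a driver, or None if unknown."""
    return MAX_CUDA_BY_BRANCH.get(driver_branch(version))


if __name__ == "__main__":
    # 390.116 is the GLX module version that appears in the Xorg log above.
    print(max_cuda("390.116"))
```

Branches missing from the table return `None` rather than a guess.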

See this:
https://devtalk.nvidia.com/default/topic/1042691/linux/black-screen-with-mac-version-of-gtx-680/post/5354150/#5354150