GTX 970: Fixing Recursive Fault Error Linux Mint 18

Hello,
I recently got a Zotac GTX 970 Amp! card and installed it in my machine. As soon as I swapped out my GTX 750 for it, I got the following error message in dmesg and could no longer boot to a GUI.

[   29.236800] Fixing recursive fault but reboot is needed!
[   32.234678] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.

The machine is unable to get into X and displays a blank screen; however, it should be noted that the splash screen for my distro was fine during boot. I also know the card works, because Windows runs perfectly with the same hardware and card. In addition, Nouveau appears to work, albeit at a fixed 800x600 resolution and with only one of my monitors active. However, once I install the NVIDIA driver and reboot, I get the same error in dmesg.

My full specs:
Intel i7-4790K
Asus Z97-PRO motherboard
32 GB Ram
Zotac GTX 970 Amp!
Fresh install of Linux Mint 18 (Also tried Ubuntu)

I tried the beta driver .run file from the driver page, and I also tried the distro-provided driver.

Attached is the full nvidia-bug-report.log, generated over SSH without X running. Note that the command itself hangs. nvidia-smi also hangs, if that helps.

Is there anything I can do to fix this?

Thanks.
nvidia-bug-report.log.gz (50.2 KB)

You haven’t properly blacklisted the nouveau driver, so you’ve got a resource-sharing conflict.

The safe way to install the NVIDIA binary drivers is from the official PPA.
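On Ubuntu/Mint that usually amounts to something like this (the package name is only an example; pick whichever version the PPA currently offers):

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-384
sudo reboot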

I’m not so sure. I installed from the official PPA, as you suggested, and double-checked that the nouveau driver is blacklisted in /etc/modprobe.d:

/etc/modprobe.d/nvidia-graphics-drivers.conf

# This file was installed by nvidia-384
# Do not edit this file manually

blacklist nouveau
blacklist lbm-nouveau
blacklist nvidia-current
blacklist nvidia-173
blacklist nvidia-96
blacklist nvidia-current-updates
blacklist nvidia-173-updates
blacklist nvidia-96-updates
blacklist nvidia-384-updates
alias nvidia nvidia_384
alias nvidia-uvm nvidia_384_uvm
alias nvidia-modeset nvidia_384_modeset
alias nvidia-drm nvidia_384_drm
alias nouveau off
alias lbm-nouveau off

options nvidia_384_drm modeset=0

Even after rebooting with the PPA-based version of 384, I still get the same error in dmesg.

Thanks.

Your dmesg/lspci outputs indicate that the nouveau module is loaded and active. You might want to rebuild your initrd.

Ok. Here’s what I did:
Removed the xserver-xorg-video-nouveau package, which by itself changes nothing, since the kernel module is still there.
Ran update-initramfs -k all -u. After that, the kernel still reports that it is using the nvidia driver and that nouveau is NOT in use.

If the nouveau driver is still somehow causing a resource-sharing conflict, I don’t know how to fix it. I’ve blacklisted the driver, and modprobe indicates it isn’t loaded.
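For reference, this is roughly how I’m checking which module has the card:

lsmod | grep -i nouveau        # returns nothing
lspci -nnk -s 01:00.0          # the "Kernel driver in use" line shows nvidia (adjust the bus address for your slot)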

I’m running some software under Windows to test the VRAM integrity, since the dmesg trace references NVIDIA memory. So far the results indicate the VRAM is fine. I know it’s a long shot, but I really need this to work.

Thanks.

I have a GTX 970. Regardless of distro, I always have to add nomodeset to my boot parameters before I can get into a graphical session. Once the driver installs properly, this isn’t necessary anymore, however.

On some distros I have had to run X from my onboard GPU to get the drivers installed right. The way I usually install the packages is:

apt install firmware-linux nvidia-driver nvidia-settings nvidia-xconfig

info on nomodeset:
https://askubuntu.com/questions/140640/nvidia-drivers-and-kernel-update-problems-nomodeset

EDIT

I see you are using Mint. Boot into Mint with nomodeset, or using your onboard GPU, and just let the distro tool handle the drivers. It has a built-in tool called driver-manager.
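If you just want to try nomodeset for a single boot without editing config files, you can do it from the GRUB menu, roughly like this:

# highlight the Mint entry, press 'e', then append nomodeset to the line
# starting with "linux" (after "quiet splash"), e.g.:
linux /boot/vmlinuz-<kernel version> root=UUID=... ro quiet splash nomodeset
# press Ctrl+X or F10 to boot with that parameter for this session only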

That’s actually the first thing I tried. The kernel error still happens regardless of whether nomodeset is on or not.
My current boot line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
GRUB_CMDLINE_LINUX=""

And of course I ran update-grub as root immediately afterwards. What I find interesting is that even with nomodeset set and enabled, I still see nvidia-modeset as the driver in the dmesg output, as shown below.

[   26.718749] nvidia-modeset: Allocated GPU:0 (GPU-18c825e4-cb3a-9197-3e69-e2ec579aa48f) @ PCI:0000:01:00.0
[   27.054577] BUG: unable to handle kernel paging request at ffff881c18586830
[   27.054581] IP: [<ffffffffc0a14630>] _nv014404rm+0x620/0x780 [nvidia]
[   27.054718] PGD 3202067 PUD 0 
[   27.054719] Oops: 0000 [#1] SMP 
[   27.054721] Modules linked in: bnep binfmt_misc nls_iso8859_1 nvidia_uvm(POE) uvcvideo nvidia_drm(POE) videobuf2_vmalloc videobuf2_memops nvidia_modeset(POE) videobuf2_v4l2 videobuf2_core v4l2_common videodev arc4 media nvidia(POE) ath9k snd_usb_audio ath9k_common eeepc_wmi ath9k_hw mxm_wmi asus_wmi snd_usbmidi_lib joydev input_leds drm_kms_helper ath3k drm btusb sparse_keymap drbg ansi_cprng intel_rapl dm_crypt snd_hda_codec_hdmi ath fb_sys_fops btrtl x86_pkg_temp_thermal btbcm intel_powerclamp btintel coretemp syscopyarea mac80211 kvm_intel snd_hda_codec_realtek snd_hda_codec_generic sysfillrect snd_hda_intel snd_hda_codec bluetooth snd_hda_core kvm cfg80211 snd_seq_midi snd_seq_midi_event snd_rawmidi sysimgblt snd_seq snd_hwdep snd_pcm irqbypass crct10dif_pclmul snd_seq_device crc32_pclmul snd_timer
[   27.054739]  snd ghash_clmulni_intel soundcore shpchp aesni_intel aes_x86_64 tpm_infineon lrw gf128mul mei_me glue_helper ablk_helper serio_raw lpc_ich mei wmi cryptd acpi_pad mac_hid parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq dm_mirror dm_region_hash dm_log hid_generic usbhid hid uas usb_storage psmouse e1000e ahci libahci ptp pps_core fjes video
[   27.054751] CPU: 2 PID: 1467 Comm: irq/48-nvidia Tainted: P           OE   4.4.0-53-generic #74-Ubuntu
[   27.054752] Hardware name: ASUS All Series/Z97-PRO, BIOS 2702 10/27/2015
[   27.054753] task: ffff880813ed9b80 ti: ffff880816cd4000 task.ti: ffff880816cd4000
[   27.054754] RIP: 0010:[<ffffffffc0a14630>]  [<ffffffffc0a14630>] _nv014404rm+0x620/0x780 [nvidia]
[   27.054838] RSP: 0018:ffff880816cd7ca0  EFLAGS: 00010246
[   27.054839] RAX: 00000004fffffffb RBX: ffff880819048008 RCX: ffff880816022d50
[   27.054840] RDX: 00000000ffffffff RSI: 0000000000000001 RDI: ffff880819048008
[   27.054840] RBP: ffff880816022cf0 R08: 0000000000000000 R09: ffff880816022d48
[   27.054841] R10: 0000000000000004 R11: 0000000000000000 R12: 00000000ffffffff
[   27.054842] R13: ffff8800bc5aa010 R14: ffff881c1858682c R15: 0000000000000000
[   27.054843] FS:  0000000000000000(0000) GS:ffff88083ec80000(0000) knlGS:0000000000000000
[   27.054843] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.054844] CR2: ffff881c18586830 CR3: 0000000002e0a000 CR4: 00000000001406e0
[   27.054845] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   27.054845] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   27.054846] Stack:
[   27.054846]  0000000000000000 ffff880819048008 0000000000000009 ffff8808194e0008
[   27.054848]  0000000000002a01 ffffffffc0a6050e ffff880819048008 ffff880816022dd8
[   27.054849]  ffff8808194e0008 0000000000000000 00000000ffffffff ffffffffc0a62025
[   27.054850] Call Trace:
[   27.054949]  [<ffffffffc0a6050e>] ? _nv013618rm+0x1ee/0x560 [nvidia]
[   27.055040]  [<ffffffffc0a62025>] ? _nv013625rm+0x385/0x3f0 [nvidia]
[   27.055134]  [<ffffffffc0a97e30>] ? _nv014240rm+0x230/0x2c0 [nvidia]
[   27.055273]  [<ffffffffc0c81d03>] ? _nv006730rm+0x1a3/0x280 [nvidia]
[   27.055413]  [<ffffffffc0c79ca1>] ? _nv025218rm+0x71/0xa0 [nvidia]
[   27.055511]  [<ffffffffc0ed398e>] ? _nv001199rm+0x10e/0x150 [nvidia]
[   27.055513]  [<ffffffff810dbfb0>] ? irq_finalize_oneshot.part.35+0xe0/0xe0
[   27.055612]  [<ffffffffc0ed9333>] ? rm_isr_bh+0x23/0x70 [nvidia]
[   27.055671]  [<ffffffffc0886f5d>] ? nvidia_isr_common_bh+0x3d/0x60 [nvidia]
[   27.055730]  [<ffffffffc0886fa1>] ? nvidia_isr_kthread_bh+0x11/0x20 [nvidia]
[   27.055731]  [<ffffffff810dbfd0>] ? irq_thread_fn+0x20/0x50
[   27.055732]  [<ffffffff810dc318>] ? irq_thread+0x138/0x1c0
[   27.055733]  [<ffffffff810dc070>] ? irq_forced_thread_fn+0x70/0x70
[   27.055734]  [<ffffffff810dc1e0>] ? irq_thread_check_affinity+0xc0/0xc0
[   27.055736]  [<ffffffff810a09d8>] ? kthread+0xd8/0xf0
[   27.055737]  [<ffffffff810a0900>] ? kthread_create_on_node+0x1e0/0x1e0
[   27.055740]  [<ffffffff8183640f>] ? ret_from_fork+0x3f/0x70
[   27.055741]  [<ffffffff810a0900>] ? kthread_create_on_node+0x1e0/0x1e0
[   27.055741] Code: 5c 00 0f 85 09 fc ff ff 48 8b 55 48 44 89 e0 45 31 c0 48 8d 04 80 48 8d 4d 60 8b 75 2c 48 89 df 4c 8d b4 82 40 08 00 00 44 89 e2 <45> 8b 6e 04 ff 93 b0 0d 00 00 48 8b 75 18 48 8b 55 40 48 83 ec 
[   27.055753] RIP  [<ffffffffc0a14630>] _nv014404rm+0x620/0x780 [nvidia]
[   27.055839]  RSP <ffff880816cd7ca0>
[   27.055839] CR2: ffff881c18586830
[   27.055840] ---[ end trace 885dd2bf1f4d3ee1 ]---
[   27.055846] BUG: unable to handle kernel paging request at ffffffffffffffd8
[   27.055847] IP: [<ffffffff810a1080>] kthread_data+0x10/0x20
[   27.055848] PGD 2e0d067 PUD 2e0f067 PMD 0 
[   27.055849] Oops: 0000 [#2] SMP 
[   27.055850] Modules linked in: bnep binfmt_misc nls_iso8859_1 nvidia_uvm(POE) uvcvideo nvidia_drm(POE) videobuf2_vmalloc videobuf2_memops nvidia_modeset(POE) videobuf2_v4l2 videobuf2_core v4l2_common videodev arc4 media nvidia(POE) ath9k snd_usb_audio ath9k_common eeepc_wmi ath9k_hw mxm_wmi asus_wmi snd_usbmidi_lib joydev input_leds drm_kms_helper ath3k drm btusb sparse_keymap drbg ansi_cprng intel_rapl dm_crypt snd_hda_codec_hdmi ath fb_sys_fops btrtl x86_pkg_temp_thermal btbcm intel_powerclamp btintel coretemp syscopyarea mac80211 kvm_intel snd_hda_codec_realtek snd_hda_codec_generic sysfillrect snd_hda_intel snd_hda_codec bluetooth snd_hda_core kvm cfg80211 snd_seq_midi snd_seq_midi_event snd_rawmidi sysimgblt snd_seq snd_hwdep snd_pcm irqbypass crct10dif_pclmul snd_seq_device crc32_pclmul snd_timer
[   27.055864]  snd ghash_clmulni_intel soundcore shpchp aesni_intel aes_x86_64 tpm_infineon lrw gf128mul mei_me glue_helper ablk_helper serio_raw lpc_ich mei wmi cryptd acpi_pad mac_hid parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq dm_mirror dm_region_hash dm_log hid_generic usbhid hid uas usb_storage psmouse e1000e ahci libahci ptp pps_core fjes video
[   27.055873] CPU: 2 PID: 1467 Comm: irq/48-nvidia Tainted: P      D    OE   4.4.0-53-generic #74-Ubuntu
[   27.055874] Hardware name: ASUS All Series/Z97-PRO, BIOS 2702 10/27/2015
[   27.055874] task: ffff880813ed9b80 ti: ffff880816cd4000 task.ti: ffff880816cd4000
[   27.055875] RIP: 0010:[<ffffffff810a1080>]  [<ffffffff810a1080>] kthread_data+0x10/0x20
[   27.055876] RSP: 0018:ffff880816cd79b8  EFLAGS: 00010202
[   27.055877] RAX: 0000000000000000 RBX: ffff880813ed9b80 RCX: 0000000000000000
[   27.055878] RDX: ffff880816cd7e80 RSI: 0000000000000000 RDI: ffff880813ed9b80
[   27.055878] RBP: ffff880816cd79b8 R08: 0000000000000000 R09: 0000000000000491
[   27.055879] R10: 0000000000000004 R11: 0000000000000491 R12: ffffffff82103a30
[   27.055879] R13: ffff880816cd7e80 R14: 0000000000000000 R15: ffff880813eda1f8
[   27.055880] FS:  0000000000000000(0000) GS:ffff88083ec80000(0000) knlGS:0000000000000000
[   27.055881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.055881] CR2: ffffffffffffffd8 CR3: 0000000002e0a000 CR4: 00000000001406e0
[   27.055882] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   27.055882] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   27.055883] Stack:
[   27.055883]  ffff880816cd79d8 ffffffff810dc093 ffff880813ed9b80 ffffffff82103a30
[   27.055884]  ffff880816cd7a18 ffffffff8109edf1 ffff880813eda22c ffff880813ed9b80
[   27.055885]  ffff880816cd7a50 0000000000000000 0000000000000046 ffff881c18586830
[   27.055887] Call Trace:
[   27.055888]  [<ffffffff810dc093>] irq_thread_dtor+0x23/0xb0
[   27.055891]  [<ffffffff8109edf1>] task_work_run+0x81/0xa0
[   27.055893]  [<ffffffff81083ef1>] do_exit+0x2e1/0xb00
[   27.055897]  [<ffffffff81031ba1>] oops_end+0xa1/0xd0
[   27.055899]  [<ffffffff8106acf5>] no_context+0x135/0x380
[   27.055979]  [<ffffffffc0943745>] ? _nv011773rm+0x25/0xa0 [nvidia]
[   27.055980]  [<ffffffff8106afc0>] __bad_area_nosemaphore+0x80/0x1f0
[   27.056051]  [<ffffffffc09436bd>] ? _nv011374rm+0x2d/0x90 [nvidia]
[   27.056052]  [<ffffffff8106b143>] bad_area_nosemaphore+0x13/0x20
[   27.056053]  [<ffffffff8106b407>] __do_page_fault+0xb7/0x400
[   27.056124]  [<ffffffffc09436bd>] ? _nv011374rm+0x2d/0x90 [nvidia]
[   27.056199]  [<ffffffffc093a6b7>] ? _nv011371rm+0x47/0x80 [nvidia]
[   27.056200]  [<ffffffff8106b772>] do_page_fault+0x22/0x30
[   27.056202]  [<ffffffff818381f8>] page_fault+0x28/0x30
[   27.056288]  [<ffffffffc0a14630>] ? _nv014404rm+0x620/0x780 [nvidia]
[   27.056378]  [<ffffffffc0a6050e>] ? _nv013618rm+0x1ee/0x560 [nvidia]
[   27.056468]  [<ffffffffc0a62025>] ? _nv013625rm+0x385/0x3f0 [nvidia]
[   27.056563]  [<ffffffffc0a97e30>] ? _nv014240rm+0x230/0x2c0 [nvidia]
[   27.056698]  [<ffffffffc0c81d03>] ? _nv006730rm+0x1a3/0x280 [nvidia]
[   27.056834]  [<ffffffffc0c79ca1>] ? _nv025218rm+0x71/0xa0 [nvidia]
[   27.056929]  [<ffffffffc0ed398e>] ? _nv001199rm+0x10e/0x150 [nvidia]
[   27.056931]  [<ffffffff810dbfb0>] ? irq_finalize_oneshot.part.35+0xe0/0xe0
[   27.057026]  [<ffffffffc0ed9333>] ? rm_isr_bh+0x23/0x70 [nvidia]
[   27.057089]  [<ffffffffc0886f5d>] ? nvidia_isr_common_bh+0x3d/0x60 [nvidia]
[   27.057152]  [<ffffffffc0886fa1>] ? nvidia_isr_kthread_bh+0x11/0x20 [nvidia]
[   27.057153]  [<ffffffff810dbfd0>] ? irq_thread_fn+0x20/0x50
[   27.057154]  [<ffffffff810dc318>] ? irq_thread+0x138/0x1c0
[   27.057155]  [<ffffffff810dc070>] ? irq_forced_thread_fn+0x70/0x70
[   27.057156]  [<ffffffff810dc1e0>] ? irq_thread_check_affinity+0xc0/0xc0
[   27.057157]  [<ffffffff810a09d8>] ? kthread+0xd8/0xf0
[   27.057158]  [<ffffffff810a0900>] ? kthread_create_on_node+0x1e0/0x1e0
[   27.057160]  [<ffffffff8183640f>] ? ret_from_fork+0x3f/0x70
[   27.057161]  [<ffffffff810a0900>] ? kthread_create_on_node+0x1e0/0x1e0
[   27.057161] Code: ff ff ff be 46 02 00 00 48 c7 c7 c0 6b cb 81 e8 77 03 fe ff e9 a6 fe ff ff 66 90 0f 1f 44 00 00 48 8b 87 f8 04 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 
[   27.057173] RIP  [<ffffffff810a1080>] kthread_data+0x10/0x20
[   27.057174]  RSP <ffff880816cd79b8>
[   27.057175] CR2: ffffffffffffffd8
[   27.057175] ---[ end trace 885dd2bf1f4d3ee2 ]---
[   27.057176] Fixing recursive fault but reboot is needed!
[   27.379065] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   27.379093] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[   30.054650] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
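For what it’s worth, I verified after rebooting that the parameter really is active:

cat /proc/cmdline    # ends with ... ro quiet splash nomodeset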

Thanks.

Does this also happen if you connect just one monitor?
Any change if you generate an xorg.conf using nvidia-xconfig?
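That would be something along these lines:

sudo nvidia-xconfig    # writes a fresh /etc/X11/xorg.conf based on the detected GPU
sudo reboot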

That worked. I removed all but one monitor (the DVI one) and the driver came up fine. Now the question is: why does the driver crash when I add my other monitors?

I have two DisplayPort monitors and one HDMI monitor, if that helps.

It appears to work, but if I use the DVI port in conjunction with another monitor, the driver crashes and I am unable to boot into a GUI. I was able to swap in a DisplayPort-to-DVI adapter and it started working.

This means there is something wrong with the DVI port, either in the kernel driver or in my hardware/setup. I don’t know which it is, but I can say that it does work in Windows.

Thanks for all your help.

I noticed your system was crashing while setting the modes on all 4 displays.
One more oddity: your DVI-connected display is being driven as an analog VGA display, at least when all displays are connected. Can you please run nvidia-bug-report.sh again with only that monitor connected? Does it have a menu where you can switch inputs?
Then it would be interesting to have all the other displays connected without the DVI one.
Edit: we posted at the same time; it seems you found a workaround.

Hi jspike397,
  • Are you using any DP-DVI or DVI-DP adapters to connect the displays? Does the issue reproduce without these adapters, with the displays connected directly?
  • What is the minimum number of displays needed to hit this issue?
  • Which desktop environment are you running: KDE, XFCE, GNOME, or something else?
  • Please share the make/model of the displays you are testing with.
  • Does the issue occur as soon as you start X/the graphical desktop, or do you need to hot-plug some displays to trigger it?
  • Please provide detailed reproduction steps and some information about your displays and hardware setup.

You can blacklist the Nouveau driver in the /etc/modprobe.d/blacklist.conf file, or create a file such as /etc/modprobe.d/disable-nouveau.conf with the entries below:
blacklist nouveau
options nouveau modeset=0

  • And add/replace the kernel parameters: vga=0 rdblacklist=nouveau nouveau.modeset=0 (these go in GRUB_CMDLINE_LINUX_DEFAULT; see the sketch after this list)
  • Reboot
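A minimal sketch of those steps on Ubuntu/Mint (assuming sudo access and the file path above):

sudo tee /etc/modprobe.d/disable-nouveau.conf <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
sudo update-initramfs -u    # so the blacklist also applies inside the initrd
sudo nano /etc/default/grub # add vga=0 rdblacklist=nouveau nouveau.modeset=0 to GRUB_CMDLINE_LINUX_DEFAULT
sudo update-grub
sudo reboot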