Crashes - Linux x64 GTX780ti

Hi,

I am getting crashes with the 334.* drivers on Linux, while driver 319.76 is rock solid. I tested both 334.16 and 334.21 and get the crash below with each:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
IP: [<ffffffff813c6480>] iommu_no_mapping+0x10/0x120
PGD 3f4ba7067 PUD 3df79c067 PMD 0 
Oops: 0000 [#1] PREEMPT SMP 
Modules linked in: nvidia(PO) binfmt_misc joydev usbhid ipv6 x86_pkg_temp_thermal crc32c_intel microcode ehci_pci ehci_hcd usbcore serio_raw usb_common 8250 serial_core [last unloaded: nvidia]
CPU: 9 PID: 4599 Comm: mplayer Tainted: P           O 3.13.7-custom #1
Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./X79-UP4, BIOS F6a 12/18/2013
task: ffff880423bef0a0 ti: ffff8803ca05e000 task.ti: ffff8803ca05e000
RIP: 0010:[<ffffffff813c6480>]  [<ffffffff813c6480>] iommu_no_mapping+0x10/0x120
RSP: 0018:ffff8803ca05fda0  EFLAGS: 00010286
RAX: 000000039cc38000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff8803cfcd5f18 RDI: 0000000000000000
RBP: ffffea0000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff813c6590 R11: 0000000000000293 R12: ffff8803cfcd5f18
R13: 0000000000000000 R14: ffff8803cfcd5700 R15: ffff8803cfcd5f00
FS:  00007f855566b7c0(0000) GS:ffff88043fd20000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000088 CR3: 00000003ca138000 CR4: 00000000001407e0
Stack:
 0000000000000000 ffffea0000000000 ffffffff813c65bc 0000000000000000
 ffffea0000000000 000077ff80000000 0000000000000000 ffff8803cfcd5700
 ffff8803cfcd5f00 ffffffffa12778ce 0000000000000098 0000000000000000
Call Trace:
 [<ffffffff813c65bc>] ? intel_unmap_sg+0x2c/0x180
 [<ffffffffa12778ce>] ? nv_free_system_pages+0xbe/0x380 [nvidia]
 [<ffffffffa12718b5>] ? nv_free_pages+0xc5/0xd0 [nvidia]
 [<ffffffffa1271be4>] ? nvidia_close+0x324/0x430 [nvidia]
 [<ffffffffa127a87d>] ? nvidia_frontend_close+0x4d/0xa0 [nvidia]
 [<ffffffff810fff32>] ? __fput+0xa2/0x230
 [<ffffffff81056acf>] ? task_work_run+0x8f/0xd0
 [<ffffffff814f9aa8>] ? int_signal+0x12/0x17
Code: ff 66 2e 0f 1f 84 00 00 00 00 00 bd f4 ff ff ff e9 25 ff ff ff 0f 0b 0f 1f 40 00 48 83 ec 10 48 89 1c 24 48 89 fb 48 89 6c 24 08 <48> 81 bf 88 00 00 00 00 95 6b 81 0f 85 d7 00 00 00 48 8b 87 08 
RIP  [<ffffffff813c6480>] iommu_no_mapping+0x10/0x120
 RSP <ffff8803ca05fda0>
CR2: 0000000000000088
---[ end trace 821949b06f5c5060 ]---
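
For anyone who wants to compare their own oops against this one, here is a rough Python sketch that pulls the frames out of the "Call Trace:" section and shows which ones belong to a module. The function name `parse_frames` and the regular expression are my own illustration, not part of any kernel or NVIDIA tool, and the pattern only assumes the line layout seen in the trace above.

```python
import re

# Matches call-trace lines in the layout shown above, e.g.
#  [<ffffffffa12778ce>] ? nv_free_system_pages+0xbe/0x380 [nvidia]
# (hypothetical pattern, written for this oops format only)
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(oops_text):
    """Return (symbol, module) pairs for the lines following 'Call Trace:'.

    Frames without a trailing [module] tag are reported as 'kernel'.
    """
    frames = []
    in_trace = False
    for line in oops_text.splitlines():
        if line.strip() == "Call Trace:":
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # the first non-frame line ends the trace
        frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


# Call trace copied from the oops above.
SAMPLE_OOPS = """\
Call Trace:
 [<ffffffff813c65bc>] ? intel_unmap_sg+0x2c/0x180
 [<ffffffffa12778ce>] ? nv_free_system_pages+0xbe/0x380 [nvidia]
 [<ffffffffa12718b5>] ? nv_free_pages+0xc5/0xd0 [nvidia]
 [<ffffffffa1271be4>] ? nvidia_close+0x324/0x430 [nvidia]
 [<ffffffffa127a87d>] ? nvidia_frontend_close+0x4d/0xa0 [nvidia]
 [<ffffffff810fff32>] ? __fput+0xa2/0x230
 [<ffffffff81056acf>] ? task_work_run+0x8f/0xd0
 [<ffffffff814f9aa8>] ? int_signal+0x12/0x17
Code: ff 66 2e 0f 1f 84
"""

if __name__ == "__main__":
    for symbol, module in parse_frames(SAMPLE_OOPS):
        print(f"{module:>8}  {symbol}")
```

Run against the trace above, this makes it easy to see that the path goes from the file close, through four frames in the nvidia module, and then into the kernel's IOMMU unmap code where the fault occurs.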

Some info:

kernel: Linux desktop 3.13.7-custom #1 SMP PREEMPT Wed Mar 26 23:21:07 EST 2014 x86_64 x86_64 x86_64 GNU/Linux

CPU: Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz

I have enabled MSI interrupts via a conf file in modprobe.d:

options nvidia NVreg_EnableMSI=1

This was done as I did not fancy the device sharing interrupts with a USB controller!
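
To check whether the option actually took effect, one approach is to look at the nvidia entry in /proc/interrupts. Below is a small Python sketch of that check. The helper name `nvidia_irqs` and the sample text are made up for illustration (the IRQ numbers and counts are not from my machine), and the exact interrupt type string varies between kernel versions, so the code only looks for "MSI" anywhere on the line.

```python
def nvidia_irqs(interrupts_text):
    """Return (irq, uses_msi) for each /proc/interrupts line that lists nvidia.

    Hypothetical helper: it only assumes that each interrupt line starts with
    "<irq>:" and that handler names appear as whitespace-separated fields.
    """
    results = []
    for line in interrupts_text.splitlines():
        fields = [field.rstrip(",") for field in line.split()]
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue  # skip the CPU header line and blank lines
        if "nvidia" not in fields:
            continue
        irq = fields[0].rstrip(":")
        uses_msi = any("MSI" in field for field in fields)
        results.append((irq, uses_msi))
    return results


# Invented sample: one shared legacy interrupt and one MSI interrupt.
SAMPLE_INTERRUPTS = """\
           CPU0       CPU1
 16:        120        340   IO-APIC-fasteoi   ehci_hcd:usb1, nvidia
 90:       5000       7000   PCI-MSI-edge      nvidia
"""

if __name__ == "__main__":
    print("sample:", nvidia_irqs(SAMPLE_INTERRUPTS))
    try:
        with open("/proc/interrupts") as handle:
            print("this machine:", nvidia_irqs(handle.read()))
    except OSError:
        print("could not read /proc/interrupts")
```

If the nvidia line shares its IRQ with another handler and shows no MSI type, the option has probably not been picked up by the module.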

Looks like I will have to go back to the 319.76 driver… :(

Including what I was doing at the time may help:

  • Dual monitor setup
  • Had a game open on one screen (windowed; the game uses OpenGL and is 64-bit), with mplayer on the second monitor (full-screen video)
  • Quit mplayer - all hell broke loose.

Sounds like a double free() somewhere in the driver - the type of crash seems to point to memory allocation issues with the driver.

As mentioned, no issues with the 319.xx driver at all. I am not convinced this is a Linux kernel bug, since I tested the 334.16 driver with the Linux 3.13 kernel not long after its release (3.13.1) and the crash message was absolutely identical.

Seems to be related to this thread:

https://devtalk.nvidia.com/default/topic/685307/linux/334-21-kernel-bug-when-closing-firefox-tabs-or-vdpau-mplayer/?offset=19#4175776

It is exactly the same kernel “oops” on exiting mplayer.