<340.24/337.*/334.* kernel IOMMU BUG when closing VDPAU applications [FIXED in 340.24]

EDIT: This BUG is caused by VDPAU in combination with an enabled Intel IOMMU.
It can be triggered either by closing mplayer while it plays a video using VDPAU or, if you have
libva and libva-vdpau-driver installed, simply by calling vainfo. The easiest workarounds so far
are to either disable Intel IOMMU support (CONFIG_INTEL_IOMMU) in the kernel’s .config or add “intel_iommu=off” to the
kernel command line.
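
For anyone who wants to verify that the command-line workaround is actually active, here is a minimal Python sketch. The function name `intel_iommu_explicitly_off` is made up for illustration, and it assumes that the last `on`/`off` value given to `intel_iommu=` is the one that takes effect. It only reports whether the option was passed explicitly; if the option is absent, the default depends on how the kernel was configured.

```python
def intel_iommu_explicitly_off(cmdline: str) -> bool:
    """Return True if the kernel command line explicitly sets intel_iommu=off.

    Assumption: when several on/off values are present, the last one wins.
    """
    disabled = False
    for token in cmdline.split():
        if not token.startswith("intel_iommu="):
            continue
        # The value may be a comma-separated list, e.g. "on,igfx_off".
        for option in token.split("=", 1)[1].split(","):
            if option == "off":
                disabled = True
            elif option == "on":
                disabled = False
    return disabled


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line of the running kernel.
        with open("/proc/cmdline") as f:
            print(intel_iommu_explicitly_off(f.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

Run it after rebooting with the new parameter; a result of `True` only means the option reached the kernel, not that the underlying driver bug is gone.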

EDIT2: fixed in 340.24!

Closing Firefox usually yields the following kernel BUG() and an almost completely unresponsive system (nvidia-bug-report.sh doesn’t finish, even in --safe-mode).

[ 730.685214] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
[ 730.685218] IP: [] iommu_no_mapping+0x7/0x100
[ 730.685224] PGD cd8bc067 PUD cd816067 PMD 0
[ 730.685226] Oops: 0000 [#1] PREEMPT SMP
[ 730.685228] Modules linked in: acpi_call(O) vboxnetadp(O) vboxnetflt(O) vboxpci(O) vboxdrv(O) ipheth cdc_mbim nvidia(PO) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_c
[ 730.685238] CPU: 1 PID: 3093 Comm: plugin-containe Tainted: P O 3.12.10 #1
[ 730.685239] Hardware name: Dell Inc. Precision M6600/04YY4M, BIOS A15 09/27/2013
[ 730.685240] task: ffff8802070dc2e0 ti: ffff88020a9c6000 task.ti: ffff88020a9c6000
[ 730.685241] RIP: 0010:[] [] iommu_no_mapping+0x7/0x100
[ 730.685243] RSP: 0018:ffff88020a9c7d88 EFLAGS: 00010246
[ 730.685244] RAX: 000000020ab6d000 RBX: 0000000000000000 RCX: 0000000000000000
[ 730.685245] RDX: 0000000000000001 RSI: ffff8800ab566e98 RDI: 0000000000000000
[ 730.685246] RBP: ffffea0000000000 R08: 0000000000000000 R09: 0000000000000001
[ 730.685246] R10: ffffffff817cf110 R11: 0000000000000293 R12: ffff8800ab566e98
[ 730.685247] R13: 0000000000000000 R14: ffff8800ab566240 R15: ffff8800ab566e80
[ 730.685248] FS: 00007fec2c14b940(0000) GS:ffff88022dc40000(0000) knlGS:0000000000000000
[ 730.685249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 730.685250] CR2: 0000000000000088 CR3: 00000000a9109000 CR4: 00000000000407e0
[ 730.685250] Stack:
[ 730.685251] 0000000000000000 ffffea0000000000 ffff8800ab566e98 ffffffff817cf125
[ 730.685253] 0000000000000000 ffffea0000000000 000077ff80000000 0000000000000000
[ 730.685254] ffff8800ab566240 ffff8800ab566e80 ffffffffa0615501 0000000000000098
[ 730.685256] Call Trace:
[ 730.685258] [] ? intel_unmap_sg+0x15/0x120
[ 730.685306] [] ? nv_free_system_pages+0xc1/0x3a0 [nvidia]
[ 730.685340] [] ? nv_free_pages+0xcd/0xe0 [nvidia]
[ 730.685374] [] ? nvidia_close+0x32e/0x440 [nvidia]
[ 730.685408] [] ? nvidia_frontend_close+0x3f/0x90 [nvidia]
[ 730.685411] [] ? __fput+0x90/0x200
[ 730.685413] [] ? task_work_run+0x8f/0xd0
[ 730.685417] [] ? int_signal+0x12/0x17
[ 730.685417] Code: b1 97 ff 5b 89 e8 5d 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 b8 f4 ff ff ff e9 42 ff ff ff 0f 0b 0f 1f 40 00 41 54 55 53 48 89 fb <48> 81 bf 88 00 00 00 40 95
[ 730.685435] RIP [] iommu_no_mapping+0x7/0x100
[ 730.685437] RSP
[ 730.685437] CR2: 0000000000000088
[ 730.685439] ---[ end trace 3863f5fe082dfafb ]---
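
Since several reports in this thread hinge on whether the backtrace is the same, a small Python sketch for pulling the frame list out of a pasted oops may help when comparing logs. The helper name `call_trace` and the regular expression are illustrative and only match the line format as pasted above (`symbol+0xoff/0xlen`, optionally followed by `[module]`); other kernel versions may format the trace differently.

```python
import re

# Matches e.g. "? nv_free_pages+0xcd/0xe0 [nvidia]"; the "?" prefix is optional.
FRAME_RE = re.compile(r"(?:\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?")


def call_trace(oops_text: str) -> list:
    """Return (symbol, module) pairs for the frames listed after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in oops_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # the first non-frame line (e.g. "Code: ...") ends the trace
        # Frames without a [module] suffix belong to the core kernel image.
        frames.append((match.group(1), match.group(2) or "kernel"))
    return frames
```

Two oopses can then be compared with `call_trace(a) == call_trace(b)`, which ignores timestamps and register contents.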

nvidia-bug-report.log.gz (94.6 KB)

FWIW, I noticed something related with 331.38, also on 3.12.10:

NVRM: Xid (0000:01:00): 8, Channel 00000003
dmar: DRHD: handling fault status reg 3
dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr fdf65000
DMAR:[fault reason 05] PTE Write access is not set
dmar: DRHD: handling fault status reg 3
[ tons more of the same ]
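
With that many repeated lines, counting the faults per device is easier than reading them. Below is a minimal Python sketch; the name `summarize_dmar_faults` is made up here, and the pattern is written against the exact message format quoted above, so it is an assumption that it will match output from other kernel versions.

```python
import re
from collections import Counter

# Matches e.g. "DMAR:[DMA Write] Request device [01:00.0] fault addr fdf65000"
FAULT_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device \[([0-9a-fA-F:.]+)\] "
    r"fault addr ([0-9a-fA-F]+)"
)


def summarize_dmar_faults(log_text: str) -> Counter:
    """Count DMAR faults per (PCI device, access type) in a kernel log."""
    counts = Counter()
    for access, device, _addr in FAULT_RE.findall(log_text):
        counts[(device, access)] += 1
    return counts
```

Feeding it the output of `dmesg` saved to a file shows at a glance whether all faults come from the GPU at 01:00.0 or from other devices as well.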

Have you tried unticking the “use hardware acceleration when available” option on the advanced tab in options? Firefox has caused me serious problems on Windows due to its attempts to utilise the hardware; god knows why it’s so appalling.

Firefox doesn’t use layer acceleration by default on Linux, so I doubt that would affect anything.

I’ve now also gotten this BUG() when changing display settings, i.e. using KDE’s kscreen to add and remove an HDMI-connected screen, and once when a full-screen mplayer shut down.

I have the same problem and found a workaround: adding “intel_iommu=off” as a kernel parameter seems to work well.

True, but I’d rather have the IOMMU kill the kernel before the driver/card scribbles over memory
it isn’t supposed to :)

I have a Dell laptop with a different card (Quadro 2000M) and haven’t seen this, although I turned off Optimus and I’m not running KDE. Is Prime/Optimus set up on the system?

My PCs have no Optimus feature because they are all desktops.

I have 5 desktop PCs with NVIDIA GPUs, and the problem does not occur on those whose CPU lacks Intel VT-d (aka IOMMU).
So I think this is a regression in 331.38 related to VT-d or DMA handling.

I hope NVIDIA will fix this in the next release.

Hi guys, we are not able to repro this issue in house. We’ve tried with various Fermi cards on a few different systems now, with and without IOMMU enabled, and I’m still not reproducing it.

Tracking this issue under bug 1456094 …

I had the same problem on my Clevo book using Chromium 33.
Already downgraded to 331.49.

My specs:
Clevo P370em
i7-3940XM
2x GTX 680m

DasLeo, please provide an NVIDIA bug report (generated by running nvidia-bug-report.sh as root) along with reproduction steps.

Same issue here on a Dell E6530 laptop (NVIDIA Corporation GF108GLM [NVS 5200M]), triggered by fullscreen mplayer playback with VDPAU output. Unfortunately, running nvidia-bug-report.sh after the bug is triggered causes the script to hang, so I can only provide a report taken before that moment.
nvidia-bug-report.log.gz (109 KB)

Yes, fullscreen mplayer with VDPAU reliably triggers the kernel BUG() when it exits.

334.21 has been released. However, it’s not fixed yet :-(

It’s still there in 334.21 and can still be reproduced with mplayer, as in comment #13.

Could anyone confirm whether the issue has been reproduced on NVIDIA’s side, or is more information still needed?

I confirm the bug. Tried with the 334.x driver on 3.13 and 3.14 kernels. It is 100% reproducible with the vainfo tool: it shows the VDPAU info and gets killed on exit, and a kernel oops appears in the log. The first run of the mpv player is usually fine. I suppose the bug is triggered when VDPAU is closed. The 331.x driver is OK. Setting intel_iommu=off fixes (or hides) the bug.

Looks like my issue too:

https://devtalk.nvidia.com/default/topic/715267/linux/crashes-linux-x64-gtx780ti/ – It happened when I closed mplayer.

It’s still reproducible with 337.12. Identical backtrace, same way to reproduce: just close mplayer using VDPAU, on plain X, with no WM or desktop environment.