Will the FAULT_PDE ACCESS_TYPE_READ bug in the Nvidia driver ever be fixed?

dumbdog · August 5, 2019, 4:49pm

I’m not even sure that Nvidia is aware of this, but a big community has formed around Valve’s new Proton software, which lets people play Windows games on Linux. Sadly, the Nvidia driver on Linux is still not stable enough for many of those games, which would work easily on AMD. Neither the author of DXVK (a DirectX 11 compatibility layer for Linux) nor other community members will ever be able to fix bugs in Nvidia’s driver. The community has been waiting for over a year now, but as far as I know there has never been an actual acknowledgement from Nvidia. Many bug reports have been opened on the GitHub issue tracker, but there is nothing we can do as long as Nvidia is not willing to help.

So my questions are: Is Nvidia aware of this? If yes, why has there been no progress over the last year? Is anyone even working on fixing the segmentation faults in the driver? Or does Nvidia simply not care enough about the gaming/Linux community?

I hope that some day Nvidia will be as stable as AMD is on Linux, since I’d like to continue buying your hardware. However, if there is no response or progress after more than a year, I guess it’s my own fault for thinking I can use your hardware on Linux in the first place.

Github issues for reference:

github.com/doitsujin/dxvk

Monster Hunter World randomly freezes

opened 07:14PM - 17 Dec 18 UTC

closed 03:44AM - 02 Jan 20 UTC

buscher

nvidia

Monster Hunter World (with proton) randomly freezes. This usually happens in between after 10min to 4hours, so a long random time period. As the DXVK_HUD (with memory) was enabled, at the time of the freeze, around ~3.9gb (assuming this is vram) of 6gb were used.

Most noticeable the dmesg output:

```
NVRM: Xid (PCI:0000:09:00): 31, Ch 0000004b, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_4 faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_READ
```

Xid 31, the addr 0x0_00000000, intr 10000000 and ACCESS_TYPE_READ are always constant. To me, it looks like a simple nullptr access, as it is always the 0x0 addr, but I don't know how to investigate this problem further.

I can not let the game run with `VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_standard_validation` or apitrace for hours, as this makes it very unplayable. `PROTON_USE_WINED3D=1` just results in a black screen. Allow flipping (in nvidia-setting) on/off does not change anything.

Please let me know how to make this report more useful, I am out of ideas.

### Software information

- Monster Hunter World
- vsync: off
- 30fps lock (getting weird input lag otherwise sometimes)
- Steam / Proton 3.16-beta5

### System information

- GPU: Nvidia Geforce 1060gtx 6gb
- Driver: nvidia-drivers-415.23
- Wine version: Proton 3.16-beta5 (???)
- DXVK version: Proton 3.16-beta5 (dxvk 0.93)
- Kernel: 4.19.10
- Ram: 16gb
- CPU: Ryzen 2700X

### Log files (with DXVK_LOG_LEVEL=debug and DXVK_HUD=devinfo,fps,memory)

- d3d11.log: [MonsterHunterWorld_d3d11.log](https://github.com/doitsujin/dxvk/files/2687470/MonsterHunterWorld_d3d11.log)
- dxgi.log: [MonsterHunterWorld_dxgi.log](https://github.com/doitsujin/dxvk/files/2687471/MonsterHunterWorld_dxgi.log)

EDIT: The game overall runs pretty well, just the random freezes are a pretty frustrating problem.

EDIT2: The screen freezes but the game background music is still running.

And here’s the error message that the driver writes into dmesg:
NVRM: Xid (PCI:0000:09:00): 31, Ch 0000004b, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_4 faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_READ

Note that this error freezes the whole system. The only way to get back to a working desktop is to SSH into the PC and kill the process that is using the Nvidia driver.
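For anyone comparing reports, the Xid 31 lines quoted in this thread all follow the same layout, so they can be split into fields with a small script. The following is only a minimal sketch: the function name `parse_xid31` and the field names are made up for illustration, and reading the `0xHI_LO` address as "upper bits, underscore, lower 32 bits" is an assumption based on how the addresses are printed, not something taken from Nvidia documentation.

```python
import re
from typing import Optional

# Matches the "Xid ... 31 ... MMU Fault" lines as they appear in this thread.
# Optional fields such as "pid=..." or "engmask ..." are skipped by the lazy ".*?".
XID31_RE = re.compile(
    r"Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): 31,"
    r".*?MMU Fault: ENGINE (?P<engine>\S+) (?P<client>\S+)"
    r" faulted @ 0x(?P<hi>[0-9a-fA-F]+)_(?P<lo>[0-9a-fA-F]{8})\."
    r" Fault is of type (?P<fault>\S+) (?P<access>\S+)"
)


def parse_xid31(line: str) -> Optional[dict]:
    """Return the fields of an Xid 31 MMU fault line, or None if the line does not match."""
    m = XID31_RE.search(line)
    if m is None:
        return None
    # Assumption: the part before "_" holds the upper bits, the part after it the lower 32 bits.
    address = (int(m.group("hi"), 16) << 32) | int(m.group("lo"), 16)
    return {
        "pci": m.group("pci"),
        "engine": m.group("engine"),
        "client": m.group("client"),
        "address": address,
        "fault": m.group("fault"),
        "access": m.group("access"),
    }


if __name__ == "__main__":
    sample = (
        "NVRM: Xid (PCI:0000:09:00): 31, Ch 0000004b, intr 10000000. "
        "MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_4 faulted @ 0x0_00000000. "
        "Fault is of type FAULT_PDE ACCESS_TYPE_READ"
    )
    print(parse_xid31(sample))
```

Having the engine, client, and address as separate fields makes it easier to see whether two reports are likely the same fault or merely share the Xid number.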

tugohugo · August 7, 2019, 2:04am

We’ve been dealing with this issue for over two years now on our ffmpeg transcoding servers running Linux. It has gotten to the point that we’ve scripted a way to monitor /var/log/dmesg for the kernel fault and hard-reset the whole server as soon as it happens. It has forced us to migrate to multiple nodes in the swarm to ‘somewhat’ tolerate this crash and the data loss associated with it. Pretty ridiculous when we run a small GPU cluster with a mix of Quadro P4000s and P5000s. If AMD had any sort of decent support in ffmpeg we would have moved over, but currently we’re stuck. I pray for a driver fix daily when I get alerts that a driver fault has occurred and the server has been force-restarted.

NVRM: Xid (PCI:0000:03:00): 31, Ch 00000018, engmask 00008100, intr 10000000. MMU Fault: ENGINE NVDEC HUBCLIENT_NVDEC faulted @ 0xff_fffff000. Fault is of type FAULT_PDE ACCESS_TYPE_READ

There’s been another report here as well - https://devtalk.nvidia.com/default/topic/1042835/linux/nvidia-docker-based-host-hangs-when-gpu-memory-exceeded-with-ffmpeg-transcodes/post/5289719/
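The monitoring approach described in this post is not shown in the thread, so the following is only a rough sketch of how such a watcher could look, not the poster's actual script. The names `watch_for_xid` and `on_fault` are hypothetical, and the reaction to a fault (alerting, restarting a service, rebooting) is deliberately left to the caller.

```python
import re
from typing import Callable, Iterable

# Matches the Xid number and PCI id in kernel log lines like the ones quoted above.
XID_RE = re.compile(r"Xid \((?P<pci>[^)]+)\): (?P<xid>\d+),")


def watch_for_xid(
    lines: Iterable[str],
    on_fault: Callable[[str, str], None],
    xid: int = 31,
) -> int:
    """Call on_fault(pci, line) for every line reporting the given Xid.

    Returns the number of matching lines seen once the input is exhausted.
    """
    hits = 0
    for line in lines:
        m = XID_RE.search(line)
        if m and int(m.group("xid")) == xid:
            hits += 1
            on_fault(m.group("pci"), line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    log = [
        "TCP: enp4s0: Driver has suspect GRO implementation\n",
        "NVRM: Xid (PCI:0000:03:00): 31, Ch 00000018, engmask 00008100, "
        "intr 10000000. MMU Fault: ENGINE NVDEC HUBCLIENT_NVDEC faulted @ "
        "0xff_fffff000. Fault is of type FAULT_PDE ACCESS_TYPE_READ\n",
    ]
    watch_for_xid(log, lambda pci, line: print("fault on", pci))
```

To follow the live kernel log instead of a list, the `lines` argument could be the standard output of `subprocess.Popen(["dmesg", "--follow"], stdout=subprocess.PIPE, text=True)`, which usually needs sufficient privileges to read the kernel ring buffer.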

kokoko3k · August 29, 2019, 6:41am

It has been almost two months and no developer has answered.
Is this really the kind of support we should expect?
Anyway, I’d like to add my voice to the question.

nvidia-bug-report.log.gz (1.13 MB)

tugohugo · September 10, 2019, 1:44am

We’re not shoveling millions of dollars to Nvidia, so they couldn’t care less. Pretty unfortunate, as we’ve spent over $50k on GPUs during the last two years…

We’re currently working on a move to AMD.

aplattner · September 10, 2019, 4:57am

I asked around and it sounds like this issue is being investigated and tracked in bug number 2432712.

tugohugo, your issue sounds different. Do you have a bug number associated with your problem? If not, please file one through the partner site. (If you’re not set up to file bugs, I can put you in touch with the developer relations folks.)

dumbdog · September 18, 2019, 5:32pm

Great to hear that this bug is already tracked! Is there a public bug tracker where the progress is listed? I couldn’t find one.

aplattner · September 18, 2019, 9:49pm

The bug tracker is not public, sorry. This particular bug is still open for investigation.

ntropia · October 23, 2019, 11:56pm

Hi,
I found this bug on a number of our Linux workstations using Nvidia cards.
The error does not seem to be triggered by any program or operation in particular, although we run several OpenGL applications.

At the first occurrence of the error in the syslog (i.e., it would appear in dmesg) the Xorg server is in an unstable state, and all unsaved work is basically lost.

We tried switching cards (three so far), suspecting hardware issues, but the problem persisted.
The latest driver we tested was 418.74.

The Nvidia drivers under Linux are in a terribly sad state, after being rather reliable for some years.

Please let us know if we can provide more feedback to speed up the solution of the problem.

Mounir · November 18, 2019, 2:49pm

I encounter the same issue in VLC.
Other players such as QMPlay2 and Kodi show no issue.

Mounir · November 26, 2019, 3:13pm

Turn off triple buffering and paste this into xorg.conf: Option "metamodes" "nvidia-auto-select +0+0 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"
Another point: I disabled the serial port in the BIOS.
No more freezes, no more headaches.

dreamcat4 · December 7, 2020, 2:55pm

Hi there, I am having the same random problems, with no particular app. This is with the latest Linux kernel, v5.9.10, on Ubuntu 20.04 with Xorg and Ubuntu Budgie, and the 455.45.01 driver. It has been happening ever since I upgraded from 19.10 to 20.04, and with these newer Nvidia driver versions.

My hardware is a GT 1030 (Pascal) + 8700K. I have not been able to get into my BIOS to check the serial port setting yet, because my Apple keyboard isn’t recognized at boot time.

Please keep us updated on this issue. Can you also tell us what hardware and software you are trying to reproduce it with? Can you reproduce this bug reliably internally, and narrow it down by regression-testing previous versions? Thanks.

nphyxx · December 12, 2020, 2:10am

Similar problem exists in Cyberpunk 2077 running under Steam Proton on 455.46.02 drivers (also occurred on 455.45.01).

dmesg:

[ 3198.971541] NVRM: Xid (PCI:0000:01:00): 31, pid=86684, Ch 00000046, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_2 faulted @ 0x1_f4fd5000. Fault is of type FAULT_PDE ACCESS_TYPE_READ
# and another ...
 Xid (PCI:0000:01:00): 31, pid=291362, Ch 0000004e, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_8 faulted @ 0x2_0db70000. Fault is of type FAULT_PTE ACCESS_TYPE_READ

I am happy to provide further info if it will help, but I’m not sure if I can get the game to run with debug instrumentation (e.g. cuda-memcheck suggested in documentation).

emerth · January 12, 2021, 2:42pm

Same problem here, running nvCaffe on a pair of 2080 Ti cards, Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-118-generic x86_64).

The problem only started when I updated to the NVIDIA-Linux-x86_64-450.80.02 driver.

jasoncollege24 · February 1, 2021, 3:14am

Kubuntu 20.04 LTS, using nvidia proprietary driver 460.32.03

I was watching videos on Plex, using the web player in the Brave web browser, when my system locked up completely. I was unable to SSH into it from another system, so had to press the reset button.

When I logged back in, I checked the crash logs, and found this…
NVRM: Xid (PCI:0000:01:00): 31, pid=278, Ch 00000002, intr 00000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST faulted @ 0x10_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
NVRM: Xid (PCI:0000:01:00): 31, pid=5096, Ch 00000050, intr 00000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST faulted @ 0x21_02b07000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ

I’ve been messing with Linux for a good while now (a few years), and have never seen this before. A Google search brought me here. I’m using a GeForce RTX 2080 SUPER. I was trying to migrate to Linux from Windows, but if my expensive hardware won’t work there, because of bad drivers, I’m forced to stick with Windows.

scottwn · April 23, 2021, 12:50am

Same problem here on a Lenovo ThinkPad P53, RHEL8.3
NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2

Most videos in Firefox cause a system hang. Sometimes they cause a full lockup requiring me to kill X. Crash logs show the same error.

NVRM: Xid (PCI:0000:01:00): 31, pid=119156, Ch 00000079, intr 00000000. MMU Fault: ENGINE NVDEC0 HUBCLIENT_NVDEC0 faulted @ 0x1_04c43000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ

Knut49 · July 7, 2021, 12:36pm

It is still present with xwayland and nvidia 470.21.

[110909.405762] NVRM: Xid (PCI:0000:65:00): 31, pid=7128, Ch 00000008, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_02c1f000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[113802.316814] NVRM: Xid (PCI:0000:65:00): 31, pid=184898, Ch 00000008, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_02cf1000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[119392.277852] NVRM: Xid (PCI:0000:65:00): 31, pid=203435, Ch 00000008, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_1003e000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[138867.640898] NVRM: Xid (PCI:0000:65:00): 31, pid=208508, Ch 00000008, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_03ed5000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[145322.554347] perf: interrupt took too long (4917 > 4910), lowering kernel.perf_event_max_sample_rate to 40000
[170380.626783] TCP: enp4s0: Driver has suspect GRO implementation, TCP performance may be compromised.
[186905.349385] NVRM: Xid (PCI:0000:65:00): 31, pid=250138, Ch 00000008, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_02d31000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
knutjb@knut:~/sources$

Nvidia RTX 3070
