465.24.02 page fault

Thanks for that! Today I learned there is a 32-bit nvidia-utils package that needs to be kept in sync with the others! That solves my only problem with holding back the NVIDIA version for a while longer until the actual issue is fixed.
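For anyone else holding the driver back, here is a minimal sketch of how the "kept in sync" check could be scripted. It assumes Arch/Manjaro package naming (`nvidia-utils`, `lib32-nvidia-utils`) and the "name version" output of `pacman -Q`; the helper name `check_in_sync` is made up for illustration.

```shell
#!/bin/sh
# check_in_sync (hypothetical helper): reads "name version" lines on stdin,
# the format `pacman -Q` prints, and succeeds only if every package listed
# carries the same upstream version.
check_in_sync() {
    awk '
        {
            v = $2
            sub(/-[^-]+$/, "", v)   # drop the package release suffix, e.g. "-1"
            if (NR == 1) first = v
            if (v != first) mismatch = 1
            printf "%-24s %s\n", $1, $2
        }
        END { exit (mismatch || NR == 0) ? 1 : 0 }
    '
}

# Usage on Arch/Manjaro (package names are an assumption, adjust to your setup):
#   pacman -Q nvidia-utils lib32-nvidia-utils | check_in_sync || echo "out of sync"
```

The check only compares version strings; it does not tell you which package is the stale one beyond printing both.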

Already switched to nouveau (still crashes, but about 10x less often), and spent an inordinate sum on an AMD GPU.

Just one anecdote shouting into the void, but there you go. Customer lost for good.

I tried the AMD route, but was never able to get DaVinci Resolve on Linux to work with it. Whenever I selected the AMD card in the setup, DR would crash. I swapped it out for a 1660 Super (pre-pandemic) and DR magically worked.

Agreed that this driver issue sucks. I’m holding off on all Manjaro updates until this is resolved.

This afternoon, several earlier Nvidia drivers magically appeared in the Linux Mint Driver Manager. I used this to conveniently roll back from 465.25 to 460.73.01. This appears to have solved my problem, although it is really only a temporary workaround. I can now boot into the OS while my Dell u3818dw monitor is connected via DP.

Although I am very glad to have finally found a workaround to this issue, I am extremely upset that this episode cost me several days of troubleshooting, and it will take a few more days to put my systems back together.

Thinking it was a hardware problem, during the troubleshooting process, I swapped out cables that had been threaded through monitor arms and cable guides under my workbench. Now I need to rethread the cables.

In addition, at other times thinking it was an OS issue, I wiped the OS SSD in the seemingly misbehaving machine and temporarily tried installations of Manjaro and MX Linux. Now I will revert to Linux Mint, but I also need to reinstall all my applications. Fortunately, my data files were on a separate disk and synced with Dropbox.

One remaining issue is that the 460.73.01 driver apparently needs CUDA 11.2, and I had been running CUDA 11.3.
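A driver/CUDA mismatch like this can be checked with a dotted-version comparison. The sketch below assumes GNU `sort -V` is available; `version_ge` is a made-up helper name, and the minimum driver version shown for CUDA 11.3 is what I recall from NVIDIA's release notes, so verify it there before relying on it.

```shell
#!/bin/sh
# version_ge A B (hypothetical helper): succeeds if version A >= version B.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Installed driver version. On a live system this could instead come from:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
installed="460.73.01"

# ASSUMPTION: minimum Linux driver for CUDA 11.3 as recalled from NVIDIA's
# CUDA release notes; check the official table.
required="465.19.01"

if version_ge "$installed" "$required"; then
    echo "driver $installed meets the minimum $required"
else
    echo "driver $installed is older than the minimum $required"
fi
```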

In any event, I trust that Nvidia will soon issue an updated Linux driver free of bugs and that future updates to Nvidia drivers will be carefully checked before being released.

Still no fix??? This is ridiculous! It’s not a garage company, it’s NVIDIA, for God’s sake!

The issue has been debugged and fixed internally.
The fix will be released publicly in upcoming releases.

Thanks @amrits for letting us know!
By any chance, is there any info on the specific cause? For example, was it an inconsistent DisplayPort implementation on specific monitors, since it seems to affect only a subset of monitors, or was it some other odd edge case?
Also, do you happen to know which driver version is likely to include this fix once it comes out?

Hi @amrits, kudos!

There is a driver coming out tomorrow that should introduce Vulkan DLSS and Proton DLSS, so hopefully the fix is in it. It should be, given the wording that it will be included in upcoming releases.

In which version can we expect it?

Can you try to reproduce it with 470.42.01? I never experienced this issue with my DP monitors, but it would be nice to see if someone can try and reproduce it.

For Manjaro/Arch users, it should be as easy as using DKMS.

For Ubuntu users, good luck.

This entire thread is people confirming that every driver version since 470 (date of release) has the problem. If you’ve never experienced it, why are you on a thread for a problem you’ve never had?

A quick test with 470.42.01 on Arch: it works. At least, there is no crash and I have a usable system. But I do not use CUDA, OpenCL or 32-bit stuff.

Maybe because the entire thread is people confirming the issue since 460.80, and 470.42.01 was shipped after @amrits confirmed the issue is solved?

OK, so on Arch, having installed:

  • nvidia-dkms-performance
  • nvidia-utils-performance
  • nvidia-settings-performance
  • opencl-nvidia-performance
  • lib32-nvidia-utils-performance
  • lib32-opencl-nvidia-performance

Giving me a system with:

  • nvidia-dkms-performance 470.42.01-1
  • nvidia-utils-performance 470.42.01-1
  • linux 5.12.11.arch1-1
  • nvidia-settings-performance 470.42.01-1
  • lib32-nvidia-utils-performance 470.42.01-1
  • opencl-nvidia-performance 470.42.01-1
  • lib32-opencl-nvidia-performance 470.42.01-1

Everything seems to work fine:

  • X seems to be working fine
  • Steam launches
  • Native games seem to work (tried Factorio)
  • nvidia-container-toolkit seems to work as well, although the devices still have to be passed in explicitly, which is annoying:
sudo docker run --gpus all \
    --device /dev/nvidia0 \
    --device /dev/nvidia-uvm \
    --device /dev/nvidia-uvm-tools \
    --device /dev/nvidiactl \
    -it nvidia/cuda:11.0-base nvidia-smi

I have yet to try a CUDA sample, as I have nothing on hand to do so quickly.
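The repeated `--device` flags in the docker command above could be generated rather than typed out. This is only a sketch: `device_args` is a made-up helper, and it simply skips any path that is not present as a character device on the host.

```shell
#!/bin/sh
# device_args (hypothetical helper): print a "--device PATH" pair for each
# argument that exists as a character device, skipping the rest.
device_args() {
    for dev in "$@"; do
        if [ -c "$dev" ]; then
            printf '%s %s ' --device "$dev"
        fi
    done
}

# Usage, mirroring the command from the post (word splitting of the
# substituted flags is intentional):
#   sudo docker run --gpus all \
#       $(device_args /dev/nvidia0 /dev/nvidia-uvm /dev/nvidia-uvm-tools /dev/nvidiactl) \
#       -it nvidia/cuda:11.0-base nvidia-smi
```

Skipping missing nodes avoids a docker error when, for example, `/dev/nvidia-uvm-tools` has not been created yet, but it can also hide a driver that failed to load, so check the output if the container sees no GPU.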

Is there a solution for Ubuntu users?

On Fedora 34 installing 470.42.01 from rpmfusion rawhide fixes the problem for me.

It works now. As suggested, I have installed the 450-server drivers.

On Ubuntu you can either install nvidia-driver-460-server, which has the older 460.73.01 without the bug, or install the beta 470 driver, either directly from Nvidia or from this PPA: "Nvidia testing - Do not use on production systems" by Alberto Milone.
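The first option might look like the sketch below. The package name is the one given in this post; holding the package with `apt-mark hold` is an extra step not mentioned in the thread, added here as an assumption so a later upgrade does not pull the broken driver back in. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Package name as given in the post above.
PKG="nvidia-driver-460-server"

# run (hypothetical helper): print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Install the server driver, then hold it at that version.
# The hold step is an assumption, not something suggested in the thread.
install_and_hold() {
    run sudo apt install "$1"
    run sudo apt-mark hold "$1"
}

install_and_hold "$PKG"
```

Run it once as is to review the commands, then again with `DRY_RUN=0` to apply them; `sudo apt-mark unhold nvidia-driver-460-server` releases the hold once a fixed driver is available.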

Just tested on my end with KDE Neon 20.04 and it finally works!

I needed to go back, however, because Steam wouldn’t launch. I am glad, however, that it was finally resolved. Hopefully the fix will land in 465, too!