465.24.02 page fault

You can try to apt-mark hold the nvidia driver package.
See: https://www.cyberciti.biz/faq/apt-get-hold-back-packages-command/ for more info.
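For anyone who wants to try the hold approach, here is a minimal sketch. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences added for illustration (they let you preview the command first); `apt-mark hold`, `apt-mark unhold` and `apt-mark showhold` are the real apt subcommands. The package name in the usage comment is only an example, and you may need to hold related driver packages as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hold the given packages so that upgrades leave them at the installed version.
hold_packages() {
  run sudo apt-mark hold "$@"
}

# Release the hold again once a fixed driver is available.
unhold_packages() {
  run sudo apt-mark unhold "$@"
}

# Usage (package name is an example):
#   DRY_RUN=1 hold_packages nvidia-driver-460   # preview only
#   hold_packages nvidia-driver-460
#   apt-mark showhold                           # list held packages
#   unhold_packages nvidia-driver-460
```

Remember to release the hold later, otherwise the driver will stay at the held version even after a fix is published.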

Or you can install the working drivers manually after downloading them from the NVIDIA website. The only drawback is that you have to uninstall them before any kernel update and reinstall them afterwards. However, I’d be interested to know whether the apt-mark way works.

Can I hold a previous version with apt? Whenever I install, apt always installs the latest.

The nvidia-driver-460-server packages in the Ubuntu repositories are on 460.73.01.
Install those instead of nvidia-driver-460 if you are hit by this bug.

See Comment #9 : Bug #1930733 : Bugs : nvidia-graphics-drivers-465 package : Ubuntu

apt-mark hold ${packagesList} worked well for me.

Thank you so much! I really did not wish to stick with the manual installation method for too long!

I just tried the 460 server driver solution. It did not work for me, and I still have the DisplayPort bug.

I have an AMD 3800X, 64 GB RAM, an RTX 2080 Super, and a fresh install of Ubuntu 20.04 (kernel 5.8.0-55) with 3 identical 4K monitors (1 on HDMI, 2 on DP). The open-source driver continues to work just fine, but we need CUDA for our science to continue.

Here are the outputs from nvidia-smi, uname -r, and xrandr. Note that all monitors should have 4K options.

(base) x@x:~$ nvidia-smi
Wed Jun  9 17:18:16 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:0E:00.0  On |                  N/A |
|  0%   45C    P5    17W / 250W |    702MiB /  7981MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       918      G   /usr/lib/xorg/Xorg                198MiB |
|    0   N/A  N/A      1466      G   /usr/lib/xorg/Xorg                349MiB |
|    0   N/A  N/A      1603      G   /usr/bin/gnome-shell              137MiB |
|    0   N/A  N/A      2014      G   gnome-control-center                2MiB |
+-----------------------------------------------------------------------------+
(base) x@x:~$ uname -r
5.8.0-55-generic
(base) x@x:~$ 
(base) x@x:~$ xrandr
Screen 0: minimum 8 x 8, current 7680 x 2160, maximum 32767 x 32767
DP-0 disconnected (normal left inverted right x axis y axis)
DP-1 connected 1920x1080+3840+0 (normal left inverted right x axis y axis) 1110mm x 620mm
   1920x1080     60.00 +  59.94    29.97*   23.98  
   1680x1050     59.95  
   1440x900      59.89  
   1360x768      60.02  
   1280x1024     75.02    60.02  
   1280x960      60.00  
   1280x800      59.81  
   1280x720      60.00    59.94    29.97    23.98  
   1152x864      75.00  
   1024x768      75.03    70.07    60.00  
   800x600       75.00    72.19    60.32  
   720x480       59.94  
   640x480       75.00    72.81    59.94    59.93  
HDMI-0 connected primary 3840x2160+0+0 (normal left inverted right x axis y axis) 800mm x 450mm
   3840x2160     30.00*+  59.94    29.97    23.98    23.98  
   4096x2160     59.94    29.97    24.00    23.98  
   1920x1080     60.00    59.94    29.97    23.98  
   1680x1050     59.95  
   1440x900      59.89  
   1360x768      60.02  
   1280x1024     75.02    60.02  
   1280x960      60.00  
   1280x800      59.81  
   1280x720      60.00    59.94    29.97    23.98  
   1152x864      75.00  
   1024x768      75.03    70.07    60.00  
   800x600       75.00    72.19    60.32  
   720x480       59.94  
   640x480       75.00    72.81    59.94    59.93  
DP-2 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
DP-4 disconnected (normal left inverted right x axis y axis)
DP-5 connected 1920x1080+5760+0 (normal left inverted right x axis y axis) 800mm x 450mm
   1920x1080     60.00*+  59.94    29.97    23.98  
   1680x1050     59.95  
   1440x900      59.89  
   1360x768      60.02  
   1280x1024     75.02    60.02  
   1280x960      60.00  
   1280x800      59.81  
   1280x720      60.00    59.94    29.97    23.98  
   1152x864      75.00  
   1024x768      75.03    70.07    60.00  
   800x600       75.00    72.19    60.32  
   720x480       59.94  
   640x480       75.00    72.81    59.94    59.93  
USB-C-0 disconnected (normal left inverted right x axis y axis)
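To spot this kind of problem quickly, a small filter over the xrandr output can list the connected outputs that do not advertise a given mode. This is only a sketch: the function name `outputs_missing_mode` is made up for illustration, and it assumes the xrandr text layout shown above (an output header line followed by indented mode lines).

```shell
#!/usr/bin/env bash
# Reads xrandr output on stdin and prints every connected output that
# does not list the mode given as $1 (for example 3840x2160).
outputs_missing_mode() {
  awk -v want="$1" '
    # A new output header: report the previous output if it lacked the mode.
    $2 == "connected" || $2 == "disconnected" {
      if (name != "" && !found) print name
      name = ($2 == "connected") ? $1 : ""
      found = 0
      next
    }
    # A mode line belonging to the current connected output.
    name != "" && $1 == want { found = 1 }
    END { if (name != "" && !found) print name }
  '
}

# Usage:
#   xrandr | outputs_missing_mode 3840x2160
```

With the output above, this would report DP-1 and DP-5, the two DisplayPort monitors that only offer modes up to 1920x1080.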




(Edited to emphasize that issue persists)

I’ve done the following…

apt-mark hold nvidia-driver-460
apt-mark hold nvidia-driver-460-server

We shall see what goes down. My upgrades seem to be moaning at me with a “Partial Upgrade” warning since I rolled back the 5.8.0-55 kernel (with which I can’t even get to a bloody terminal).

I started seeing the same issue yesterday and could reproduce it under these conditions:

OS: Ubuntu 18.04 and 20.04 with both LTS and HWE kernels (currently running 5.4.0-74-generic with Ubuntu 20.04)
Driver: >= 460.80
GPU: GeForce GTX 1050, 760 and 660 Ti
Monitor: Lenovo P27h-10 (resolution 2560x1440)
CPU: Intel Core i7-4770
MB: Asus Z87M-PLUS

If the screen is connected via DisplayPort, the problem occurs; if it is connected via HDMI, there is no problem. I can boot the system in recovery mode, but as soon as I run nvidia-smi it locks up. So this doesn’t even seem to be related to the X driver, since X is not running in recovery mode. Even more surprising is that it only happens when the screen is connected to the DisplayPort output, even with a single screen. I’ve also tried booting with the screen connected via HDMI and then connecting a second screen via DP once X is running; the system locks up in that case too.

Currently I’m using the nvidia-driver-460-server package (driver version 460.73.01) and that works fine.
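A quick way to check whether an installed driver falls into the range reported above is a version comparison with GNU `sort -V`. This is a sketch under assumptions: the `FIRST_BAD` threshold of 460.80 comes only from the conditions listed in this post, not from an official statement, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: first affected version, taken from the report in this thread.
FIRST_BAD="460.80"

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Succeeds when the given driver version is in the reported problem range.
driver_is_suspect() {
  version_ge "$1" "$FIRST_BAD"
}

# Usage (modinfo reads the module file and avoids running nvidia-smi,
# which locks up the affected systems):
#   ver=$(modinfo -F version nvidia)
#   if driver_is_suspect "$ver"; then echo "driver $ver may be affected"; fi
```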

Yes, the problem is with the NVIDIA driver, not with X, and NVIDIA is aware of it and working on a bug fix. So they are to blame for the bug, and Canonical for forcing this driver upon their users despite it being bugged. Canonical’s suggested solution is to use the nvidia-driver-460-server package instead, since they do not seem willing to roll back the buggy driver in their repositories.

Well the point I was trying to make is that there is the Nvidia kernel module and the Nvidia X driver which are both part of the “Nvidia driver”, and that this appears to be a problem in the former and not the latter.

Oh ok, sorry, I misunderstood, I did not dive that deep into the problem ;)

I tried the nvidia-driver-460-server solution. Sadly it did not work for me :(

The resolution of the DisplayPort-connected monitor is not correct with driver version 460.73.01.

Incorrect resolution isn’t the issue in this thread; it’s the driver crashing on any GPU using DisplayPort.

You guys trying to go back to 460.73 might want to install the DKMS version of it. The drivers and kernel usually upgrade in lockstep, so when you start mixing and matching kernel and driver versions you can get funky results. DKMS builds the driver using the headers for whatever kernel you are on. It stands for Dynamic Kernel Module Support.

IDK if Ubuntu/Debian has a DKMS version available or not.
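If DKMS is in use, `dkms status` shows which modules have been built for which kernels. The sketch below filters that output for an nvidia module installed for a given kernel release. The function name is made up, and the exact line format of `dkms status` is an assumption (it differs slightly between DKMS versions, so the pattern accepts both `nvidia, ...` and `nvidia/...`).

```shell
#!/usr/bin/env bash
# Reads `dkms status` output on stdin; succeeds if an nvidia module is
# reported as installed for the kernel release given as $1.
nvidia_dkms_built_for() {
  grep -E '^nvidia[,/]' | grep -F -- "$1" | grep -q 'installed'
}

# Usage:
#   if dkms status | nvidia_dkms_built_for "$(uname -r)"; then
#     echo "nvidia module is built for the running kernel"
#   fi
```

As far as I know, the Ubuntu driver packages already build the kernel module through DKMS (via a package such as nvidia-dkms-460), but treat that as something to verify on your own system.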

Hello, PopOS is on the 460.73 NVIDIA driver version; it seems they are waiting for a completely safe update. Anyone on this distro might not have any problem.

@amrits you mentioned that the issue is being worked on more than a week ago. Do you have any more information at this point? This is a severe regression that made its way into several major distributions, so it would be good to know if there will be a fix anytime soon.

I agree that this is a severe regression and that it needs to be fixed ASAP.
However, I think that the ones who should take immediate action here are the maintainers of the distributions, since they are supposed to be the “shock absorbers” in these cases.

ESPECIALLY when the issue was already known from the previews in the testing branches: it has been unfortunate, if not irresponsible, that despite all the red flags that were raised they nevertheless made these drivers available to stable branches.

I would first blame Canonical, Manjaro, Arch and the others rather than push NVIDIA for a solution.
NVIDIA may or may not be cooperative with the OSS community, but they are debugging something that is likely to be assembly language rather than a high-level language… it doesn’t look trivial.

Yeah absolute madness to assume that a stable driver release from Nvidia does not contain severe regressions /irony_off . Always these fanboys that jump in to protect the reputation of their beloved company… I don’t care who to blame, I would just like to see better communication from the only party here that has insight into the code. On such a severe bug I would expect that they pull back the buggy drivers, instead I’m still offered 460.84 on the driver download page.

On such a severe bug I would expect that they pull back the buggy drivers, instead I’m still offered 460.84 on the driver download page.

Please, separation of concerns:

  1. not all users are affected: why should NVIDIA remove the possibility to download the drivers for everybody?
  2. package maintainers have known about the issue since the drivers landed in the testing branches; they were told not to promote them to stable, and they did so nevertheless.

Now, who is doing the worst communication here?

And, btw:

Always these fanboys that jump in to protect the reputation of their beloved company

How can nvidia be the beloved company of a Linux user?!