Random Xid 61 and Xorg lock-up

Nvidia, can you please pull your finger out on this? How can you expect anyone to buy £1000 cards ever again when they crash the machine every few days?

It was less common for a while but I’ve had it twice in a few days. Currently on 440.82. Most recently shortly after log-in with just Firefox and VSCode open.

Apr 21 11:44:37 shinpan kernel: [ 1192.754783] NVRM: Xid (PCI:0000:0a:00): 61, pid=1843, 0cec(3098) 00000000 00000000
Apr 21 11:44:37 shinpan kernel: [ 1192.754779] NVRM: GPU Board Serial Number: 
Apr 21 11:44:37 shinpan kernel: [ 1192.754776] NVRM: GPU at PCI:0000:0a:00: GPU-5a995746-9836-2529-7692-2a9d80e4fb6c
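
For anyone collecting these reports, a small script can pull the Xid events out of a saved kernel log so that crash counts are easier to compare between machines. This is only a minimal sketch, not an NVIDIA tool: the function name `parse_xid_events` and the regular expression are assumptions, inferred solely from the log line format shown above, and other driver versions may print the line differently.

```python
import re

# Matches lines such as:
#   NVRM: Xid (PCI:0000:0a:00): 61, pid=1843, 0cec(3098) 00000000 00000000
# The pattern is inferred from the log lines in this thread, not from an official spec.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), pid=(?P<pid>\d+)"
)


def parse_xid_events(log_text):
    """Return a list of (pci_address, xid_code, pid) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("pci"), int(match.group("xid")), int(match.group("pid")))
            )
    return events
```

The text passed in could be, for example, the saved output of `journalctl -k` or `dmesg`; lines that do not contain an Xid record are simply skipped.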

@tarithil

Thank you for your suggestion. I tried it and unfortunately, I experienced another crash today.

Hi All,

Apologies for the slow response.
Please allow us some more time: I do not have a setup (X570 + RTX GPU) handy, and due to the lockdown I am not able to configure one.
As soon as the lockdown is over, I will prepare a setup and attempt a repro.
We checked logs from a few users but did not find any information that points to a root cause, so we need a local repro to take it further.

@admin8cqme did you try the following parameter instead?

rcutree.rcu_idle_gp_delay=1

I’m using Arch Linux, kernel 5.6.6-arch1-1
Nvidia card: GTX 1660
CPU: Ryzen 5 2600

I have also added this parameter in GRUB on my other Linux distributions to see how it goes.
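
After editing the GRUB configuration it is worth confirming that the running kernel really booted with the parameter, since a forgotten GRUB regeneration would make a test run meaningless. The following is a rough sketch of such a check; the helper names `has_kernel_param` and `read_cmdline` are made up for illustration, and the only thing assumed about the system is that `/proc/cmdline` exists, as it does on Linux.

```python
def has_kernel_param(cmdline, param):
    """Return True if param is one of the whitespace-separated tokens in cmdline.

    An exact token match is used, so 'rcutree.rcu_idle_gp_delay=1' does not
    match a command line that contains 'rcutree.rcu_idle_gp_delay=10'.
    """
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

On the affected machine, `has_kernel_param(read_cmdline(), "rcutree.rcu_idle_gp_delay=1")` should return True after a reboot if the parameter was applied.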

The other thing I’m testing is Ubuntu 20.04 LTS with no extra GRUB parameter, to see whether the problem is fixed in this new Ubuntu release.

Just want to report that I experienced another crash while using @tarithil’s suggested GRUB parameter:

pcie_aspm=off

I have noticed that probably 90% of the time that it crashes, I am on youtube on chrome, either starting a video, ending a video, or trying to scrub through a video.

My chrome configuration:

/usr/bin/gnome-www-browser --flag-switches-begin --enable-gpu-rasterization --enable-webgl-draft-extensions --enable-webgl2-compute-context --flag-switches-end --disable-webrtc-apm-in-audio-service

I went through a period of systematically experimenting with different chrome GPU configurations at one point (turning all off and turning each one on, one at a time), but no single variable made a difference (and some just seemed to hurt performance).

I will be trying @basdeth’s suggestion of rcutree.rcu_idle_gp_delay=1

Just happened to me today:

abr 29 12:07:17 carlos-tobefilledbyoem kernel: NVRM: GPU at PCI:0000:07:00: GPU-44c5cdee-5572-eb62-6d76-34ba1fa54eb2
abr 29 12:07:17 carlos-tobefilledbyoem kernel: NVRM: GPU Board Serial Number: 
abr 29 12:07:17 carlos-tobefilledbyoem kernel: NVRM: Xid (PCI:0000:07:00): 61, pid=996, 0cec(3098) 00000000 00000000
abr 29 12:09:13 carlos-tobefilledbyoem kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.

My GPU info:

NVIDIA Driver Version: 440.82
GeForce GTX 1660 SUPER
Galax GeForce GTX 1660 Super

My CPU:

AMD Ryzen 5 3600

My motherboard:

ASRock B450M Steel Legend

My system configuration:

Operating System: Manjaro Linux 
Kernel Version: 5.6.7-1-MANJARO
OS Type: 64-bit

Bios Version:

BIOS Information
        Vendor: American Megatrends Inc.
        Version: P2.90
        Release Date: 11/27/2019

I was working with Firefox, Visual Studio Code, Sublime, terminals and PostgreSQL when my screen froze. My mouse pointer was still moving, but really slowly, and I couldn’t even open a terminal to look at the logs because everything else had stopped working. I even tried to switch to tty1 (Ctrl+Alt+F1), but it didn’t work, so I had to press the shutdown button.

Hi @admin8cqme, I’m finally testing this configuration on Ubuntu 20.04 LTS:

Kernel: 5.4.0-26-generic
nvidia driver:
modinfo nvidia
filename: /lib/modules/5.4.0-26-generic/updates/dkms/nvidia.ko
alias: char-major-195-*
version: 435.21
supported: external
license: NVIDIA

I installed using : apt install nvidia-driver-435

dpkg -l | grep nvidia-driver-435
ii nvidia-driver-435 435.21-0ubuntu7 amd64 NVIDIA driver metapackage
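
Since several people in this thread are comparing driver versions, it may help to extract the version of the installed kernel module programmatically rather than by eye. Below is a small sketch that parses `modinfo`-style output like the text pasted above; `parse_modinfo` is a hypothetical helper name, and the parser assumes nothing beyond the "key: value" layout visible in that output.

```python
def parse_modinfo(text):
    """Parse 'key: value' lines (as printed by modinfo) into a dict.

    Only the first occurrence of each key is kept, so repeated fields
    such as 'alias' do not overwrite one another.
    """
    fields = {}
    for line in text.splitlines():
        # Split at the first colon only, because values such as paths
        # or aliases may contain further colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and key not in fields:
            fields[key] = value.strip()
    return fields
```

Feeding it the captured output of `modinfo nvidia` and reading the `"version"` entry gives the string to quote in a report.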

I have been playing 4K video in Chrome for more than 2 hours and it seems stable.

My xorg.conf (in /etc/X11/) is:

Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0" 0 0
    InputDevice "Keyboard0" "CoreKeyboard"
    InputDevice "Mouse0" "CorePointer"
    Option "Xinerama" "0"
EndSection

Section "Files"
EndSection

Section "Module"
    Load "dbe"
    Load "extmod"
    Load "type1"
    Load "freetype"
    Load "glx"
EndSection

Section "InputDevice"
    # generated from default
    Identifier "Mouse0"
    Driver "mouse"
    Option "Protocol" "auto"
    Option "Device" "/dev/psaux"
    Option "Emulate3Buttons" "no"
    Option "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier "Keyboard0"
    Driver "kbd"
EndSection

Section "Monitor"
    # HorizSync source: edid, VertRefresh source: edid
    Identifier "Monitor0"
    VendorName "Unknown"
    ModelName "Samsung LC24RG50"
    HorizSync 168.0 - 168.0
    VertRefresh 48.0 - 144.0
    Option "DPMS"
EndSection

Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "GeForce GTX 1660"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    Option "Stereo" "0"
    Option "nvidiaXineramaInfoOrder" "DFP-2"
    Option "metamodes" "1920x1080_144 +0+0 {ForceFullCompositionPipeline=On}"
    Option "SLI" "Off"
    Option "MultiGPU" "Off"
    Option "BaseMosaic" "off"
    Option "AllowIndirectGLXProtocol" "off"
    Option "TripleBuffer" "on"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection

You have to adjust it to your graphics card and monitor refresh rate.
You can use nvidia-settings to adjust your xorg.conf and save it to /etc/X11/.

I hope it helps. Let me know…

Ah, and I also upgraded from 19.10 to 20.04 LTS with no problems. I’m now testing with kernel 5.4.0-28-generic and the same Nvidia driver, 435.

ubuntu-drivers list

then apt install nvidia-driver-435

Hi @carlosmerces, I’m testing Arch Linux. Details:

kernel: 5.6.7-arch1-1
nvidia-driver:
nvidia-dkms 440.82-1
nvidia-settings 440.82-1
nvidia-utils 440.82-1

KDE Plasma environment, and I still haven’t hit the issue. I will keep testing it.
Try it if you wish.

Hi @smooz, try Ubuntu 20.04 LTS with kernel 5.4.0-28-generic and nvidia-driver-435 and see how it goes. What OS, kernel and Nvidia driver are you using?

Hi @basdeth,

Thank you for all the work you seem to be putting into this. I’m still currently using your previous suggestion of appending rcutree.rcu_idle_gp_delay=1 to my grub config.

So far, I have not experienced any crashes although it has only been two days. I have previously gone a full week without a crash, so I will probably need to wait an entire month before I begin feeling confident that I have found a fix.

I can try your suggestions if it crashes again.

Hi @admin8cqme, I switched to the setup I last suggested (20.04 LTS with Nvidia driver 435.21) because rcutree.rcu_idle_gp_delay=1 did not work in the end: my system crashed with the same Xid 61 bug again and again.
I will post if the combination of 20.04 LTS and Nvidia driver 435.21 does not work either.

Thanks, and take care!!

Experienced the Xid 61 with Ubuntu 19.10.

On April 25th I updated to Ubuntu 20.04 with NVidia 435.
Since the issue happened again multiple times the first few days I switched to NVidia 440.

There is no improvement, the Xid 61 error happened 3 times today alone.
Usually it happens once a day at the least.

It’s really annoying…


Ryzen 9 3900x
GeForce RTX 2070 SUPER
ROG STRIX X570-E GAMING

BIOS Information:
Vendor: American Megatrends Inc.
Version: 1201
BIOS Revision: 5.14
Release Date: 2019/10/07

I’ve also seen the Xid 61 issue:

  • Threadripper 3960x
  • Asus ROG Strix TRX40-E Gaming
  • EVGA RTX 2080 Ti
  • Ubuntu 18.04.4 LTS
  • kernel 5.3.0-51-generic
  • Nvidia driver 440.82

It seems to happen about once a week.

xid 61 again - 4th time in 24h…

sometimes it comes with xid 8

May 03 06:02:22 kernel: NVRM: GPU at PCI:0000:0b:00: GPU-cc6e3660-8db4-9431-a0ae-3355f10ac81c
May 03 06:02:22 kernel: NVRM: GPU Board Serial Number:
May 03 06:02:22 kernel: NVRM: Xid (PCI:0000:0b:00): 61, pid=1445, 0cec(3098) 00000000 00000000
May 03 06:02:33 kernel: NVRM: Xid (PCI:0000:0b:00): 8, pid=1445, Channel 00000036

Did anyone try Nouveau?

I’m inclined to give up nvidia altogether …

Ran into the same issue using the latest 440.82 drivers. Crash happened whilst playing a youtube video on chrome and playing OSRS (java client). Has happened 3 times so far in 7 days.

Specs:

  • Arch linux, kernel 5.6.7-arch1-1

  • nvidia driver 440.82

  • Ryzen 9 3900X

  • RTX 2070 Super

  • Gigabyte X570 AORUS Elite Wifi motherboard (F11)

  • RAM 32G

    May 03 00:20:16 Borg kernel: NVRM: GPU at PCI:0000:09:00: GPU-e73dcb30-f7a5-77e0-9eb6-b6fb76b75ec9
    May 03 00:20:16 Borg kernel: NVRM: GPU Board Serial Number:
    May 03 00:20:16 Borg kernel: NVRM: Xid (PCI:0000:09:00): 61, pid=826, 0cec(3098) 00000000 00000000
    #1 0x00007feb57d759fc n/a (libnvidia-glcore.so.440.82 + 0x12d99fc)
    #2 0x00007feb57d769e7 n/a (libnvidia-glcore.so.440.82 + 0x12da9e7)
    #3 0x00007feb57d78e8c n/a (libnvidia-glcore.so.440.82 + 0x12dce8c)
    #4 0x00007feb57966133 n/a (libnvidia-glcore.so.440.82 + 0xeca133)
    #5 0x00007feb57a2ad80 n/a (libnvidia-glcore.so.440.82 + 0xf8ed80)
    #6 0x00007feb57a3594a n/a (libnvidia-glcore.so.440.82 + 0xf9994a)
    #7 0x00007feb579dda2d n/a (libnvidia-glcore.so.440.82 + 0xf41a2d)
    #8 0x00007feb578d40ce n/a (libnvidia-glcore.so.440.82 + 0xe380ce)

nvidia-bug-report.log (1.4 MB)

@amrits would SSH access be helpful at this point when I can reproduce again, or are we beyond that point already?

same problem here:

  • 3900X

  • Gigabyte X570 Aorus Ultra (bios F12e)

  • 64GB DDR Gskill (tried at 3600 and now without XMP at 2133, same problem)

  • MSI 2080 Super

  • Archlinux 5.6.8

  • nvidia drivers 440.82

This is a new build; the problem started occurring the day after installation.
It seems to occur more often when using Inkscape.
Since I needed to work, I switched back to nouveau, and it has worked fine since.
(Before switching, I played some games under Linux without problems, and gaming under Windows is fine too.)

I am also willing to offer up ssh access when it happens again if someone might find that helpful. Would just need to know who to contact.

My computer has just entered this state. The link below is to my nvidia-bug-report.log.gz

https://transfer.sh/MSR37/nvidia-bug-report.log.gz

I can leave my computer in this state for the rest of the day. Anyone from NVIDIA, please let me know if you would like ssh access to help diagnose this issue. I will probably have to restart this computer by tomorrow.

We earlier took a remote session with one of the users, but it did not yield anything significant.
I have lately been trying to reproduce the issue locally on an MSI X570, but no luck so far.
Most of the users have issue on Gigabyte motherboard but unfortunately due to covid-19, I can not access it at this moment.
I have been syncing with members at other Nvidia premises to get hold of a Gigabyte motherboard.
Hopefully I will be able to get one in the coming week and then try to reproduce the issue locally.

Hey Amrits,

Most of the users have issue on Gigabyte motherboard but unfortunately due to covid-19, I can not access it at this moment.

In the UK, there has been steady stock of various Gigabyte x570 motherboards on Amazon and Ebuyer with next day delivery. A few £100 seems like a drop in the ocean to help diagnose this?

Where are you based? I’m happy to help source one.
