Random Xid 61 and Xorg lock-up

[Unrelated to the immediately preceding posts] Just writing to confirm that locking my 2060S frequency worked OK.

2020/07/14 13:10:18.022, GeForce RTX 2060 SUPER, 00000000:09:00.0, 440.100, P5, 2, 31, 1005 MHz, 810 MHz, 18.60 W
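
For anyone who wants to log the same kind of line: the output above looks like what `nvidia-smi --query-gpu=... --format=csv,noheader` prints. The field list below is only my guess from the values in that line (the command itself was not posted), and `pstate_of` and `log_gpu` are hypothetical helper names. A minimal sketch:

```shell
#!/bin/sh
# Assumed field list, inferred from the sample line; not confirmed by the poster.
FIELDS="timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.current,temperature.gpu,clocks.gr,clocks.mem,power.draw"

# Hypothetical helper: print the 5th comma-separated field of one CSV line,
# which is the P-state in a line shaped like the sample above.
pstate_of() {
    printf '%s\n' "$1" | awk -F', *' '{ print $5 }'
}

# Print one CSV line every 5 seconds, but only if nvidia-smi is installed.
log_gpu() {
    command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found" >&2; return 1; }
    nvidia-smi --query-gpu="$FIELDS" --format=csv,noheader -l 5
}
```

Something like `log_gpu >> gpu.log` left running in a terminal should then show which P-state the card was in shortly before a lock-up.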

Also, complete hats off to this Uli1234 guy. You are a hero for doing these tests and identifying this as a potential fix. It’s a little bizarre that, for such a ubiquitous problem, NVIDIA chimes in so infrequently. It has certainly soured my experience.

Hi,
I just recently started having this problem. For me it happens almost every day, sometimes 2 times a day.

I’m running only a terminal with an SSH session, and around 80 tabs in Firefox spread across 4 windows.

Specs, as you might guess:
Ryzen 3700X
Gigabyte X570 Aorus PRO, BIOS version F12e
Gigabyte RTX 2070 Super
2-monitor setup

I have the same issue with Ubuntu 20 (GNOME), Ryzen 3700X, ASUS X470, Cooler Master H100i and two monitors at 2560x1080.
I tried Xfce and it only got worse: much slower and laggier.
It mostly happens when sharing my screen with Google Hangouts or Zoom.
I tried newer kernel versions, older drivers, forcing frequencies, disabling Cool'n'Quiet, and disabling hardware acceleration in Firefox and Chrome; none of it worked.
I am now forcing PowerMizer to run at maximum performance, and that seems to be working for me.

What frequency range did you set when locking the GPU?
It seems you just don’t want the card to drop into the P8 state, because the Xid-61 pops up (randomly) when it exits that state. Maybe P8 was still reachable with your locked minimum frequency. Just a thought.
The minimum frequency you have to set can differ depending on the card model and the chip on it (2060, 2070, etc.).
In my case the datasheet listed 300 MHz as the lowest frequency, so I just set 1000 as my minimum. If your minimum frequency is 900 MHz, then you might have to set 1200…
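
If you don’t know the right minimum for your card, the trial-and-error idea can be sketched roughly like this. It assumes a single GPU at index 0 that is otherwise idle while the test runs; the step size and the 10-second settle time are arbitrary choices of mine, and `is_p8`, `next_min` and `find_min_lock` are hypothetical helper names. `nvidia-smi -q -d SUPPORTED_CLOCKS` lists the clocks your card accepts.

```shell
#!/bin/sh
# Sketch: raise the locked minimum clock step by step until an idle GPU
# stops reporting P8. GPU index 0 and the 10 s settle time are assumptions.

# Hypothetical helper: succeed if the given P-state string is P8.
is_p8() {
    [ "$1" = "P8" ]
}

# Hypothetical helper: print the next minimum to try, capped at the maximum.
next_min() {
    n=$(($1 + $2))
    [ "$n" -gt "$3" ] && n=$3
    echo "$n"
}

# Usage: find_min_lock MIN MAX STEP   (all in MHz)
# Prints the first minimum at which the idle GPU is no longer in P8.
find_min_lock() {
    min=$1; max=$2; step=$3
    while :; do
        sudo nvidia-smi -i 0 -lgc "$min,$max" >/dev/null || return 1
        sleep 10   # give the GPU time to settle at idle
        state=$(nvidia-smi -i 0 --query-gpu=pstate --format=csv,noheader)
        if ! is_p8 "$state"; then
            echo "$min"
            return 0
        fi
        [ "$min" -ge "$max" ] && return 1   # gave up: still P8 at the maximum
        min=$(next_min "$min" "$step" "$max")
    done
}
```

For example, `find_min_lock 300 2000 100` would start at 300 MHz and go up in 100 MHz steps. Whatever it prints is only a starting point; the real check is still whether the Xid-61 stays away.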

Good to know. I must’ve missed that in previous posts. I’m not sure what the minimum clock is for my card so I’ll have to do some digging, or I’ll just slowly increment the low setting until I bump it out of P8.

I’m actually running pretty strong right now with SMT and XMP disabled (>24 hours at this point, which is a first in a while), so I want to try and be methodical about it and run until I get the Xid-61 error. After that I’ll redo the frequency locking.

Thanks for all your input @Uli1234!

We have a reproduction of the problem internally, thanks to Uli1234, who provided affected hardware.
We have a root cause. The problem happens when the PCIe Gen switches from 3 to 1, and it is an NVIDIA bug. I’ll update this thread when we have a fix.
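
To see on your own machine when the link drops below its maximum generation, something along these lines may help. It is only a sketch: `link_downshifted` and `watch_link` are hypothetical helper names, and it reports every sample where the current link generation is lower than the maximum, which is not by itself proof of the bug.

```shell
#!/bin/sh
# Sketch: report samples where the PCIe link runs below its maximum generation.

# Hypothetical helper: succeed if the current generation is below the maximum.
link_downshifted() {
    [ "$1" -lt "$2" ]
}

# Poll once per second and print a line for every downshifted sample.
watch_link() {
    command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found" >&2; return 1; }
    nvidia-smi --query-gpu=timestamp,pstate,pcie.link.gen.current,pcie.link.gen.max \
               --format=csv,noheader -l 1 |
    while IFS=, read -r ts pstate cur max; do
        # nvidia-smi puts a space after each comma; strip it before comparing
        cur=${cur# }; max=${max# }
        if link_downshifted "$cur" "$max"; then
            echo "$ts,$pstate: link at Gen $cur (max Gen $max)"
        fi
    done
}
```

Run `watch_link` in a spare terminal and compare its timestamps with the Xid 61 entries in the kernel log.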

Please repay Uli with a gold plated version of the 3080ti.

Just wanted to add that since locking the frequencies I have gone 7 days without seeing this issue. This is longer than the system has ever run before. Thank you @Uli1234 for helping me avoid having an angry customer!

SO glad to hear this. @Uli1234 I’ll happily venmo you some $$$ for your efforts. :-D

Finally!

Awesome effort @Uli1234 - we are all in your debt!

In my case, Xid 61 seems to have been solved (no issue for 11 days) by disabling SMT in the BIOS. I hope this is not a different bug. @ahuillet, could that be related to the Gen switch bug you are tracking?

Config: ASUS X570 Prime Pro, 3800X, RTX 2070

Any way to force PCIe generation 3 permanently for now?

Are other segfaults and errors caused by this as well? I’ve been getting random crashes for a while now, and it always seems to be when the GPU ramps down, either because I’ve exited a game or because the game itself isn’t intensive enough.

I contacted NVIDIA about it and was told it was a PSU issue.

Please tell me a fix is coming for Windows as well.

I fixed this locally. I apparently had odd modprobe options. Removing them got rid of the issues. System is rock stable again.

Dudeee I need this fix soo badly, I’m getting this error like 2 times a day on my working computer! It is so freaking bad, I have to restart my computer with all my working environment running, aaaaaaahhh.

Hi Carlos,
have you tried the workaround of locking the GPU frequency?
After every reboot, open a terminal and type:

sudo nvidia-smi -pm ENABLED
sudo nvidia-smi -lgc 1000,2000

You can also put those commands in a start-up script so you don’t have to type them every time you start the computer.
In the second command, the lower value should keep your card from dropping into the P8 state (= PCIe Gen 1), and the higher value should be your graphics card’s boost frequency. Please try it out and report back.
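
A start-up script for the two commands above could look roughly like this. It is a sketch, not a tested recipe: it has to run as root at boot (how you hook it in, for example a systemd unit or rc.local, depends on your distribution), `valid_range` and `lock_clocks` are hypothetical helper names, and the 1000/2000 defaults are just the values from the commands above.

```shell
#!/bin/sh
# Sketch of a boot-time script for the clock-lock workaround. Run as root.
# MIN_MHZ / MAX_MHZ default to the values from the post; adjust for your card.
MIN_MHZ=${MIN_MHZ:-1000}
MAX_MHZ=${MAX_MHZ:-2000}

# Hypothetical helper: succeed only for two non-empty integers with min < max.
valid_range() {
    case "$1$2" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" -lt "$2" ]
}

lock_clocks() {
    valid_range "$MIN_MHZ" "$MAX_MHZ" || { echo "bad clock range" >&2; return 1; }
    command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found" >&2; return 1; }
    nvidia-smi -pm ENABLED || return 1      # enable persistence mode
    nvidia-smi -lgc "$MIN_MHZ,$MAX_MHZ"     # lock graphics clocks to min,max
}

# To undo the lock later: nvidia-smi -rgc
```

Add a final line that calls `lock_clocks` once you are happy with the values. Keeping the range in variables means you can try a different minimum with `MIN_MHZ=1200 ./script.sh` (script name is up to you) without editing the file.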

Workaround:
Set in BIOS:
Suspend to RAM -> DISABLED
Global C states Control -> DISABLED
ACPI_CST C1 Declaration -> DISABLED
PCIE Reset Control -> DISABLED

Then set nvidia-smi -pm 1 and nvidia-smi -lgc 1600,1605 (for the 2070S).

Referring to my post from January 2020, do you guys think that I could get my 1660ti to work with such an old computer?

These issues seem to relate to PCIe Gen 1, and I think my motherboard is PCIe Gen 1. I can’t get my computer to boot with the card unless I have the open-source Nouveau drivers installed on my Arch Linux-based machine. Can these clock and power settings be applied in my GRUB config as kernel parameters somehow?

Thank you, I’ll test this solution today:

After typing these two commands I’m not getting into “level 0”, the lowest level I get now is “level 1”.

—EDIT

After 10 hours of work on the PC today I’ve got no errors! I’ll keep this post updated during each day of the week until Friday. Thank you!
