Any way to force PCIe generation 3 permanently for now?
Are other segfaults and errors caused by this as well? I’ve been getting random crashes for a while now, and they always seem to happen when the GPU ramps down, either because I’ve exited a game or because the game itself isn’t intensive enough.
Contacted Nvidia about it and was told it was a PSU issue.
Please tell me a fix is coming for Windows as well.
I fixed this locally. I apparently had odd modprobe options. Removing them got rid of the issues. System is rock stable again.
Dudeee I need this fix soo badly, I’m getting this error like 2 times a day on my working computer! It is so freaking bad, I have to restart my computer with all my working environment running, aaaaaaahhh.
Have you tried the workaround to lock the GPU frequency?
Every time after reboot open a terminal and type:
sudo nvidia-smi -pm ENABLED
sudo nvidia-smi -lgc 1000,2000
You can also put those commands in a start-up script to avoid typing them in every time you start up the computer.
In the second command, the lower value should keep your card from dropping to the P8 state (= PCIe Gen1); the higher value should be your graphics card’s boost frequency. Please try it out and report back.
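As a rough sketch of such a start-up script: the function names lock_gpu_clocks and is_uint and the DRY_RUN switch are made up for illustration, and the 1000/2000 MHz values are just the ones from this post, so adjust them for your card.

```shell
#!/bin/sh
# Sketch of a start-up helper for the clock-lock workaround (run as root).
# Function names and DRY_RUN are illustrative, not part of nvidia-smi.

# Succeeds only if the argument is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# lock_gpu_clocks MIN_MHZ MAX_MHZ
# With DRY_RUN set to a non-empty value, the commands are printed, not run.
lock_gpu_clocks() {
    min=${1:-}
    max=${2:-}
    if ! is_uint "$min" || ! is_uint "$max" || [ "$min" -gt "$max" ]; then
        echo "usage: lock_gpu_clocks MIN_MHZ MAX_MHZ (MIN <= MAX)" >&2
        return 1
    fi
    # Enable persistence mode, then lock the graphics clock range.
    ${DRY_RUN:+echo} nvidia-smi -pm ENABLED &&
        ${DRY_RUN:+echo} nvidia-smi -lgc "$min,$max"
}
```

Calling lock_gpu_clocks 1000 2000 from whatever runs as root at boot on your system should then have the same effect as typing the two commands by hand. Setting DRY_RUN=1 first only prints what would be run.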
Set in BIOS:
Suspend to RAM -> DISABLED
Global C states Control -> DISABLED
ACPI_CST C1 Declaration -> DISABLED
PCIE Reset Control -> DISABLED
Then set nvidia-smi -pm 1 and nvidia-smi -lgc 1600,1605 (for a 2070S).
Referring to my post from January 2020, do you guys think that I could get my 1660ti to work with such an old computer?
These issues seem to relate to PCIe Gen 1, and I think my motherboard is PCIe Gen 1. I can’t get my computer to boot with the card unless I have the open-source Nouveau drivers installed on my Arch Linux-based machine. Can these clock and power settings be applied in my GRUB config as kernel parameters somehow?
Thank you, I’ll test this solution today:
After typing these two commands I’m not getting into “level 0”, the lowest level I get now is “level 1”.
After 10 hours of work on the PC today I’ve got no errors! I’ll keep this post updated during each day of the week until Friday. Thank you!
Doesn’t work on all GPUs
My system crashes with Xid 61 after a few days. If I set
nvidia-smi -lgc 1000,2145, I also get an additional Xid 38 after Xid 61. X stays frozen when these two happen at the same time, and I am unable to use any window. If only Xid 61 happens, I can still use some windows and close programs using the GPU, but there is no way to reset the GPU since nvidia-smi says it is in use (by X); if I close X, the monitor goes off without the GPU returning, and I am forced to reboot. I have seen the errors at intervals of 4, 5, 7, 8, and up to 10 days. Leaving a terminal open with the following command:
nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.current,temperature.gpu,clocks.gr,clocks.mem,power.draw --format=csv -l 60
shows me that the card never goes to P8; the last states are P5 with Gen 2. I will now increase the lgc to 1300,2145 and decrease the nvidia-smi query command’s interval to 10 seconds to see how it reacts.
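If you log that query to a file, a small filter can pick out the samples where the card did drop to P8 or PCIe Gen 1. This is only a sketch: flag_low_power is a made-up name, and it assumes the field order of the query above (pstate in column 5, link generation in column 6) and a GPU name without commas.

```shell
#!/bin/sh
# flag_low_power: read the CSV produced by the query above on stdin and
# print every sample where the card was in P8 or at PCIe Gen 1.
# Assumes pstate is field 5 and pcie.link.gen.current is field 6.
flag_low_power() {
    awk -F', *' '$5 == "P8" || $6 == "1" {
        print $1 ": " $5 ", PCIe gen " $6
    }'
}
```

For example, append "> gpu.csv" to the query command, then run "flag_low_power < gpu.csv" after a crash (gpu.csv is just an example file name). No output means no logged sample was at P8 or Gen 1, although a short dip between two samples could still be missed.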
Here is my system information:
Motherboard: TUF GAMING X570-PLUS
CPU: AMD Ryzen 7 3700X
GPU: NVIDIA GeForce RTX 2060 SUPER
Kernel/OS: Linux 5.6.16/Gentoo
I’m another affected user.
OS: Kubuntu 20.04 x86_64 5.4.0-40-generic
GFX: NVIDIA GeForce 2070 Super (MSI Gaming X Trio)
Driver: 440.100 CUDA Version: 10.2
Motherboard: ASRock Taichi TRX40
Processor: AMD 3960X
Monitor: Acer CB280HK (Displayport 4K 60hz)
Symptoms are the usual:
- happens spontaneously, sometimes days/weeks without problems
- after bug hits I can still ssh in without problem
- usually one process is pegged at 100%
- nvidia-smi and any other GPU related process hangs if started
- only cold reboot fixes the problem
- Xid 61, sometimes followed by an Xid 8
I can provide dmesg, syslog and kernel logs for at least 11 occasions. As for the frequency of the bug, this month alone it hit on July 7, 8, 9, 15, 20, 21, 22 and twice on the 23rd.
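To summarise logs like these, a sketch along the following lines counts how often each Xid code appears. count_xids is a made-up name, and the pattern assumes the usual "NVRM: Xid (PCI:...): NN, ..." kernel message layout.

```shell
#!/bin/sh
# count_xids: read kernel log text on stdin and print one line per Xid code
# with the number of times it occurred, sorted by code.
# Assumes messages of the form "NVRM: Xid (PCI:0000:09:00): 61, ...".
count_xids() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\).*/\1/p' |
        sort -n |
        uniq -c |
        awk '{ print "Xid " $2 ": " $1 }'
}
```

Something like "dmesg | count_xids" or "journalctl -k | count_xids" should work on most setups, though reading the kernel log may need root.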
Since today I’m on the “nvidia-settings -a [gpu:0]/GPUPowerMizerMode=1” regime and hoping for the best, though I’m not thrilled about power consumption in this mode.
Gigantic thanks to @Uli1234, you are the man!
I have a Ryzen 9 3900X on an Asus Pro WS X570-ACE. I have flashed the BIOS to the latest version (2103, released on 29/06/2020 by Asus), and even without changing the base clock it may have solved the problem (4th day w/o crash). I’m on Ubuntu 20.04 with everything very up to date, and NVIDIA-440.100 for my GTX 1650 Super (from Gigabyte).
Nvidia-settings reports “current pcie link speed”, jumping from 2.5GT/s to 8.0GT/s.
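For reference, those link speeds correspond to PCIe generations: 2.5 GT/s is Gen 1, 5.0 GT/s is Gen 2, 8.0 GT/s is Gen 3 and 16.0 GT/s is Gen 4. A tiny helper (link_gen is a made-up name) makes the mapping explicit:

```shell
#!/bin/sh
# link_gen SPEED: print the PCIe generation for a per-lane link speed
# such as "2.5GT/s" or "8.0 GT/s". Returns non-zero for unknown speeds.
link_gen() {
    speed=${1%%[ G]*}   # drop an optional " GT/s" / "GT/s" suffix
    case "$speed" in
        2.5)     echo 1 ;;
        5|5.0)   echo 2 ;;
        8|8.0)   echo 3 ;;
        16|16.0) echo 4 ;;
        *)       echo unknown; return 1 ;;
    esac
}
```

So a jump from 2.5GT/s to 8.0GT/s is the link switching between Gen 1 and Gen 3.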
PS: for me, if I didn’t play video (YouTube, VLC, etc.) it didn’t crash (or at least not as quickly). It almost always crashed while playing a video.
I can’t update my previous post, so I will post the update here:
I worked 10 hours with my computer using the solution proposed by @Uli1234 during Wednesday, Thursday and Friday and I got no errors. I think this temporary solution works for my current setup.
My previous message with my setup: https://forums.developer.nvidia.com/t/random-xid-61-and-xorg-lock-up/79731/150
If you’re still experiencing hangs even after locking your GPU’s clock frequencies, keep an eye out for audio (and maybe other) drivers, modules, etc. that may also mess with the power state of the GPU.
In my case, I was getting Xid 61 after other seemingly unrelated reports from “snd_hda_intel” which attempts to auto discover and configure audio sources (https://docs.slackware.com/howtos:hardware:audio_and_snd-hda-intel). But they would always occur together, so after about the 3rd reboot, I started getting suspicious.
One of the other things snd_hda_intel apparently does is attempt to put audio devices to sleep to save power, which, given that these errors and hangs we’re experiencing are related to switching power modes, seemed like a likely culprit.
I’ve since added the file /etc/modprobe.d/audio_disable_powersave.conf that just has the body “options snd_hda_intel power_save_controller=N”. You can also run
echo N > /sys/module/snd_hda_intel/parameters/power_save_controller
as root; however, that will likely reset after a reboot.
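Both steps can be wrapped in a small script. This is a sketch only: write_audio_conf and disable_now are made-up names, and the optional path arguments exist just so it can be tried against a scratch directory first.

```shell
#!/bin/sh
# write_audio_conf [DIR]
# Writes the modprobe option from this post. DIR defaults to /etc/modprobe.d
# (needs root); pass another directory for a trial run.
write_audio_conf() {
    dir=${1:-/etc/modprobe.d}
    printf '%s\n' 'options snd_hda_intel power_save_controller=N' \
        > "$dir/audio_disable_powersave.conf"
}

# disable_now [PARAM_FILE]
# Applies the same setting to the running module; this is lost on reboot.
disable_now() {
    param=${1:-/sys/module/snd_hda_intel/parameters/power_save_controller}
    if [ ! -w "$param" ]; then
        echo "cannot write $param (not root, or module not loaded?)" >&2
        return 1
    fi
    echo N > "$param"
}
```

Running write_audio_conf and then disable_now as root should cover both the next boot and the current session. The file in /etc/modprobe.d only takes effect the next time the module is loaded.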
Going on 3 days now (FINALLY) without having to reboot the machine due to Xid 61 stuff.
Thanks for that useful information. Since audio is often integrated within the graphics card it makes sense to have a look at that module too.
Hi, I am not sure whether the PowerMizer setting really prevents low-power modes (= switching to PCIe Gen1) or whether it is just a preference. I would put more trust in locking the GPU frequency. If the issue occurs again, I would try that as the next step.
Did the min freq of 1300 MHz work for you?
I still haven’t gotten the error, and my system has been up for 4 days. Will report back if anything happens.
You can give your card a frequency range that it can work within. It just shouldn’t switch down to PCIe Gen1 or the P8 state.
Locking it at 1600 is, in my opinion, an option but not the best one (regarding power consumption and heat). I would try 1000-1800 or 1300-2000. I think the exact range depends on the model you have. You can play around a bit with the settings.