390.87 driver produces excessive IRQs with GeForce GT 730

cfgauss · February 17, 2019, 8:35pm

I have a quad-core x86_64 Gentoo system with a GeForce GT 730 card, PCI ID 0f02, which must use the legacy 390.xx driver. The latest version from portage, 390.87, 1/16/19, appears to cause excessive IRQs. The symptom is that every three minutes, htop reports that CPU0 is at 100% for about ten seconds and the GUI freezes for that time period. htop also reports that the kernel thread ksoftirqd/0 is responsible for this. Right now TIME+ in htop reports 4:42.69 for ksoftirqd/0, 0:00.87 for ksoftirqd/2, 0:28.03 for ksoftirqd/3, and 0:00.74 for ksoftirqd/1. (I’m not using the threadirqs boot option.)

Is this likely to be hardware (video card) failure or a bug in the 390.87 driver from portage?

generix · February 17, 2019, 9:03pm

Use
cat /proc/interrupts
to see where the irqs are coming from.
Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/
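If the raw table is hard to read, a small script can rank the sources for one CPU. This is only a rough sketch: it assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ with one counter per CPU), and the sample numbers and device names below are made up for illustration, not taken from the attached report. The function names are my own.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into (irq, per_cpu_counts, description) rows."""
    lines = text.strip("\n").splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    rows = []
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:n_cpus]:
            if not field.isdigit():
                break  # reached the controller/device description
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        rows.append((label.strip(), counts, description))
    return rows


def busiest_on_cpu(rows, cpu=0, top=3):
    """Return the rows with the highest counter on the given CPU."""
    ranked = [row for row in rows if len(row[1]) > cpu]
    ranked.sort(key=lambda row: row[1][cpu], reverse=True)
    return ranked[:top]


# Hypothetical sample; on a real system read open("/proc/interrupts").read() instead.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:         43          0          0          0   IO-APIC   2-edge      timer
 17:    9812345          0          0          0   IO-APIC  17-fasteoi   eth0
 18:    7654321          0          0          0   IO-APIC  18-fasteoi   eth1
 27:          0     123456          0          0   PCI-MSI 524288-edge      nvidia
ERR:          0
"""

if __name__ == "__main__":
    for irq, counts, description in busiest_on_cpu(parse_interrupts(SAMPLE)):
        print(irq, counts[0], description)
```

Running it twice a few seconds apart and comparing the numbers shows which counter is actually growing, which matters more than the absolute totals.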

cfgauss · February 17, 2019, 10:21pm

Here’s my cat /proc/interrupts. Attached is my nvidia-bug-report.log.gz.
nvidia-bug-report.log.gz (114 KB)

generix · February 17, 2019, 10:40pm

The gpu is continuously running into errors:

[ 16051.968] (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
[ 16051.968] (EE) NVIDIA(0):     recover...
[ 16051.994] (II) NVIDIA(0): Error recovery was successful.

Kernel:

[13997.629960] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000001
[14005.823040] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000001
[14014.016125] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000001
[14022.209279] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000001

Since these errors only started about 4h in, this looks like a thermal defect, i.e. broken hardware.
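For anyone who wants to check how regular these errors are, here is a minimal sketch that pulls the timestamps out of the NVRM lines quoted above and prints the gap between consecutive events. The regular expression is my own guess at the line format based on these four lines only, and the function names are made up.

```python
import re

# Matches lines like: [13997.629960] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000001
XID_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")


def parse_xid_events(log_text):
    """Return a list of (seconds_since_boot, pci_address, xid_code) tuples."""
    return [
        (float(m.group(1)), m.group(2), int(m.group(3)))
        for m in XID_RE.finditer(log_text)
    ]


def intervals(events):
    """Seconds between consecutive events, rounded to milliseconds."""
    times = [t for t, _, _ in events]
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


# The four kernel log lines quoted in this post.
LOG = """\
[13997.629960] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000001
[14005.823040] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000001
[14014.016125] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000001
[14022.209279] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000001
"""

if __name__ == "__main__":
    events = parse_xid_events(LOG)
    print("first event after %.1f h of uptime" % (events[0][0] / 3600))
    print("gaps in seconds:", intervals(events))
```

On the quoted lines the gaps all come out at roughly 8.19 seconds, so the error repeats on a fixed period rather than at random.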

cfgauss · February 17, 2019, 10:43pm

Many thanks. Off to buy a new card.

cfgauss · February 20, 2019, 11:21pm

I bought a new card which has no errors in Xorg.0.log nor any trace of NVRM in /var/log/messages. Clearly I needed a new card but I still get GUI freezes as one of the four ksoftirqd pushes a core to 100%.

Any idea what could be causing this now?

hussam · February 21, 2019, 10:24am

Some ideas crossed my mind. Are you using suspend or hibernate? If so, do you keep the power plug connected when you hibernate? How old is your CMOS battery?

Regardless of the answer to the first question, IRQ problems can also occur if the card is not properly seated in the PCI slot or the system lost connection to the card at some point. How much power can your PSU supply and how much does your graphics card require?

generix · February 21, 2019, 1:48pm

Given that ksoftirqd/0 serves cpu#0 while the nvidia gpu is using msi on cpu#1, I don’t see how the nvidia driver could be responsible for this. Looking at what sits on cpu#0, there are two of your nics (even an old tulip as a bridge interface for virtualbox?) using apic, which I don’t think is very efficient. Maybe take a look at those nics first.
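One way to test the nic theory is to look at /proc/softirqs, which counts softirqs per type and per CPU, and see which type grows on cpu#0 during a freeze. A rough sketch, assuming the usual layout (a header row of CPU names, then one row per softirq type). The two snapshots below are invented numbers purely for illustration, and the function names are hypothetical.

```python
def parse_softirqs(text):
    """Parse /proc/softirqs text into {softirq_name: [count_cpu0, count_cpu1, ...]}."""
    lines = text.strip("\n").splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        table[name.strip()] = [int(value) for value in rest.split()[:n_cpus]]
    return table


def softirq_delta(before, after):
    """Per-CPU increase of each softirq type between two snapshots."""
    return {
        name: [new - old for old, new in zip(before[name], after[name])]
        for name in after
        if name in before
    }


def hottest(delta, cpu=0):
    """Name of the softirq type that grew the most on the given CPU."""
    return max(delta, key=lambda name: delta[name][cpu])


# Hypothetical snapshots taken a few seconds apart; on a real system
# read open("/proc/softirqs").read() twice with a sleep in between.
BEFORE = """\
                    CPU0       CPU1       CPU2       CPU3
       TIMER:     500000     480000     470000     490000
      NET_TX:        100         10          5          7
      NET_RX:    1000000       2000       1500       1800
       SCHED:     300000     290000     280000     295000
"""
AFTER = """\
                    CPU0       CPU1       CPU2       CPU3
       TIMER:     501000     481000     471000     491000
      NET_TX:        120         10          5          7
      NET_RX:    1900000       2100       1550       1850
       SCHED:     300800     290700     280600     295500
"""

if __name__ == "__main__":
    delta = softirq_delta(parse_softirqs(BEFORE), parse_softirqs(AFTER))
    print("fastest growing on cpu0:", hottest(delta, cpu=0), delta[hottest(delta, cpu=0)])
```

If NET_RX or NET_TX dominates the increase on cpu#0, that points at the network side rather than the gpu.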

cfgauss · February 24, 2019, 1:11am

On the motherboard I had two NICs which died so I’m using one on a PCI card. The two dead NICs appeared in /proc/interrupts so I removed the driver as a module in the kernel and now they don’t. For reasons I don’t understand, this seems to have fixed the GUI freezes which were due to the four kernel threads ksoftirqd. I agree that the nvidia driver (with new video card) was not responsible for these freezes.

I’m grateful to generix and HussamT for their advice. Off to install nvidia-drivers-390.116!

generix · February 24, 2019, 2:10pm

Maybe check if you can completely disable the onboard nics in bios. Removing the driver obviously stopped the interrupt storm but you never know what else they’re doing.

cfgauss · February 24, 2019, 9:18pm

Many thanks. I never would have thought of this. Found them both and disabled them in the BIOS.
