How to stop "irq/110-nvidia" chewing system CPU on Ubuntu?

I have “NVIDIA-Linux-x86_64-525.85.05.run” installed on Ubuntu 22.04 LTS. Everything had been running fine for about a year, but today the NVIDIA driver somehow stopped being used after a reboot. So I went through the re-install process for the driver, and everything works, but…

top shows irq/110-nvidia is constantly chewing system CPU:

$ top -d 1 -b | egrep "irq.*nvidia"
   1282 root     -51   0       0      0      0 S  18.8   0.0  42:01.41 irq/110-nvidia
   1282 root     -51   0       0      0      0 R  17.8   0.0  42:01.59 irq/110-nvidia
   1282 root     -51   0       0      0      0 S   4.0   0.0  42:01.63 irq/110-nvidia
   1282 root     -51   0       0      0      0 S   1.0   0.0  42:01.64 irq/110-nvidia
   1282 root     -51   0       0      0      0 R  15.8   0.0  42:01.80 irq/110-nvidia
   1282 root     -51   0       0      0      0 S  16.8   0.0  42:01.97 irq/110-nvidia
   1282 root     -51   0       0      0      0 S  16.8   0.0  42:02.14 irq/110-nvidia
   1282 root     -51   0       0      0      0 S  15.8   0.0  42:02.30 irq/110-nvidia
^C

How do I diagnose and fix this?

In the meantime I upgraded to the latest driver “NVIDIA-Linux-x86_64-535.129.03.run” – from the end of last month – and top shows irq/110-nvidia is still constantly chewing system CPU, just a little less:

$ top -d 1 -b | egrep "irq.*nvidia"
   1274 root     -51   0       0      0      0 S   0.0   0.0   0:20.61 irq/110-nvidia
   1274 root     -51   0       0      0      0 S   8.8   0.0   0:20.70 irq/110-nvidia
   1274 root     -51   0       0      0      0 R  10.8   0.0   0:20.81 irq/110-nvidia
   1274 root     -51   0       0      0      0 R  10.9   0.0   0:20.92 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.8   0.0   0:21.03 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.8   0.0   0:21.14 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.9   0.0   0:21.25 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.8   0.0   0:21.36 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.9   0.0   0:21.47 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.8   0.0   0:21.58 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  11.8   0.0   0:21.70 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.9   0.0   0:21.81 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.8   0.0   0:21.92 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.8   0.0   0:22.03 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.9   0.0   0:22.14 irq/110-nvidia
   1274 root     -51   0       0      0      0 R  10.8   0.0   0:22.25 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  11.8   0.0   0:22.37 irq/110-nvidia
   1274 root     -51   0       0      0      0 S   3.0   0.0   0:22.40 irq/110-nvidia
   1274 root     -51   0       0      0      0 S   1.0   0.0   0:22.41 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.8   0.0   0:22.52 irq/110-nvidia
   1274 root     -51   0       0      0      0 R   9.9   0.0   0:22.62 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.8   0.0   0:22.73 irq/110-nvidia
   1274 root     -51   0       0      0      0 S   9.8   0.0   0:22.83 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.9   0.0   0:22.94 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.8   0.0   0:23.05 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.9   0.0   0:23.16 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.8   0.0   0:23.27 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  10.8   0.0   0:23.38 irq/110-nvidia
   1274 root     -51   0       0      0      0 R   9.9   0.0   0:23.48 irq/110-nvidia
   1274 root     -51   0       0      0      0 R  10.8   0.0   0:23.59 irq/110-nvidia
^C

Hmmm… after it has been running a bit longer, it just climbs up to using more system CPU:

$ top -d 1 -b | egrep "irq.*nvidia"
   1274 root     -51   0       0      0      0 S  17.6   0.0   1:20.76 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  16.7   0.0   1:20.93 irq/110-nvidia
   1274 root     -51   0       0      0      0 R  16.7   0.0   1:21.10 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  16.7   0.0   1:21.27 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  16.7   0.0   1:21.44 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  16.7   0.0   1:21.61 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  16.7   0.0   1:21.78 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  16.7   0.0   1:21.95 irq/110-nvidia
   1274 root     -51   0       0      0      0 R  16.7   0.0   1:22.12 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  16.5   0.0   1:22.29 irq/110-nvidia
   1274 root     -51   0       0      0      0 S  16.7   0.0   1:22.46 irq/110-nvidia
^C

That seems like a lot of interrupts being generated, just under 200 per second. I’m guessing that is what is causing the system / kernel CPU usage. Is it normal to have so many? Any workarounds?

$ cat /proc/interrupts | egrep -i nvidia
 110:          0          0          0          0     830977          0          0          0          0          0          0          0          0          0          0          0        543          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-0000:07:00.0    0-edge      nvidia
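
A quick way to estimate that rate is to sample the nvidia row twice, one second apart, and diff the sum of its per-CPU counters; a rough shell sketch, nothing driver-specific:

$ # rough interrupts/second for the nvidia IRQ line
$ s() { grep nvidia /proc/interrupts | awk '{ for (i = 2; i <= NF; i++) if ($i ~ /^[0-9]+$/) n += $i } END { print n }'; }
$ a=$(s); sleep 1; b=$(s); echo "$((b - a)) interrupts/s"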

Another thing I tried:

I found this [1] talking about interrupts, and also found this [2], which mentions a file I have: /etc/modprobe.d/nvidia-installer-disable-nouveau.conf. So I edited that file and rebooted, trying both settings, "options nvidia NVreg_EnableMSI=0" and "options nvidia NVreg_EnableMSI=1".

And I also tried putting it in /etc/modprobe.d/nvidia.conf too.

But neither setting appears to have any influence on the irq/110-nvidia CPU usage in top :-( And how do I even know that file is being read and used?
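
Two checks that should at least show whether the option is being picked up, assuming the module gets loaded through the normal modprobe path:

$ # does modprobe's merged configuration contain the option at all?
$ modprobe -c | grep -i NVreg_EnableMSI
$ # the module may also be loaded from the initramfs, so pull the edited .conf in there too
$ sudo update-initramfs -u

After a reboot, the effective value should then be visible in /proc/driver/nvidia/params (checked further down).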

I tried this command [3], which shows the parameters given to the nvidia* kernel modules, but there is no sign of NVreg_EnableMSI, so how do I know that the parameter from any .conf file was actually applied?

$ sudo grep -H '' /sys/module/nvidia*/parameters/*
/sys/module/nvidia_drm/parameters/modeset:N
/sys/module/nvidia_modeset/parameters/config_file:(null)
/sys/module/nvidia_modeset/parameters/disable_vrr_memclk_switch:N
/sys/module/nvidia_modeset/parameters/fail_malloc:-1
/sys/module/nvidia_modeset/parameters/malloc_verbose:N
/sys/module/nvidia_modeset/parameters/output_rounding_fix:Y
/sys/module/nvidia_uvm/parameters/uvm_ats_mode:1
/sys/module/nvidia_uvm/parameters/uvm_channel_gpfifo_loc:auto
/sys/module/nvidia_uvm/parameters/uvm_channel_gpput_loc:auto
/sys/module/nvidia_uvm/parameters/uvm_channel_num_gpfifo_entries:1024
/sys/module/nvidia_uvm/parameters/uvm_channel_pushbuffer_loc:auto
/sys/module/nvidia_uvm/parameters/uvm_cpu_chunk_allocation_sizes:2166784
/sys/module/nvidia_uvm/parameters/uvm_debug_enable_push_acquire_info:0
/sys/module/nvidia_uvm/parameters/uvm_debug_enable_push_desc:0
/sys/module/nvidia_uvm/parameters/uvm_debug_prints:0
/sys/module/nvidia_uvm/parameters/uvm_disable_hmm:N
/sys/module/nvidia_uvm/parameters/uvm_downgrade_force_membar_sys:1
/sys/module/nvidia_uvm/parameters/uvm_enable_builtin_tests:0
/sys/module/nvidia_uvm/parameters/uvm_enable_debug_procfs:0
/sys/module/nvidia_uvm/parameters/uvm_enable_va_space_mm:1
/sys/module/nvidia_uvm/parameters/uvm_exp_gpu_cache_peermem:0
/sys/module/nvidia_uvm/parameters/uvm_exp_gpu_cache_sysmem:0
/sys/module/nvidia_uvm/parameters/uvm_fault_force_sysmem:0
/sys/module/nvidia_uvm/parameters/uvm_force_prefetch_fault_support:0
/sys/module/nvidia_uvm/parameters/uvm_global_oversubscription:1
/sys/module/nvidia_uvm/parameters/uvm_leak_checker:0
/sys/module/nvidia_uvm/parameters/uvm_page_table_location:(null)
/sys/module/nvidia_uvm/parameters/uvm_peer_copy:phys
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_batch_count:256
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_mimc_migration_enable:-1
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_momc_migration_enable:-1
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_threshold:256
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_batch_count:256
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_coalesce:1
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_max_batches_per_service:20
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_max_throttle_per_service:5
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_replay_policy:2
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_replay_update_put_ratio:50
/sys/module/nvidia_uvm/parameters/uvm_perf_map_remote_on_eviction:1
/sys/module/nvidia_uvm/parameters/uvm_perf_map_remote_on_native_atomics_fault:0
/sys/module/nvidia_uvm/parameters/uvm_perf_migrate_cpu_preunmap_block_order:2
/sys/module/nvidia_uvm/parameters/uvm_perf_migrate_cpu_preunmap_enable:1
/sys/module/nvidia_uvm/parameters/uvm_perf_pma_batch_nonpinned_order:6
/sys/module/nvidia_uvm/parameters/uvm_perf_prefetch_enable:1
/sys/module/nvidia_uvm/parameters/uvm_perf_prefetch_min_faults:1
/sys/module/nvidia_uvm/parameters/uvm_perf_prefetch_threshold:51
/sys/module/nvidia_uvm/parameters/uvm_perf_reenable_prefetch_faults_lapse_msec:1000
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_enable:1
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_epoch:2000
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_lapse_usec:500
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_max_resets:4
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_nap:1
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_pin:300
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_pin_threshold:10
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_threshold:3
/sys/module/nvidia_uvm/parameters/uvm_release_asserts:1
/sys/module/nvidia_uvm/parameters/uvm_release_asserts_dump_stack:0
/sys/module/nvidia_uvm/parameters/uvm_release_asserts_set_global_error:0

According to this [4] I can figure out which .conf files end up in the initramfs:

$ sudo lsinitramfs /boot/initrd.img | grep etc/modprobe.d
etc/modprobe.d
etc/modprobe.d/alsa-base.conf
etc/modprobe.d/amd64-microcode-blacklist.conf
etc/modprobe.d/blacklist-ath_pci.conf
etc/modprobe.d/blacklist-firewire.conf
etc/modprobe.d/blacklist-framebuffer.conf
etc/modprobe.d/blacklist-modem.conf
etc/modprobe.d/blacklist-nouveau.conf
etc/modprobe.d/blacklist-oss.conf
etc/modprobe.d/blacklist-rare-network.conf
etc/modprobe.d/blacklist.conf
etc/modprobe.d/dkms.conf
etc/modprobe.d/intel-microcode-blacklist.conf
etc/modprobe.d/iwlwifi.conf
etc/modprobe.d/nvidia-installer-disable-nouveau.conf

And [4] also shows how to read back the effective parameters, which report EnableMSI: 0, so maybe the default is already zero? In any case, it does not seem to affect the interrupts and CPU :-(

$ cat /proc/driver/nvidia/params
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 0
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 0
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableResizableBar: 0
EnableGpuFirmware: 18
EnableGpuFirmwareLogs: 2
EnableDbgBreakpoint: 0
OpenRmEnableUnsupportedGpus: 0
DmaRemapPeerMmio: 1
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: ""
ExcludedGpus: ""
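
One cross-check is /proc/interrupts itself: as far as I can tell, an MSI interrupt is listed with an "IR-PCI-MSI-…" type (as in my earlier output), while a legacy line interrupt would show up as "IR-IO-APIC … fasteoi" instead. So this should reveal whether the MSI setting is really in effect:

$ grep nvidia /proc/interrupts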

[1] NVIDIA/nvidia-drivers - Gentoo wiki
[2] Ubuntu 14.04 hangs after installing Cuda - #3 by Abhijit-Amagi
[3] kernel - How do I list loaded Linux module parameter values? - Server Fault
[4] https://developer.nvidia.com/nvidia-development-tools-solutions-err_nvgpuctrperm-permission-issue-performance-counters

Another thing I tried:

I tried moving the IRQ’s CPU affinity from its default CPU 4 to CPU 30, using this tutorial [1].

First, find out the current IRQ NVIDIA is using; 106 in this case:

$ cat /proc/interrupts | grep nvidia
 106:          0          0          0          0     349812          0          0          0          0          0          0          0          0          0          0          0        542          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-IO-APIC   30-fasteoi   nvidia

Then double-check that IRQ 106 currently has its CPU affinity set to CPU 4:

$ cat /proc/irq/106/smp_affinity_list
4

Then set the CPU affinity as desired:

$ sudo sh -c "echo 30 > /proc/irq/106/smp_affinity_list"
$
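
To confirm the move actually took effect (and to eyeball the interrupt rate), watching /proc/interrupts highlights whichever CPU column is incrementing; this is just plain watch(1), nothing driver-specific:

$ watch -d -n1 'grep nvidia /proc/interrupts'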

The IRQ thread moves to the new CPU now! And mysteriously, on the new CPU it’s using about half as much CPU?! Why?

$ top -d 1 -b | egrep "irq.*nvidia"
   1266 root     -51   0       0      0      0 S   6.2   0.0   8:05.89 irq/106-nvidia
   1266 root     -51   0       0      0      0 S   5.0   0.0   8:05.94 irq/106-nvidia
   1266 root     -51   0       0      0      0 S   5.0   0.0   8:05.99 irq/106-nvidia
   1266 root     -51   0       0      0      0 S   5.0   0.0   8:06.04 irq/106-nvidia
   1266 root     -51   0       0      0      0 S   5.0   0.0   8:06.09 irq/106-nvidia
   1266 root     -51   0       0      0      0 R   5.9   0.0   8:06.15 irq/106-nvidia
   1266 root     -51   0       0      0      0 S   5.0   0.0   8:06.20 irq/106-nvidia
   1266 root     -51   0       0      0      0 S   5.0   0.0   8:06.25 irq/106-nvidia
   1266 root     -51   0       0      0      0 S   5.0   0.0   8:06.30 irq/106-nvidia
   1266 root     -51   0       0      0      0 S   5.0   0.0   8:06.35 irq/106-nvidia
   1266 root     -51   0       0      0      0 S   5.0   0.0   8:06.40 irq/106-nvidia
^C
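
One caveat to keep in mind: if the irqbalance service is running it may redistribute IRQ affinities again later, so echoing into smp_affinity_list is not necessarily permanent. Whether it is active is easy to check:

$ systemctl status irqbalance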

[1] Setting IRQ CPU affinities: Improving IRQ performance on the ODROID-XU4 | ODROID Magazine

I guess you moved the irq from a physical core to a virtual one.
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

> I guess you moved the irq from a physical core to a virtual one.

I think the other CPU is running at about twice the frequency / MHz, so maybe that explains it?
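
For what it’s worth, I can compare the two cores’ current clocks and check whether CPU 4 and CPU 30 are SMT siblings straight from sysfs (assuming the usual cpufreq/topology paths exist):

$ cat /sys/devices/system/cpu/cpu{4,30}/cpufreq/scaling_cur_freq
$ cat /sys/devices/system/cpu/cpu{4,30}/topology/thread_siblings_list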

> Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

nvidia-bug-report.log.gz (661.3 KB)

P.S. The upload feature completely failed for me on Firefox :-( Had to switch to Chrome to upload!

There’s something really wrong: the GPU is only running xorg and gnome-shell and its status is ‘idle’, yet it’s running at full throttle with 96% GPU load. That doesn’t make sense. Do you have any kind of GPU monitoring tool running that’s calling nvidia-smi or the like in a fast loop?

No, nothing like that. The only process I have running is htop, which is how I noticed irq/106-nvidia gobbling CPU. What I actually want is for the system to idle without using much CPU :-)

I do have the main laptop display and two external monitors. Could that have anything to do with it?

How to debug this further?

Hmmm… if I look at it now, it’s in the 0% to 20% range…

$ while true;  do nvidia-smi --query-gpu=utilization.gpu --format=csv ; sleep 1;  done
utilization.gpu [%]
15 %
utilization.gpu [%]
0 %
utilization.gpu [%]
11 %
utilization.gpu [%]
20 %
utilization.gpu [%]
16 %
utilization.gpu [%]
14 %
utilization.gpu [%]
6 %
utilization.gpu [%]
0 %
utilization.gpu [%]
2 %
utilization.gpu [%]
3 %
utilization.gpu [%]
0 %
utilization.gpu [%]
12 %
utilization.gpu [%]
10 %
utilization.gpu [%]
12 %
utilization.gpu [%]
10 %
utilization.gpu [%]
11 %
utilization.gpu [%]
10 %
utilization.gpu [%]
9 %
^C
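
Side note: nvidia-smi can apparently do the looping itself and skip the repeated header, which should make this easier to read:

$ nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader -l 1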

Then the output in the logs might have just been a very odd coincidence.
Regarding the monitors, that could very well be the cause; a slightly bad connection might trigger this. Please monitor CPU usage, then disconnect one monitor after another and check if anything changes.

More bizarre results: I went away over the weekend and left the laptop idling. Upon my return, the irq/106-nvidia process had decided “on its own” to use less CPU?! But it still jumps annoyingly higher from time to time:

$ top -d 1 -b | egrep "irq.*nvidia"
   1259 root     -51   0       0      0      0 S   0.0   0.0  19:36.55 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:36.57 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:36.59 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   6.8   0.0  19:36.66 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   1.0   0.0  19:36.67 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:36.69 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:36.71 irq/106-nvidia
   1259 root     -51   0       0      0      0 R   1.9   0.0  19:36.73 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   6.9   0.0  19:36.80 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:36.82 irq/106-nvidia
   1259 root     -51   0       0      0      0 R   2.0   0.0  19:36.84 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:36.89 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.9   0.0  19:36.92 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   1.9   0.0  19:36.94 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.9   0.0  19:36.97 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:36.99 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.01 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.03 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   3.9   0.0  19:37.07 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   3.9   0.0  19:37.11 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   1.0   0.0  19:37.12 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.14 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   1.9   0.0  19:37.16 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.18 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   5.9   0.0  19:37.24 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   1.0   0.0  19:37.25 irq/106-nvidia
   1259 root     -51   0       0      0      0 R   2.0   0.0  19:37.27 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.29 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   1.9   0.0  19:37.31 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   6.9   0.0  19:37.38 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.40 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.42 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.44 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   1.9   0.0  19:37.46 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.48 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   1.0   0.0  19:37.49 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.51 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.53 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.55 irq/106-nvidia
   1259 root     -51   0       0      0      0 S  10.7   0.0  19:37.66 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   1.0   0.0  19:37.67 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.69 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.71 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.73 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:37.78 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.80 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.82 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.84 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.9   0.0  19:37.87 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   9.7   0.0  19:37.97 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:37.99 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   1.0   0.0  19:38.00 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:38.02 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   2.0   0.0  19:38.04 irq/106-nvidia
   1259 root     -51   0       0      0      0 R   2.0   0.0  19:38.06 irq/106-nvidia
^C

And this is without disconnecting any monitors, etc.

Actually… I didn’t tell the entire truth in the last post! I had Firefox running. And when I closed Firefox, the irq/106-nvidia process mysteriously jumped back up to its old, higher CPU level:

$ # firefox NOT running
$ top -d 1 -b | egrep "irq.*nvidia"
   1259 root     -51   0       0      0      0 S   6.2   0.0  19:47.64 irq/106-nvidia
   1259 root     -51   0       0      0      0 R   4.9   0.0  19:47.69 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:47.74 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   5.9   0.0  19:47.80 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:47.85 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:47.90 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:47.95 irq/106-nvidia
   1259 root     -51   0       0      0      0 R   4.9   0.0  19:48.00 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:48.05 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:48.10 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:48.15 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:48.20 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   3.9   0.0  19:48.24 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:48.29 irq/106-nvidia
   1259 root     -51   0       0      0      0 R   4.9   0.0  19:48.34 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:48.39 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   5.9   0.0  19:48.45 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:48.50 irq/106-nvidia
^C

But if I restart Firefox then, mysteriously, there is no CPU change for irq/106-nvidia:

$ top -d 1 -b | egrep "irq.*nvidia"
   1259 root     -51   0       0      0      0 S   5.9   0.0  19:55.96 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.01 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.06 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.11 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.16 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.21 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   5.9   0.0  19:56.27 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.32 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.37 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.42 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.47 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.52 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.57 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.62 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  19:56.67 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   5.8   0.0  19:56.73 irq/106-nvidia
^C
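
To see which processes are actually feeding the GPU while I open and close Firefox, plain nvidia-smi lists them, and nvidia-smi pmon can sample per-process utilization over time (assuming the driver/GPU supports pmon):

$ nvidia-smi
$ nvidia-smi pmon -s u -c 10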

I tried unplugging one external monitor, but no difference in irq/106-nvidia CPU usage can be seen:

$ top -d 1 -b | egrep "irq.*nvidia"
   1259 root     -51   0       0      0      0 S   5.9   0.0  20:02.78 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   5.9   0.0  20:02.84 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  20:02.89 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   5.8   0.0  20:02.95 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   5.9   0.0  20:03.01 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  20:03.06 irq/106-nvidia
   1259 root     -51   0       0      0      0 R   5.9   0.0  20:03.12 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   5.9   0.0  20:03.18 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  20:03.23 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   5.9   0.0  20:03.29 irq/106-nvidia
   1259 root     -51   0       0      0      0 R   4.9   0.0  20:03.34 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   5.9   0.0  20:03.40 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  20:03.45 irq/106-nvidia
   1259 root     -51   0       0      0      0 S   4.9   0.0  20:03.50 irq/106-nvidia
^C

And it’s the same story if I unplug the other monitor.

Does anybody have more ideas on how else to diagnose this?

Is there a debug log, and how do I enable it?
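
One more thing I can still try: pointing a generic profiler at the IRQ thread to see which kernel functions it is actually spending its time in, and grepping the kernel log for driver messages. A rough sketch, assuming perf (linux-tools) is installed:

$ sudo perf top -p "$(pgrep irq/106-nvidia)"
$ sudo dmesg | grep -iE "nvrm|xid"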