I have “NVIDIA-Linux-x86_64-525.85.05.run” installed on Ubuntu 22.04 LTS. Everything had been running fine for about a year, but today the NVIDIA driver somehow did not get used after rebooting. So I went through the re-install process for the driver, and everything works, but… top shows irq/110-nvidia constantly chewing system CPU:
$ top -d 1 -b | egrep "irq.*nvidia"
1282 root -51 0 0 0 0 S 18.8 0.0 42:01.41 irq/110-nvidia
1282 root -51 0 0 0 0 R 17.8 0.0 42:01.59 irq/110-nvidia
1282 root -51 0 0 0 0 S 4.0 0.0 42:01.63 irq/110-nvidia
1282 root -51 0 0 0 0 S 1.0 0.0 42:01.64 irq/110-nvidia
1282 root -51 0 0 0 0 R 15.8 0.0 42:01.80 irq/110-nvidia
1282 root -51 0 0 0 0 S 16.8 0.0 42:01.97 irq/110-nvidia
1282 root -51 0 0 0 0 S 16.8 0.0 42:02.14 irq/110-nvidia
1282 root -51 0 0 0 0 S 15.8 0.0 42:02.30 irq/110-nvidia
^C
How do I diagnose and fix this?
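As a first diagnostic step (a sketch, not from the original posts): the thread name irq/110-nvidia encodes the IRQ number, so the matching /proc/irq/110/ directory and /proc/interrupts line can be located mechanically. The helper name below is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: extract the IRQ number from a threaded-IRQ
# kernel thread name such as "irq/110-nvidia".
irq_num_from_thread() {
    printf '%s\n' "$1" | sed -n 's|^irq/\([0-9][0-9]*\)-.*$|\1|p'
}

# Usage on a live system, assuming the thread name shown by "top":
#   irq=$(irq_num_from_thread irq/110-nvidia)
#   grep "^ *$irq:" /proc/interrupts
#   cat /proc/irq/$irq/smp_affinity_list
```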
In the meantime I upgraded to the latest driver “NVIDIA-Linux-x86_64-535.129.03.run” (from the end of last month), and top shows irq/110-nvidia is still constantly chewing system CPU, just a little less:
$ top -d 1 -b | egrep "irq.*nvidia"
1274 root -51 0 0 0 0 S 0.0 0.0 0:20.61 irq/110-nvidia
1274 root -51 0 0 0 0 S 8.8 0.0 0:20.70 irq/110-nvidia
1274 root -51 0 0 0 0 R 10.8 0.0 0:20.81 irq/110-nvidia
1274 root -51 0 0 0 0 R 10.9 0.0 0:20.92 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.8 0.0 0:21.03 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.8 0.0 0:21.14 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.9 0.0 0:21.25 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.8 0.0 0:21.36 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.9 0.0 0:21.47 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.8 0.0 0:21.58 irq/110-nvidia
1274 root -51 0 0 0 0 S 11.8 0.0 0:21.70 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.9 0.0 0:21.81 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.8 0.0 0:21.92 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.8 0.0 0:22.03 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.9 0.0 0:22.14 irq/110-nvidia
1274 root -51 0 0 0 0 R 10.8 0.0 0:22.25 irq/110-nvidia
1274 root -51 0 0 0 0 S 11.8 0.0 0:22.37 irq/110-nvidia
1274 root -51 0 0 0 0 S 3.0 0.0 0:22.40 irq/110-nvidia
1274 root -51 0 0 0 0 S 1.0 0.0 0:22.41 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.8 0.0 0:22.52 irq/110-nvidia
1274 root -51 0 0 0 0 R 9.9 0.0 0:22.62 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.8 0.0 0:22.73 irq/110-nvidia
1274 root -51 0 0 0 0 S 9.8 0.0 0:22.83 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.9 0.0 0:22.94 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.8 0.0 0:23.05 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.9 0.0 0:23.16 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.8 0.0 0:23.27 irq/110-nvidia
1274 root -51 0 0 0 0 S 10.8 0.0 0:23.38 irq/110-nvidia
1274 root -51 0 0 0 0 R 9.9 0.0 0:23.48 irq/110-nvidia
1274 root -51 0 0 0 0 R 10.8 0.0 0:23.59 irq/110-nvidia
^C
Hmmm… after it has been running for a bit longer, it just climbs back up to using more system CPU:
$ top -d 1 -b | egrep "irq.*nvidia"
1274 root -51 0 0 0 0 S 17.6 0.0 1:20.76 irq/110-nvidia
1274 root -51 0 0 0 0 S 16.7 0.0 1:20.93 irq/110-nvidia
1274 root -51 0 0 0 0 R 16.7 0.0 1:21.10 irq/110-nvidia
1274 root -51 0 0 0 0 S 16.7 0.0 1:21.27 irq/110-nvidia
1274 root -51 0 0 0 0 S 16.7 0.0 1:21.44 irq/110-nvidia
1274 root -51 0 0 0 0 S 16.7 0.0 1:21.61 irq/110-nvidia
1274 root -51 0 0 0 0 S 16.7 0.0 1:21.78 irq/110-nvidia
1274 root -51 0 0 0 0 S 16.7 0.0 1:21.95 irq/110-nvidia
1274 root -51 0 0 0 0 R 16.7 0.0 1:22.12 irq/110-nvidia
1274 root -51 0 0 0 0 S 16.5 0.0 1:22.29 irq/110-nvidia
1274 root -51 0 0 0 0 S 16.7 0.0 1:22.46 irq/110-nvidia
^C
simonhf
November 9, 2023, 12:42am
There seem to be a lot of interrupts generated, just under 200 per second. I’m guessing that is what is causing the system/kernel CPU usage. Is it normal to have so many? Any workarounds?
$ cat /proc/interrupts | egrep -i nvidia
110: 0 0 0 0 830977 0 0 0 0 0 0 0 0 0 0 0 543 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-0000:07:00.0 0-edge nvidia
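The “just under 200 per second” figure can be checked by sampling /proc/interrupts twice and dividing the delta by the interval (a sketch; sum_irq_counts is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: sum the per-CPU counters of /proc/interrupts
# lines (read from stdin) whose text matches the given pattern.
sum_irq_counts() {
    awk -v pat="$1" '$0 ~ pat {
        # field 1 is the "110:" label; counters follow until the
        # controller/device name ends the run of numbers
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) total += $i; else break
    } END { print total + 0 }'
}

# Usage on a live system:
#   a=$(sum_irq_counts nvidia < /proc/interrupts)
#   sleep 10
#   b=$(sum_irq_counts nvidia < /proc/interrupts)
#   echo "$(( (b - a) / 10 )) interrupts/sec"
```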
Another thing I tried:
I found this [1] talking about interrupts, and also found this [2], which mentions the file I have: /etc/modprobe.d/nvidia-installer-disable-nouveau.conf. So I edited that file and rebooted, trying both options nvidia NVreg_EnableMSI=0 and options nvidia NVreg_EnableMSI=1.
And I also tried putting the setting in /etc/modprobe.d/nvidia.conf too.
But neither setting appears to have any influence on the irq/110-nvidia CPU usage in top :-( And how do I even know that file is being read and used?
I tried this command [3], which shows the parameters given to the nvidia* kernel modules, but there is no sign of NVreg_EnableMSI, so how do I know that the parameter in any .conf file was actually taken?
$ sudo grep -H '' /sys/module/nvidia*/parameters/*
/sys/module/nvidia_drm/parameters/modeset:N
/sys/module/nvidia_modeset/parameters/config_file:(null)
/sys/module/nvidia_modeset/parameters/disable_vrr_memclk_switch:N
/sys/module/nvidia_modeset/parameters/fail_malloc:-1
/sys/module/nvidia_modeset/parameters/malloc_verbose:N
/sys/module/nvidia_modeset/parameters/output_rounding_fix:Y
/sys/module/nvidia_uvm/parameters/uvm_ats_mode:1
/sys/module/nvidia_uvm/parameters/uvm_channel_gpfifo_loc:auto
/sys/module/nvidia_uvm/parameters/uvm_channel_gpput_loc:auto
/sys/module/nvidia_uvm/parameters/uvm_channel_num_gpfifo_entries:1024
/sys/module/nvidia_uvm/parameters/uvm_channel_pushbuffer_loc:auto
/sys/module/nvidia_uvm/parameters/uvm_cpu_chunk_allocation_sizes:2166784
/sys/module/nvidia_uvm/parameters/uvm_debug_enable_push_acquire_info:0
/sys/module/nvidia_uvm/parameters/uvm_debug_enable_push_desc:0
/sys/module/nvidia_uvm/parameters/uvm_debug_prints:0
/sys/module/nvidia_uvm/parameters/uvm_disable_hmm:N
/sys/module/nvidia_uvm/parameters/uvm_downgrade_force_membar_sys:1
/sys/module/nvidia_uvm/parameters/uvm_enable_builtin_tests:0
/sys/module/nvidia_uvm/parameters/uvm_enable_debug_procfs:0
/sys/module/nvidia_uvm/parameters/uvm_enable_va_space_mm:1
/sys/module/nvidia_uvm/parameters/uvm_exp_gpu_cache_peermem:0
/sys/module/nvidia_uvm/parameters/uvm_exp_gpu_cache_sysmem:0
/sys/module/nvidia_uvm/parameters/uvm_fault_force_sysmem:0
/sys/module/nvidia_uvm/parameters/uvm_force_prefetch_fault_support:0
/sys/module/nvidia_uvm/parameters/uvm_global_oversubscription:1
/sys/module/nvidia_uvm/parameters/uvm_leak_checker:0
/sys/module/nvidia_uvm/parameters/uvm_page_table_location:(null)
/sys/module/nvidia_uvm/parameters/uvm_peer_copy:phys
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_batch_count:256
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_mimc_migration_enable:-1
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_momc_migration_enable:-1
/sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_threshold:256
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_batch_count:256
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_coalesce:1
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_max_batches_per_service:20
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_max_throttle_per_service:5
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_replay_policy:2
/sys/module/nvidia_uvm/parameters/uvm_perf_fault_replay_update_put_ratio:50
/sys/module/nvidia_uvm/parameters/uvm_perf_map_remote_on_eviction:1
/sys/module/nvidia_uvm/parameters/uvm_perf_map_remote_on_native_atomics_fault:0
/sys/module/nvidia_uvm/parameters/uvm_perf_migrate_cpu_preunmap_block_order:2
/sys/module/nvidia_uvm/parameters/uvm_perf_migrate_cpu_preunmap_enable:1
/sys/module/nvidia_uvm/parameters/uvm_perf_pma_batch_nonpinned_order:6
/sys/module/nvidia_uvm/parameters/uvm_perf_prefetch_enable:1
/sys/module/nvidia_uvm/parameters/uvm_perf_prefetch_min_faults:1
/sys/module/nvidia_uvm/parameters/uvm_perf_prefetch_threshold:51
/sys/module/nvidia_uvm/parameters/uvm_perf_reenable_prefetch_faults_lapse_msec:1000
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_enable:1
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_epoch:2000
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_lapse_usec:500
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_max_resets:4
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_nap:1
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_pin:300
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_pin_threshold:10
/sys/module/nvidia_uvm/parameters/uvm_perf_thrashing_threshold:3
/sys/module/nvidia_uvm/parameters/uvm_release_asserts:1
/sys/module/nvidia_uvm/parameters/uvm_release_asserts_dump_stack:0
/sys/module/nvidia_uvm/parameters/uvm_release_asserts_set_global_error:0
According to this [4], I can figure out which .conf files are included in the initrd:
$ sudo lsinitramfs /boot/initrd.img | grep etc/modprobe.d
etc/modprobe.d
etc/modprobe.d/alsa-base.conf
etc/modprobe.d/amd64-microcode-blacklist.conf
etc/modprobe.d/blacklist-ath_pci.conf
etc/modprobe.d/blacklist-firewire.conf
etc/modprobe.d/blacklist-framebuffer.conf
etc/modprobe.d/blacklist-modem.conf
etc/modprobe.d/blacklist-nouveau.conf
etc/modprobe.d/blacklist-oss.conf
etc/modprobe.d/blacklist-rare-network.conf
etc/modprobe.d/blacklist.conf
etc/modprobe.d/dkms.conf
etc/modprobe.d/intel-microcode-blacklist.conf
etc/modprobe.d/iwlwifi.conf
etc/modprobe.d/nvidia-installer-disable-nouveau.conf
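One thing this listing suggests (my reading, not stated in the thread): nvidia.conf is absent, and if the nvidia module is loaded from the initrd (as is typical on Ubuntu), a new or edited .conf only reaches the initrd once it is rebuilt with sudo update-initramfs -u. A small hypothetical filter for checking a specific file in the listing:

```shell
#!/bin/sh
# Hypothetical filter: check whether a given modprobe .conf file
# appears in an "lsinitramfs" listing read from stdin.
conf_in_initrd_list() {
    grep -Fqx "etc/modprobe.d/$1"
}

# Usage (rebuild with "sudo update-initramfs -u" if missing):
#   sudo lsinitramfs /boot/initrd.img | conf_in_initrd_list nvidia.conf \
#       && echo present || echo missing
```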
And [4] also says how to access the parameters; the output shows EnableMSI: 0, so maybe the default is zero already? Now we know it does not seem to affect the interrupts and CPU :-(
$ cat /proc/driver/nvidia/params
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 0
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 0
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableResizableBar: 0
EnableGpuFirmware: 18
EnableGpuFirmwareLogs: 2
EnableDbgBreakpoint: 0
OpenRmEnableUnsupportedGpus: 0
DmaRemapPeerMmio: 1
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: ""
ExcludedGpus: ""
[1] NVIDIA/nvidia-drivers - Gentoo wiki
[2] Ubuntu 14.04 hangs after installing Cuda - #3 by Abhijit-Amagi
[3] kernel - How do I list loaded Linux module parameter values? - Server Fault
[4] https://developer.nvidia.com/nvidia-development-tools-solutions-err_nvgpuctrperm-permission-issue-performance-counters
Another thing I tried:
Tried moving the IRQ's CPU affinity from its default CPU 4 to e.g. CPU 30 using this tutorial [1].
First find out the current IRQ Nvidia is using; 106 in this case:
$ cat /proc/interrupts | grep nvidia
106: 0 0 0 0 349812 0 0 0 0 0 0 0 0 0 0 0 542 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-IO-APIC 30-fasteoi nvidia
Then double-check that IRQ 106 currently has affinity with CPU 4:
$ cat /proc/irq/106/smp_affinity_list
4
Then set the CPU affinity as desired:
$ sudo sh -c "echo 30 > /proc/irq/106/smp_affinity_list"
$
It moves CPUs now! And mysteriously, on the new CPU it’s using about half as much CPU?! Why?
$ top -d 1 -b | egrep "irq.*nvidia"
1266 root -51 0 0 0 0 S 6.2 0.0 8:05.89 irq/106-nvidia
1266 root -51 0 0 0 0 S 5.0 0.0 8:05.94 irq/106-nvidia
1266 root -51 0 0 0 0 S 5.0 0.0 8:05.99 irq/106-nvidia
1266 root -51 0 0 0 0 S 5.0 0.0 8:06.04 irq/106-nvidia
1266 root -51 0 0 0 0 S 5.0 0.0 8:06.09 irq/106-nvidia
1266 root -51 0 0 0 0 R 5.9 0.0 8:06.15 irq/106-nvidia
1266 root -51 0 0 0 0 S 5.0 0.0 8:06.20 irq/106-nvidia
1266 root -51 0 0 0 0 S 5.0 0.0 8:06.25 irq/106-nvidia
1266 root -51 0 0 0 0 S 5.0 0.0 8:06.30 irq/106-nvidia
1266 root -51 0 0 0 0 S 5.0 0.0 8:06.35 irq/106-nvidia
1266 root -51 0 0 0 0 S 5.0 0.0 8:06.40 irq/106-nvidia
^C
[1] Setting IRQ CPU affinities: Improving IRQ performance on the ODROID-XU4 | ODROID Magazine
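The “half as much CPU” observation can be quantified from top's cumulative TIME+ column rather than the noisy per-sample %CPU. TIME+ is minutes:seconds.hundredths, so two samples a known interval apart give an average (a sketch; mmss_to_cs is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: convert a "top" TIME+ value (MM:SS.CC, i.e.
# minutes:seconds.hundredths) into centiseconds.
mmss_to_cs() {
    printf '%s\n' "$1" | awk -F'[:.]' '{ print ($1 * 60 + $2) * 100 + $3 }'
}

# Usage: average CPU% over an interval is
#   100 * (later_cs - earlier_cs) / elapsed_centiseconds
# e.g. for two TIME+ readings taken 10 seconds (1000 cs) apart:
#   t1=$(mmss_to_cs 8:05.89)
#   t2=$(mmss_to_cs 8:06.40)
#   echo "avg CPU%: $(( 100 * (t2 - t1) / 1000 ))"
```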
I guess you moved the irq from a physical core to a virtual one.
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.
I guess you moved the irq from a physical core to a virtual one.
I think the other CPU is running at about twice the frequency / MHz, so maybe that explains it?
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.
nvidia-bug-report.log.gz (661.3 KB)
P.S. The upload feature completely failed for me on Firefox :-( Had to switch to Chrome to upload!
There’s something really wrong, the gpu is only running xorg and gnome-shell, status is ‘idle’, yet it’s running full throttle with 96% gpu load. Doesn’t make sense. Do you have any kind of gpu monitoring tool running that’s calling nvidia-smi or the like in a fast loop?
simonhf
November 9, 2023, 9:02pm
No, nothing like that. The only process I have running is htop, which is how I noticed irq/106-nvidia gobbling CPU. What I actually want is for the system to idle without using much CPU :-)
I do have the main laptop display and two external monitors. Could that have anything to do with it?
How to debug this further?
simonhf
November 9, 2023, 9:25pm
Hmmm… if I look at it now then it’s in the 0% to 20% range…
$ while true; do nvidia-smi --query-gpu=utilization.gpu --format=csv ; sleep 1; done
utilization.gpu [%]
15 %
utilization.gpu [%]
0 %
utilization.gpu [%]
11 %
utilization.gpu [%]
20 %
utilization.gpu [%]
16 %
utilization.gpu [%]
14 %
utilization.gpu [%]
6 %
utilization.gpu [%]
0 %
utilization.gpu [%]
2 %
utilization.gpu [%]
3 %
utilization.gpu [%]
0 %
utilization.gpu [%]
12 %
utilization.gpu [%]
10 %
utilization.gpu [%]
12 %
utilization.gpu [%]
10 %
utilization.gpu [%]
11 %
utilization.gpu [%]
10 %
utilization.gpu [%]
9 %
^C
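The polling loop above reprints the CSV header every second; nvidia-smi can loop by itself, and the csv,noheader format drops the repeated header (assuming a 535-era nvidia-smi; the filter helper below is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: reduce "15 %" utilization lines to bare
# numbers, e.g. for plotting or averaging.
strip_pct() {
    awk '/^[0-9]+ %$/ { print $1 }'
}

# Usage, letting nvidia-smi do the 1-second loop itself:
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader -l 1 | strip_pct
```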
generix
November 10, 2023, 8:36am
Then the output in the logs might have just been a very odd coincidence.
Regarding monitors, this could very well be it; a slightly bad connection might trigger this. Please monitor CPU usage, then disconnect one monitor after another and check if anything changes.
simonhf
November 14, 2023, 5:00pm
More bizarre results: I went away over the weekend and left the laptop idling. Upon return, the irq/106-nvidia process has decided “on its own” to use less CPU?! But it still jumps annoyingly higher from time to time:
$ top -d 1 -b | egrep "irq.*nvidia"
1259 root -51 0 0 0 0 S 0.0 0.0 19:36.55 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:36.57 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:36.59 irq/106-nvidia
1259 root -51 0 0 0 0 S 6.8 0.0 19:36.66 irq/106-nvidia
1259 root -51 0 0 0 0 S 1.0 0.0 19:36.67 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:36.69 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:36.71 irq/106-nvidia
1259 root -51 0 0 0 0 R 1.9 0.0 19:36.73 irq/106-nvidia
1259 root -51 0 0 0 0 S 6.9 0.0 19:36.80 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:36.82 irq/106-nvidia
1259 root -51 0 0 0 0 R 2.0 0.0 19:36.84 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:36.89 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.9 0.0 19:36.92 irq/106-nvidia
1259 root -51 0 0 0 0 S 1.9 0.0 19:36.94 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.9 0.0 19:36.97 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:36.99 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.01 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.03 irq/106-nvidia
1259 root -51 0 0 0 0 S 3.9 0.0 19:37.07 irq/106-nvidia
1259 root -51 0 0 0 0 S 3.9 0.0 19:37.11 irq/106-nvidia
1259 root -51 0 0 0 0 S 1.0 0.0 19:37.12 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.14 irq/106-nvidia
1259 root -51 0 0 0 0 S 1.9 0.0 19:37.16 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.18 irq/106-nvidia
1259 root -51 0 0 0 0 S 5.9 0.0 19:37.24 irq/106-nvidia
1259 root -51 0 0 0 0 S 1.0 0.0 19:37.25 irq/106-nvidia
1259 root -51 0 0 0 0 R 2.0 0.0 19:37.27 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.29 irq/106-nvidia
1259 root -51 0 0 0 0 S 1.9 0.0 19:37.31 irq/106-nvidia
1259 root -51 0 0 0 0 S 6.9 0.0 19:37.38 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.40 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.42 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.44 irq/106-nvidia
1259 root -51 0 0 0 0 S 1.9 0.0 19:37.46 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.48 irq/106-nvidia
1259 root -51 0 0 0 0 S 1.0 0.0 19:37.49 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.51 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.53 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.55 irq/106-nvidia
1259 root -51 0 0 0 0 S 10.7 0.0 19:37.66 irq/106-nvidia
1259 root -51 0 0 0 0 S 1.0 0.0 19:37.67 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.69 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.71 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.73 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:37.78 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.80 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.82 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.84 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.9 0.0 19:37.87 irq/106-nvidia
1259 root -51 0 0 0 0 S 9.7 0.0 19:37.97 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:37.99 irq/106-nvidia
1259 root -51 0 0 0 0 S 1.0 0.0 19:38.00 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:38.02 irq/106-nvidia
1259 root -51 0 0 0 0 S 2.0 0.0 19:38.04 irq/106-nvidia
1259 root -51 0 0 0 0 R 2.0 0.0 19:38.06 irq/106-nvidia
^C
And this is without disconnecting any monitors, etc.
simonhf
November 14, 2023, 5:06pm
Actually… I didn’t tell the entire truth in the last post! I had Firefox running. And when I closed Firefox, the irq/106-nvidia process mysteriously jumped up to its old, higher CPU level:
$ # firefox NOT running
$ top -d 1 -b | egrep "irq.*nvidia"
1259 root -51 0 0 0 0 S 6.2 0.0 19:47.64 irq/106-nvidia
1259 root -51 0 0 0 0 R 4.9 0.0 19:47.69 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:47.74 irq/106-nvidia
1259 root -51 0 0 0 0 S 5.9 0.0 19:47.80 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:47.85 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:47.90 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:47.95 irq/106-nvidia
1259 root -51 0 0 0 0 R 4.9 0.0 19:48.00 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:48.05 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:48.10 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:48.15 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:48.20 irq/106-nvidia
1259 root -51 0 0 0 0 S 3.9 0.0 19:48.24 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:48.29 irq/106-nvidia
1259 root -51 0 0 0 0 R 4.9 0.0 19:48.34 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:48.39 irq/106-nvidia
1259 root -51 0 0 0 0 S 5.9 0.0 19:48.45 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:48.50 irq/106-nvidia
^C
But if I restart Firefox, then mysteriously there is no CPU change for irq/106-nvidia:
$ top -d 1 -b | egrep "irq.*nvidia"
1259 root -51 0 0 0 0 S 5.9 0.0 19:55.96 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.01 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.06 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.11 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.16 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.21 irq/106-nvidia
1259 root -51 0 0 0 0 S 5.9 0.0 19:56.27 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.32 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.37 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.42 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.47 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.52 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.57 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.62 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 19:56.67 irq/106-nvidia
1259 root -51 0 0 0 0 S 5.8 0.0 19:56.73 irq/106-nvidia
^C
simonhf
November 14, 2023, 5:09pm
I tried unplugging an external monitor, but no difference in irq/106-nvidia CPU can be seen:
$ top -d 1 -b | egrep "irq.*nvidia"
1259 root -51 0 0 0 0 S 5.9 0.0 20:02.78 irq/106-nvidia
1259 root -51 0 0 0 0 S 5.9 0.0 20:02.84 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 20:02.89 irq/106-nvidia
1259 root -51 0 0 0 0 S 5.8 0.0 20:02.95 irq/106-nvidia
1259 root -51 0 0 0 0 S 5.9 0.0 20:03.01 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 20:03.06 irq/106-nvidia
1259 root -51 0 0 0 0 R 5.9 0.0 20:03.12 irq/106-nvidia
1259 root -51 0 0 0 0 S 5.9 0.0 20:03.18 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 20:03.23 irq/106-nvidia
1259 root -51 0 0 0 0 S 5.9 0.0 20:03.29 irq/106-nvidia
1259 root -51 0 0 0 0 R 4.9 0.0 20:03.34 irq/106-nvidia
1259 root -51 0 0 0 0 S 5.9 0.0 20:03.40 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 20:03.45 irq/106-nvidia
1259 root -51 0 0 0 0 S 4.9 0.0 20:03.50 irq/106-nvidia
^C
And it’s the same story if I unplug the other monitor.
simonhf
November 21, 2023, 6:10pm
Anybody have any more ideas on how else to diagnose these issues?
Is there a debug log, and if so, how do I enable it?