Hello All!
When I run “nvidia-smi -l 10”, unplugging or plugging in the laptop’s charger
(with a well-charged battery) causes nvidia-smi to exit with the following
error:
Unexpected NVML event
Error occurred while processing the event: Unknown Error
This is on a ThinkPad P16v Gen 2 with an “NVIDIA RTX 3000 Ada Generation
Laptop GPU” graphics chip, with Ubuntu pre-installed and subsequently
upgraded to 24.04.2. Note that I don’t use the NVIDIA GPU for display
purposes, only for PyTorch CUDA processing.
Is this a known issue?
Is there a fix for it?
Do I care?
Further details:
This error is reliably reproducible. I saw it happen before the OS upgrade (and,
I believe, also with a different NVIDIA driver), but the configuration below is
the one on which I deliberately captured the error.
nvidia-smi output:
xxxxx@xxxxxxxxxxx:~$ nvidia-smi -l 10
Sun Mar 16 16:56:02 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 3000 Ada Gene...    Off |   00000000:01:00.0 Off |                  Off |
| N/A   31C    P3            364W /   35W |       8MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
| GPU   GI   CI        PID   Type   Process name                               GPU Memory |
|       ID   ID                                                                Usage      |
|=========================================================================================|
|    0   N/A  N/A      3188      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+
Unexpected NVML event
Error occurred while processing the event: Unknown Error
xxxxx@xxxxxxxxxxx:~$
/var/log/kern.log contains the following entry, coincident in time:
2025-03-16T16:56:10.425125-04:00 xxxxxxxxxxx kernel: workqueue: acpi_os_execute_deferred hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
I haven’t noticed any other unexpected behavior when unplugging or plugging
in the charger.
(Suspend/resume also causes nvidia-smi to crash, but it produces other
errors as well.)
I generated nvidia-bug-report.log.gz shortly after running the above test,
but I haven’t attached it because the upload hangs at
“Processing: nvidia-bug-report.log.gz…”. Please let me know if you would
like me to copy-paste any specific information from the bug report
into this post.
In lieu of the bug report, here is the output of “uname -a”:
Linux xxxxxxxxxxx 6.8.0-55-generic #57-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 12 23:42:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Thanks for any information.
K. Frank