Hello there.
I’ve noticed the following behavior recently: whenever I plug the power cord back into my laptop, nvidia-powerd
kills my machine after logging "Error, malformed CPU data."
Relevant setup data:
- ASUS TUF Dash F15 FX517ZR, Nvidia RTX3070 Max-Q
- 12th Gen Intel i7-12650H
- Arch Linux, kernel 6.6.3-zen1-1-zen
- CPU hotplug set up with laptop-mode-tools, in /etc/laptop-mode/conf.d/cpuhotplug.conf. On battery, it takes cores 2 to 11 offline, keeping only cpu0 and cpu1 (the first core plus its hyper-thread) and the last four cores, which are the efficiency cores (12, 13, 14, 15), for better battery life.
- optimus-manager, so the Nvidia GPU is only used when needed.

laptop-mode-tools puts the selected cores to sleep by issuing echo 0 > /sys/devices/system/cpu/cpuY/online (replace Y with the core index number).
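
For anyone trying to reproduce this without laptop-mode-tools, the hotplug step described above can be sketched roughly as the shell function below. This is only my approximation of what happens, not the actual laptop-mode-tools code: the names CPU_ROOT and set_cpus_online are made up for illustration, and the range 2 to 11 is specific to my core layout.

```shell
#!/bin/sh
# Rough sketch of the hotplug step, NOT the real laptop-mode-tools module.
# CPU_ROOT and set_cpus_online are hypothetical names used for illustration.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# set_cpus_online STATE FIRST LAST
# Writes STATE (0 = offline, 1 = online) to cpuN/online for N in FIRST..LAST.
set_cpus_online() {
    state="$1"
    i="$2"
    last="$3"
    while [ "$i" -le "$last" ]; do
        f="$CPU_ROOT/cpu$i/online"
        # Skip cores whose "online" file is missing or not writable.
        if [ -w "$f" ]; then
            echo "$state" > "$f"
        fi
        i=$((i + 1))
    done
}

# On battery (run as root):  set_cpus_online 0 2 11
# Back on AC (run as root):  set_cpus_online 1 2 11
```

Calling set_cpus_online 1 2 11 as root while nvidia-powerd is running should correspond to the replug case, assuming the crash is tied to the cores coming back online rather than to the AC event itself.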
The thing is: the error does not reproduce when I unplug the power cord and those CPUs are put to sleep. But when I plug the power back in, nvidia-powerd
crashes and takes the machine down with it: I get a hard reboot and lose everything I had open.
Relevant logs:
Nov 29 22:03:19 sandworm /usr/bin/nvidia-powerd[31906]: nvidia-powerd version:1.0(build 1)
Nov 29 22:03:20 sandworm /usr/bin/nvidia-powerd[31906]: Error, malformed CPU data.
Nov 29 22:03:20 sandworm nvidia-powerd[31906]: terminate called after throwing an instance of 'std::runtime_error'
Nov 29 22:03:20 sandworm nvidia-powerd[31906]: what(): cpuid_error
Nov 29 22:03:20 sandworm systemd[1]: Started Process Core Dump (PID 31913/UID 0).
░░ Subject: A start job for unit systemd-coredump@1-31913-0.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit systemd-coredump@1-31913-0.service has finished successfully.
░░
░░ The job identifier is 5548.
Nov 29 22:03:21 sandworm systemd-coredump[31914]: [🡕] Process 31906 (nvidia-powerd) of user 0 dumped core.
Module nvidia-powerd without build-id.
Stack trace of thread 31912:
#0 0x00007f38c962783c n/a (libc.so.6 + 0x8e83c)
#1 0x00007f38c95d7668 raise (libc.so.6 + 0x3e668)
#2 0x00007f38c95bf4b8 abort (libc.so.6 + 0x264b8)
#3 0x000000000041c6b5 n/a (nvidia-powerd + 0x1c6b5)
#4 0x000000000041b036 n/a (nvidia-powerd + 0x1b036)
#5 0x000000000041b071 n/a (nvidia-powerd + 0x1b071)
#6 0x000000000041af13 n/a (nvidia-powerd + 0x1af13)
#7 0x000000000040d9ff n/a (nvidia-powerd + 0xd9ff)
#8 0x000000000040dd5f n/a (nvidia-powerd + 0xdd5f)
#9 0x0000000000405322 n/a (nvidia-powerd + 0x5322)
#10 0x00007f38c96259eb n/a (libc.so.6 + 0x8c9eb)
#11 0x00007f38c96a97cc n/a (libc.so.6 + 0x1107cc)
Stack trace of thread 31911:
#0 0x00007f38c98f14c6 n/a (ld-linux-x86-64.so.2 + 0x214c6)
#1 0x00007f38c98d713b n/a (ld-linux-x86-64.so.2 + 0x713b)
#2 0x00007f38c98d86b1 n/a (ld-linux-x86-64.so.2 + 0x86b1)
#3 0x00007f38c98d2715 n/a (ld-linux-x86-64.so.2 + 0x2715)
#4 0x00007f38c98d14e1 _dl_catch_exception (ld-linux-x86-64.so.2 + 0x14e1)
#5 0x00007f38c98d2b75 n/a (ld-linux-x86-64.so.2 + 0x2b75)
#6 0x00007f38c98dc0b1 n/a (ld-linux-x86-64.so.2 + 0xc0b1)
#7 0x00007f38c98d14e1 _dl_catch_exception (ld-linux-x86-64.so.2 + 0x14e1)
#8 0x00007f38c98db81a n/a (ld-linux-x86-64.so.2 + 0xb81a)
#9 0x00007f38c98d14e1 _dl_catch_exception (ld-linux-x86-64.so.2 + 0x14e1)
#10 0x00007f38c98dbbec n/a (ld-linux-x86-64.so.2 + 0xbbec)
#11 0x00007f38c96219ec n/a (libc.so.6 + 0x889ec)
#12 0x00007f38c98d14e1 _dl_catch_exception (ld-linux-x86-64.so.2 + 0x14e1)
#13 0x00007f38c98d1603 n/a (ld-linux-x86-64.so.2 + 0x1603)
#14 0x00007f38c96214f7 n/a (libc.so.6 + 0x884f7)
#15 0x00007f38c9621aa1 dlopen (libc.so.6 + 0x88aa1)
#16 0x0000000000406eb5 n/a (nvidia-powerd + 0x6eb5)
#17 0x0000000000406a64 n/a (nvidia-powerd + 0x6a64)
#18 0x00007f38c96259eb n/a (libc.so.6 + 0x8c9eb)
#19 0x00007f38c96a97cc n/a (libc.so.6 + 0x1107cc)
Stack trace of thread 31906:
#0 0x00007f38c96a53af ioctl (libc.so.6 + 0x10c3af)
#1 0x0000000000410969 n/a (nvidia-powerd + 0x10969)
#2 0x0000000000411a72 n/a (nvidia-powerd + 0x11a72)
#3 0x000000000041296c n/a (nvidia-powerd + 0x1296c)
#4 0x0000000000403cb7 n/a (nvidia-powerd + 0x3cb7)
#5 0x0000000000402eca n/a (nvidia-powerd + 0x2eca)
#6 0x000000000040344c n/a (nvidia-powerd + 0x344c)
#7 0x0000000000402d1a n/a (nvidia-powerd + 0x2d1a)
#8 0x000000000040277b n/a (nvidia-powerd + 0x277b)
#9 0x00007f38c95c0cd0 n/a (libc.so.6 + 0x27cd0)
#10 0x00007f38c95c0d8a __libc_start_main (libc.so.6 + 0x27d8a)
#11 0x0000000000402915 n/a (nvidia-powerd + 0x2915)
ELF object binary architecture: AMD x86-64
░░ Subject: Process 31906 (nvidia-powerd) dumped core
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ Documentation: man:core(5)