I just noticed a message from this morning's lock-up that I hadn’t seen before:
Jul 29 04:15:23 ed-Precision-7540 kernel: WARNING: kernel stack regs at 000000007e0ad439 in irq/164-nvidia:340 has bad 'bp' value 00000000b76406ba
Typically, when there is a lock-up, I get a lot of "soft lockup" messages in the log. But there appears to be only one message about a “bad” bp value per lock-up:
~ $ journalctl -r | fgrep 'has bad'
Jul 29 04:15:23 ed-Precision-7540 kernel: WARNING: kernel stack regs at 000000007e0ad439 in irq/164-nvidia:340 has bad 'bp' value 00000000b76406ba
Jul 14 00:19:56 ed-Precision-7540 kernel: WARNING: kernel stack regs at 0000000088b7896b in irq/164-nvidia:342 has bad 'bp' value 00000000459dd4ee
Jun 29 06:19:48 ed-Precision-7540 kernel: WARNING: kernel stack regs at 000000002e76c432 in irq/164-nvidia:339 has bad 'bp' value 0000000053f04dfe
Jun 20 00:39:50 ed-Precision-7540 kernel: WARNING: kernel stack regs at 00000000a9b0f9c4 in irq/164-nvidia:343 has bad 'bp' value 00000000374a0461
Jun 06 16:16:35 ed-Precision-7540 kernel: WARNING: kernel stack regs at 00000000cb305bf9 in irq/164-nvidia:349 has bad 'bp' value 00000000ae1976af
May 16 00:06:12 ed-Precision-7540 kernel: WARNING: kernel stack regs at 00000000c908a7ad in irq/164-nvidia:340 has bad 'bp' value 000000006e105828
May 09 08:01:12 ed-Precision-7540 kernel: WARNING: kernel stack regs at 00000000219f07ff in irq/164-nvidia:343 has bad 'bp' value 000000001b2eee8c
May 06 20:41:12 ed-Precision-7540 kernel: WARNING: kernel stack regs at 0000000018066900 in irq/164-nvidia:339 has bad 'bp' value 000000008577b7b3
May 04 00:03:25 ed-Precision-7540 kernel: WARNING: kernel stack regs at 000000002f9415f1 in irq/164-nvidia:339 has bad 'bp' value 000000003b5ad4c8
Apr 29 00:17:57 ed-Precision-7540 kernel: WARNING: kernel stack regs at 000000007bb4ac89 in irq/164-nvidia:402 has bad 'bp' value 0000000045ef49ca
Apr 18 12:45:52 ed-Precision-7540 kernel: WARNING: kernel stack regs at 000000007ab77687 in irq/164-nvidia:414 has bad 'bp' value 0000000089c5f384
Apr 18 11:57:39 ed-Precision-7540 kernel: WARNING: kernel stack regs at 00000000b1a92122 in irq/164-nvidia:339 has bad 'bp' value 000000008293707b
Apr 15 01:36:35 ed-Precision-7540 kernel: WARNING: kernel stack regs at 000000006eef77b2 in irq/164-nvidia:415 has bad 'bp' value 0000000035d4bfd9
Apr 10 11:30:11 ed-Precision-7540 kernel: WARNING: kernel stack regs at 00000000683ac861 in irq/164-nvidia:414 has bad 'bp' value 00000000d61ee3df
Apr 07 21:22:21 ed-Precision-7540 kernel: WARNING: kernel stack regs at 00000000f0b292fe in irq/164-nvidia:408 has bad 'bp' value 00000000c18959c1
Apr 04 19:30:26 ed-Precision-7540 kernel: WARNING: kernel stack regs at 0000000070631ea2 in irq/165-nvidia:1217 has bad 'bp' value 00000000313edb6b
Mar 30 12:05:13 ed-Precision-7540 kernel: WARNING: kernel stack regs at 0000000085aa01b8 in kworker/15:1:13860 has bad 'bp' value 00000000178392e0
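To see at a glance which kernel thread each warning was reported in, a small filter like the one below could tally them. This is a minimal sketch. It assumes the journal lines look exactly like the ones above ("... in <task>:<pid> has bad 'bp' value ...") and that the journal is persistent across reboots. `tally_bad_bp` is just a hypothetical helper name.

```shell
# Hypothetical helper: count "has bad 'bp' value" warnings per kernel task.
# Reads journal text on stdin and assumes the format
#   "... in <task>:<pid> has bad 'bp' value <addr>"
tally_bad_bp() {
    # keep only the bad-bp warnings
    grep -F "has bad 'bp' value" |
    # strip everything except the task name (drop the trailing :<pid>)
    sed -n "s/.* in \(.*\):[0-9]* has bad 'bp' value.*/\1/p" |
    # count occurrences per task, most frequent first
    sort | uniq -c | sort -rn
}
```

Usage: `journalctl | tally_bad_bp`. Plain `journalctl` is used rather than `journalctl -k`, because `-k` only shows the current boot. In the list above, 15 of the 17 warnings come from the `irq/164-nvidia` threaded IRQ handler. The other two are one in `irq/165-nvidia` and one in `kworker/15:1`.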
To me this suggests that whatever triggers the “bad bp” warning may be what makes the system unresponsive, since it is logged only once per lock-up while the "soft lockup" messages repeat.