Lockdep: registering non-static key message displayed in nvhost_module_runtime_suspend

Hello,

While running a “debug” kernel, we get the INFO message below from LOCKDEP. A “debug” kernel has some of the kernel’s debug features turned on, e.g. CONFIG_DEBUG_KERNEL, CONFIG_PROVE_LOCKING, CONFIG_LOCKDEP, CONFIG_DEBUG_LOCK_ALLOC, etc.

[   19.135692] INFO: trying to register non-static key.
[   19.135705] the code is fine but needs lockdep annotation.
[   19.135715] turning off the locking correctness validator.
[   19.135734] CPU: 0 PID: 971 Comm: kworker/0:1 Tainted: G        W       4.9.140-rt94-r32.4.3-tegra-RedHawk-7.5.3-r600-nvhost-acm-non-sta #10
[   19.135744] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[   19.135776] Workqueue: pm pm_runtime_work
[   19.135790] Call trace:
[   19.135826] [<ffffff800808bfd0>] dump_backtrace+0x0/0x1b0
[   19.135865] [<ffffff800808c2bc>] show_stack+0x24/0x30
[   19.135890] [<ffffff80084bffc0>] dump_stack+0x88/0xb0
[   19.135901] [<ffffff8008147474>] register_lock_class+0x394/0x5f0
[   19.135911] [<ffffff80081486ac>] __lock_acquire.isra.12+0x6c/0xa10
[   19.135920] [<ffffff800814949c>] lock_acquire+0xf4/0x238
[   19.135930] [<ffffff80080fac44>] flush_work+0x44/0x2a0
[   19.135939] [<ffffff80080fc720>] __cancel_work_timer+0xa0/0x188
[   19.135948] [<ffffff80080fc85c>] cancel_delayed_work_sync+0x24/0x30
[   19.135960] [<ffffff8008e67700>] devfreq_monitor_suspend+0x50/0x80
[   19.135969] [<ffffff8008e6cf88>] devfreq_watermark_event_handler+0x408/0x4b0
[   19.135978] [<ffffff8008e6767c>] devfreq_suspend_device+0x64/0x98
[   19.135990] [<ffffff80085958d4>] nvhost_module_runtime_suspend+0x10c/0x1a8
[   19.136000] [<ffffff8008921e14>] pm_generic_runtime_suspend+0x3c/0x58
[   19.136010] [<ffffff8008934000>] genpd_runtime_suspend+0x98/0x238
[   19.136023] [<ffffff8008924454>] __rpm_callback+0x74/0xa0
[   19.136027] [<ffffff80089244b4>] rpm_callback+0x34/0x98
[   19.136032] [<ffffff8008924b20>] rpm_suspend+0x100/0x608
[   19.136037] [<ffffff8008926730>] pm_runtime_work+0x80/0xb8
[   19.136042] [<ffffff80080fb56c>] process_one_work+0x2cc/0x710
[   19.136047] [<ffffff80080fba0c>] worker_thread+0x5c/0x480
[   19.136052] [<ffffff8008102ca4>] kthread+0xf4/0xf8
[   19.136057] [<ffffff8008083320>] ret_from_fork+0x10/0x30

I am still debugging this, but my initial observation is that the devfreq_suspend_device() call in nvhost_module_prepare_poweroff() ends up in flush_work() on a work item that is either not present or was never initialized with INIT_DELAYED_WORK().

I also noticed that the devfreq_resume_device() call in the file nvidia/drivers/video/tegra/host/nvhost_acm.c calls devfreq_watermark_event_handler(). This, in turn, calls devfreq_monitor_resume(), which queues a delayed work item through queue_delayed_work(). This may be enough to make LOCKDEP happy, as queuing the work may at some later point initialize a key. Or I may be wrong here, and an explicit call to INIT_DELAYED_WORK() is needed.
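To make the suspicion above concrete, here is a small userspace model in C. It is not kernel code and not the nvhost driver. The names model_work, MODEL_INIT_WORK, model_register_lock_class, model_flush_work and model_monitor_fn are made up for illustration. It assumes, as I understand the 4.9-era lockdep, that the INIT_WORK()/INIT_DELAYED_WORK() macros attach a static lock class key to the work item, and that lockdep prints the "non-static key" message when a work item without such a key is flushed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct lock_class_key { char unused; };

struct lockdep_map {
    const struct lock_class_key *key;  /* stays NULL until an init macro runs */
    const char *name;
};

struct model_work {
    struct lockdep_map lockdep_map;
    void (*func)(struct model_work *work);
};

/* Models the idea behind INIT_WORK()/INIT_DELAYED_WORK():
 * each call site gets its own key with static storage duration. */
#define MODEL_INIT_WORK(_work, _func)                      \
    do {                                                   \
        static struct lock_class_key model_key;            \
        memset((_work), 0, sizeof(*(_work)));              \
        (_work)->lockdep_map.key = &model_key;             \
        (_work)->lockdep_map.name = #_work;                \
        (_work)->func = (_func);                           \
    } while (0)

/* Returns 1 if the map has a key and can be registered, 0 in the case
 * where this model would print
 * "INFO: trying to register non-static key." */
int model_register_lock_class(const struct lockdep_map *map)
{
    assert(map != NULL);
    return map->key != NULL;
}

/* Stand-in for flush_work(): in this model, the only thing it does is
 * try to register the work item's lockdep map. */
int model_flush_work(struct model_work *work)
{
    return model_register_lock_class(&work->lockdep_map);
}

/* Dummy work function used only to have something to pass to the macro. */
void model_monitor_fn(struct model_work *work)
{
    (void)work;
}
```

In this model, a zero-filled model_work (as you would get from a zeroing allocator) makes model_flush_work() return 0, while the same item after MODEL_INIT_WORK() returns 1. Whether the real driver path skips the init macro is exactly what I have not confirmed yet.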

I am not familiar with this device driver or this code path, so I don’t know how to fix it or how to proceed. Please let me know how to go about it.

Please note that this message is displayed on other Jetson devices too. I have seen it in R32.3.1 and R32.4.3. It may be present in earlier releases as well, but I have not tested them.

Hello @WayneWWW or anyone else from NVIDIA, any update on this?

How do you enable the debug feature?

Hello ShaneCCC,

The debug features need to be turned on in the kernel.

Please enable the CONFIG_ options below:

CONFIG_DEBUG_KERNEL
CONFIG_LOCKDEP
CONFIG_PROVE_LOCKING
CONFIG_DEBUG_LOCK_ALLOC
CONFIG_DEBUG_MUTEXES
CONFIG_DEBUG_SPINLOCKS
CONFIG_DEBUG_LOCKDEP
CONFIG_UNINLINE_SPIN_UNLOCK

I think this should enable a pretty good set of debug features in the kernel. CONFIG_LOCKDEP provides a lockdep message when it detects the first locking problem. The RedHawk debug kernel generally has many more debug options enabled, but I think the above list is sufficient to produce most of the warnings.

BTW, once these CONFIG_ options are set in the kernel config, please run make silentoldconfig, as it may raise some more questions and/or warnings. Then compile and boot the kernel.

Please note that lockdep checking is turned off as soon as it detects its first problem. So you may not see the above warning the first time (unless you are lucky). You may have to resolve each of the earlier warnings before you see the above one.
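One way to check whether lockdep is still active on a booted debug kernel is to look at the debug_locks value in /proc/lockdep_stats. Below is a minimal userspace C sketch of that check. It assumes the file exists on your kernel and contains a "debug_locks:" line followed by 0 or 1. The helper names parse_debug_locks and read_debug_locks are my own, not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the number found after "debug_locks:" in the given text,
 * or -1 if the label is missing or no number follows it. */
int parse_debug_locks(const char *text)
{
    const char *label = "debug_locks:";
    const char *p;
    char *end;
    long value;

    assert(text != NULL);

    p = strstr(text, label);
    if (p == NULL)
        return -1;

    p += strlen(label);
    value = strtol(p, &end, 10);  /* strtol skips the padding spaces */
    if (end == p)
        return -1;

    return (int)value;
}

/* Read the file at path (for example "/proc/lockdep_stats") and return
 * the debug_locks value, or -1 if the file cannot be read or parsed. */
int read_debug_locks(const char *path)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;

    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    return parse_debug_locks(buf);
}
```

Calling read_debug_locks("/proc/lockdep_stats") should give 1 while the validator is still running and 0 once it has turned itself off. A return of -1 means the file was missing or did not have the expected line.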

I have tried to provide as much information as I can in my original post, so that the developer(s) of this device/subsystem will have the necessary info.

These debug options are not enabled so that the system boots faster. Enabling all of them will impact performance.

Enabling them is not supported.

Hello kayccc,

I know that debugging CONFIGs should not be enabled in a production environment. We don’t do that ourselves. However, they are useful when developing device drivers or debugging a kernel bug, which is why we test with a “debug” kernel.

Having said that, the post was created to let you know about the lockdep warning occurring in one of NVIDIA’s device drivers, because it may or may not point to a locking discrepancy. Sometimes these add to real-time latency (on production kernels), so we try to iron out as many of these warnings as we can. Also, during device driver development, the lockdep mechanism should stay on so that it can detect issues in the driver code being developed. Since it turns itself off at the first locking discrepancy it detects, an earlier warning like this one hides any later ones.

Yes, we plan to fix this bug in an upcoming release. Thanks for pointing it out.

Thank you :-)