[BUG] lockdep messages when CONFIG_LOCKDEP=y is set

I am building the kernel with CONFIG_LOCKDEP=y and I see the two lockdep messages below, which I think are bugs. Please let me know how to fix them:

  1. Error in {NVIDIA_L4T_KERNEL}/nvidia/drivers/video/tegra/fb.c
[    2.214561] tegra_nvdisp_handle_pd_disable: Powergated Head1 pd
[    2.216490] tegra_nvdisp_handle_pd_disable: Powergated Head0 pd

[    2.217986] =============================================
[    2.218019] [ INFO: possible recursive locking detected ]
[    2.218100] 4.9.108 #1 Not tainted
[    2.218127] ---------------------------------------------
[    2.218160] kworker/4:1/59 is trying to acquire lock:
[    2.218388]  ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffff80080d1258>] __blocking_notifier_call_chain+0x38/0x78
[    2.218414] 
               but task is already holding lock:
[    2.218431]  ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffff80080d1258>] __blocking_notifier_call_chain+0x38/0x78
[    2.218435] 
               other info that might help us debug this:
[    2.218440]  Possible unsafe locking scenario:

[    2.218441]        CPU0
[    2.218444]        ----
[    2.218470]   lock((fb_notifier_list).rwsem);
[    2.218475]   lock((fb_notifier_list).rwsem);
[    2.218480] 
                *** DEADLOCK ***

[    2.218482]  May be due to missing lock nesting notation

[    2.218489] 6 locks held by kworker/4:1/59:
[    2.218512]  #0:  ("events"){.+.+.+}, at: [<ffffff80080c768c>] process_one_work+0x1ec/0x750
[    2.218648]  #1:  ((&(&hdmi->hpd_worker)->work)){+.+.+.}, at: [<ffffff80080c768c>] process_one_work+0x1ec/0x750
[    2.218680]  #2:  (&hdmi->hpd_lock){+.+.+.}, at: [<ffffff80086810a0>] tegra_hdmi_hpd_worker+0x50/0x368
[    2.218710]  #3:  (console_lock){+.+.+.}, at: [<ffffff80086e9a0c>] tegra_fb_update_monspecs+0x44/0x268
[    2.218737]  #4:  (&fb_info->lock){+.+.+.}, at: [<ffffff8008537240>] lock_fb_info+0x20/0x50
[    2.218753]  #5:  ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffff80080d1258>] __blocking_notifier_call_chain+0x38/0x78
[    2.218755]
               stack backtrace:
[    2.218786] CPU: 4 PID: 59 Comm: kworker/4:1 Not tainted 4.9.108 #1
[    2.218788] Hardware name: jetson-xavier (DT)
[    2.218827] Workqueue: events tegra_hdmi_hpd_worker
[    2.218832] Call trace:
[    2.218857] [<ffffff800808a6e8>] dump_backtrace+0x0/0x1a8
[    2.218863] [<ffffff800808a8f4>] show_stack+0x14/0x20
[    2.218883] [<ffffff800849897c>] dump_stack+0x9c/0xd0
[    2.218903] [<ffffff8008113740>] validate_chain.isra.22+0xb18/0xc60
[    2.218909] [<ffffff8008114c54>] __lock_acquire+0x374/0x728
[    2.218913] [<ffffff80081155a4>] lock_acquire+0xd4/0x298
[    2.218937] [<ffffff8008e10e74>] down_read+0x3c/0xc8
[    2.219005] [<ffffff80080d1258>] __blocking_notifier_call_chain+0x38/0x78
[    2.219010] [<ffffff80080d12ac>] blocking_notifier_call_chain+0x14/0x20
[    2.219014] [<ffffff8008536e3c>] fb_notifier_call_chain+0x1c/0x28
[    2.219018] [<ffffff80085372e0>] fb_blank+0x50/0xd0
[    2.219035] [<ffffff8008530168>] fbcon_blank+0x278/0x2c0
[    2.219053] [<ffffff8008741a6c>] do_blank_screen+0x184/0x208
[    2.219059] [<ffffff800853228c>] fbcon_event_notify+0x94c/0x958
[    2.219066] [<ffffff80080d0df8>] notifier_call_chain+0x50/0x90
[    2.219070] [<ffffff80080d1270>] __blocking_notifier_call_chain+0x50/0x78
[    2.219075] [<ffffff80080d12ac>] blocking_notifier_call_chain+0x14/0x20
[    2.219079] [<ffffff8008536e3c>] fb_notifier_call_chain+0x1c/0x28
[    2.219083] [<ffffff800853734c>] fb_blank+0xbc/0xd0
[    2.219088] [<ffffff80086e9954>] tegra_fbcon_set_fb_mode+0x3c/0xb0
[    2.219093] [<ffffff80086e9be4>] tegra_fb_update_monspecs+0x21c/0x268
[    2.219144] [<ffffff800867c15c>] tegra_hdmi_hotplug_notify+0xa4/0xb8
[    2.219150] [<ffffff800867f1bc>] tegra_hdmi_edid_eld_setup+0x15c/0x240
[    2.219155] [<ffffff8008681210>] tegra_hdmi_hpd_worker+0x1c0/0x368
[    2.219161] [<ffffff80080c772c>] process_one_work+0x28c/0x750
[    2.219168] [<ffffff80080c7c40>] worker_thread+0x50/0x4c8
[    2.219176] [<ffffff80080cef24>] kthread+0xf4/0x108
[    2.219182] [<ffffff80080830d0>] ret_from_fork+0x10/0x40
[    2.220182] tegradc 15200000.nvdisplay: blank - normal
[    2.262425] tegradc 15200000.nvdisplay: unblank
[    2.267076] tegra_nvdisp_handle_pd_enable: Unpowergated Head0 pd

I was able to get past this by replacing the call to fb_blank() with tegra_fb_blank(), but I am unsure whether this is the right thing to do. I verified that the monitor connected over HDMI turns on and off as expected, but that test may not be enough.
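
To make the splat easier to reason about, here is a minimal userspace sketch of the call pattern in the backtrace: fb_blank() notifies the chain, the console callback blanks the screen, and that blank notifies the same chain again while the read lock is still held. All names in the sketch (struct chain, call_chain, fbcon_like_callback, blank_via_notifier, blank_directly) are hypothetical stand-ins, not kernel APIs. It also assumes that a driver-private blank routine does not go through the notifier chain, which nothing in this thread confirms for tegra_fb_blank().

```c
#include <assert.h>
#include <stddef.h>

struct chain;
typedef void (*notifier_fn)(struct chain *c, int event);

/* Hypothetical stand-in for a blocking notifier chain and its rwsem. */
struct chain {
	notifier_fn fn;     /* a single callback is enough for the sketch */
	int read_depth;     /* how many times the "rwsem" is read-held now */
	int max_depth;      /* deepest nesting observed */
};

enum { EV_OUTER_BLANK = 1, EV_NESTED_BLANK = 2 };

/* Models __blocking_notifier_call_chain(): take the read lock, run the
 * callback, release the lock. */
static void call_chain(struct chain *c, int event)
{
	c->read_depth++;                    /* stands in for down_read() */
	if (c->read_depth > c->max_depth)
		c->max_depth = c->read_depth;
	if (c->fn)
		c->fn(c, event);
	c->read_depth--;                    /* stands in for up_read() */
	assert(c->read_depth >= 0);
}

/* Models the console callback: on the outer blank event it blanks the
 * screen, which notifies the same chain a second time. */
static void fbcon_like_callback(struct chain *c, int event)
{
	if (event == EV_OUTER_BLANK)
		call_chain(c, EV_NESTED_BLANK);
}

/* Path A: blank through the notifying helper, as in the backtrace. */
static int blank_via_notifier(void)
{
	struct chain c = { fbcon_like_callback, 0, 0 };

	call_chain(&c, EV_OUTER_BLANK);
	return c.max_depth;
}

/* Path B (assumption): a driver-private blank that never notifies. */
static int blank_directly(void)
{
	struct chain c = { fbcon_like_callback, 0, 0 };

	/* The hardware blank would happen here, with no call_chain(). */
	return c.max_depth;
}
```

In this model blank_via_notifier() holds the read lock twice at the deepest point, which is the pattern lockdep flags, while blank_directly() never takes it. Nested down_read() on a kernel rwsem can deadlock if a writer queues between the two acquisitions, so the warning is not just cosmetic. Skipping the notifier also means other listeners on the chain are not told about the blank, which may be why this workaround is not a complete fix.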

  2. “BUG:” messages when running nvpmodel_clk_cap_init().
[    8.649787] t19x_cache tegra-cache: probed
[    8.658921] misc nvmap: cvsram :dma coherent mem declare 0x0000000050000000,4194304
[    8.660639] misc nvmap: created heap cvsram base 0x0000000050000000 size (4096KiB)
[    8.668504] tegra_hv: get_hvd: not initialized yet
[    8.672868] user_ivc_mempool: hypervisor not present
[    8.678805] BUG: key ffffffc3e6f97818 not in .data!
[    8.682690] BUG: key ffffffc3e6f97850 not in .data!
[    8.688248] BUG: key ffffffc3e6f97888 not in .data!
[    8.692348] BUG: key ffffffc3e6f978c0 not in .data!
[    8.697595] BUG: key ffffffc3e6f978f8 not in .data!
[    8.702342] BUG: key ffffffc3e6f97930 not in .data!
[    8.707788] BUG: key ffffffc3e6f97968 not in .data!
[    8.712479] BUG: key ffffffc3e6f979a0 not in .data!
[    8.717317] BUG: key ffffffc3e6f979d8 not in .data!
[    8.722167] BUG: key ffffffc3e6f97a10 not in .data!
[    8.727750] BUG: key ffffffc3e6f97a48 not in .data!
[    8.732231] BUG: key ffffffc3e6f97a80 not in .data!
[    8.737120] BUG: key ffffffc3e6f97ab8 not in .data!
[    8.741926] BUG: key ffffffc3e6f97af0 not in .data!
[    8.747629] BUG: key ffffffc3e6f97b28 not in .data!
[    8.751530] nvpmodel: initialized successfully
[    8.756234] GACT probability NOT on

Please let me know how to fix these messages.

Hi,

I have fixed the 2nd BUG by adding the line below:

Index: b/nvidia/drivers/nvpmodel/nvpmodel_emc_cap.c
===================================================================
--- b.orig/nvidia/drivers/nvpmodel/nvpmodel_emc_cap.c
+++ b/nvidia/drivers/nvpmodel/nvpmodel_emc_cap.c
@@ -212,6 +212,7 @@ static int __init nvpmodel_clk_cap_init(
                        continue;
                }
 
+               sysfs_attr_init(&(clks[i].attr.attr));
                clks[i].attr.attr.mode = 0660;
                clks[i].attr.show = clk_cap_show;
                clks[i].attr.store = clk_cap_store;

This keeps lockdep happy when the sysfs attribute is dynamically allocated, which is the case here (line 185 of the same file):
clks = kzalloc(sizeof(*clks) * num_clocks, GFP_KERNEL);
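
For context, here is a small userspace sketch of why the one-line change matters. With lock debugging enabled, sysfs uses attr->key as the lockdep class key if it is set and otherwise falls back to a key embedded in the attribute itself; for a kzalloc'd attribute that embedded key lives on the heap, which is what triggers "BUG: key ... not in .data!". The types below are trimmed stand-ins for the kernel ones, calloc() stands in for kzalloc(), and attr_lockdep_key(), uses_embedded_key() and alloc_attrs() are hypothetical helpers written only for this illustration. Lockdep's actual static-storage check is not reproduced.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel types; only the relevant fields. */
struct lock_class_key { char dummy; };

struct attribute {
	const char *name;
	unsigned short mode;
	struct lock_class_key *key;   /* set by sysfs_attr_init() */
	struct lock_class_key skey;   /* fallback key embedded in the attribute */
};

/* Same shape as the kernel macro when lock debugging is enabled:
 * one static key per call site. */
#define sysfs_attr_init(attr)                       \
	do {                                        \
		static struct lock_class_key __key; \
		(attr)->key = &__key;               \
	} while (0)

/* Key selection as sysfs does it: explicit key if set, else embedded. */
static struct lock_class_key *attr_lockdep_key(struct attribute *attr)
{
	return attr->key ? attr->key : &attr->skey;
}

/* Returns 1 when the chosen key lives inside the attribute itself, which
 * for a heap-allocated attribute is the case lockdep complains about. */
static int uses_embedded_key(struct attribute *attr)
{
	return attr_lockdep_key(attr) == &attr->skey;
}

/* Hypothetical helper: allocate n zeroed attributes (calloc stands in for
 * kzalloc) and optionally run sysfs_attr_init() on each one. */
static struct attribute *alloc_attrs(size_t n, int init)
{
	struct attribute *attrs = calloc(n, sizeof(*attrs));
	size_t i;

	assert(attrs != NULL);
	for (i = 0; i < n; i++) {
		if (init)
			sysfs_attr_init(&attrs[i]);
		attrs[i].mode = 0660;
	}
	return attrs;
}
```

Without the init call every attribute falls back to its own heap-resident key. With it, all attributes initialized at that call site share one static key, so a single sysfs_attr_init() inside the loop covers the whole clks array.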

I am wondering whether the same initialization is also needed for the nvpmodel_emc_cap directory that is created at line 172 in the same file, but I don’t know whether the lockdep “key” cares or knows about directories.

Even though I have found solutions for these bugs, I would still like confirmation from someone at NVIDIA that these solutions are right. Thanks :)

Thanks for the report.
We have enabled LOCKDEP locally and are fixing the reported bugs. The fixes will be in the next release.

Thank you, bbasu.

I think the fix I applied for the 1st issue in comment #1 may not be the right one. After fixing all the lockdep splats (there were 2-3 more after these two), I got another splat that seemed related to that fix. If you find the fix for the 1st issue, please share it here so that I can incorporate it in my code.

When will the new JetPack version (software release) be released? We are still working on the developers’ EA version. Thanks in advance.