Hello,
I am seeing the following INFO message on boot when running a preemptable kernel with LOCKDEP & etc enabled.
Could you please let me know if this is a real issue and whether there is a fix for it?
Thank you
My configuration:
- Jetson AGX Xavier
- JetPack 4.6.1
- custom built r32.7.1 with rt-patches applied (4.9.253-rt168)
- CONFIG_PREEMPT_RT_FULL=y
- CONFIG_DEBUG_KERNEL=y
- CONFIG_LOCKDEP=y
- CONFIG_PROVE_LOCKING=y
- CONFIG_DEBUG_SPINLOCK=y
[ 47.171157] [ INFO: possible circular locking dependency detected ]
[ 47.171171] 4.9.253-rt168 #2 Tainted: G W
[ 47.171178] -------------------------------------------------------
[ 47.171185] nvpmodel/3863 is trying to acquire lock:
[ 47.171243] (all_q_mutex){+.+...}, at: [<ffffff80084ac56c>] blk_mq_queue_reinit_work+0x2c/0x128
[ 47.171248]
[ 47.171248] but task is already holding lock:
[ 47.171268] (&hp->lock){+.+...}, at: [<ffffff80080b1720>] takedown_cpu+0x98/0x1b0
[ 47.171271]
[ 47.171271] which lock already depends on the new lock.
[ 47.171271]
[ 47.171286]
[ 47.171286] the existing dependency chain (in reverse order) is:
[ 47.171316]
[ 47.171316] -> #3 (&hp->lock){+.+...}:
[ 47.171323] check_prev_add+0x558/0x6f0
[ 47.171326] validate_chain.isra.19+0xa6c/0xc80
[ 47.171329] __lock_acquire+0x2dc/0x758
[ 47.171332] lock_acquire+0xd4/0x2d0
[ 47.171337] rt_spin_lock__no_mg+0x5c/0x68
[ 47.171340] takedown_cpu+0x98/0x1b0
[ 47.171343] cpuhp_invoke_callback+0xe4/0xd10
[ 47.171346] cpuhp_down_callbacks+0x60/0xb0
[ 47.171351] _cpu_down+0x298/0x3b0
[ 47.171354] do_cpu_down+0x88/0x410
[ 47.171357] cpu_down+0x24/0x30
[ 47.171362] cpu_subsys_offline+0x20/0x30
[ 47.171367] device_offline+0x84/0xd8
[ 47.171370] online_store+0x4c/0xa0
[ 47.171374] dev_attr_store+0x44/0x60
[ 47.171378] sysfs_kf_write+0x54/0x78
[ 47.171381] kernfs_fop_write+0xc0/0x1d8
[ 47.171389] __vfs_write+0x48/0x118
[ 47.171392] vfs_write+0xac/0x1b0
[ 47.171395] SyS_write+0x5c/0xc8
[ 47.171400] el0_svc_naked+0x34/0x38
[ 47.171405]
[ 47.171405] -> #2 (cpu_hotplug.lock){+.+.+.}:
[ 47.171408] check_prev_add+0x558/0x6f0
[ 47.171412] validate_chain.isra.19+0xa6c/0xc80
[ 47.171414] __lock_acquire+0x2dc/0x758
[ 47.171417] lock_acquire+0xd4/0x2d0
[ 47.171420] _mutex_lock+0x3c/0x50
[ 47.171423] get_online_cpus+0x78/0xa0
[ 47.171426] __cpuhp_state_add_instance+0x64/0x1b0
[ 47.171430] blk_mq_realloc_hw_ctxs+0x214/0x420
[ 47.171433] blk_mq_init_allocated_queue+0x7c/0x458
[ 47.171436] blk_mq_init_queue+0x3c/0x78
[ 47.171440] loop_add+0xf8/0x2c0
[ 47.171446] loop_init+0x104/0x14c
[ 47.171449] do_one_initcall+0x44/0x130
[ 47.171454] kernel_init_freeable+0x224/0x2c8
[ 47.171457] kernel_init+0x18/0x108
[ 47.171460] ret_from_fork+0x10/0x50
[ 47.171464]
[ 47.171464] -> #1 (&q->sysfs_lock){+.+.+.}:
[ 47.171468] check_prev_add+0x558/0x6f0
[ 47.171472] validate_chain.isra.19+0xa6c/0xc80
[ 47.171475] __lock_acquire+0x2dc/0x758
[ 47.171478] lock_acquire+0xd4/0x2d0
[ 47.171481] _mutex_lock+0x3c/0x50
[ 47.171484] blk_mq_map_swqueue+0x3c/0x208
[ 47.171487] blk_mq_init_allocated_queue+0x3f0/0x458
[ 47.171489] blk_mq_init_queue+0x3c/0x78
[ 47.171492] loop_add+0xf8/0x2c0
[ 47.171495] loop_init+0x104/0x14c
[ 47.171498] do_one_initcall+0x44/0x130
[ 47.171501] kernel_init_freeable+0x224/0x2c8
[ 47.171504] kernel_init+0x18/0x108
[ 47.171507] ret_from_fork+0x10/0x50
[ 47.171512]
[ 47.171512] -> #0 (all_q_mutex){+.+...}:
[ 47.171521] print_circular_bug+0x78/0x2d8
[ 47.171524] check_prev_add+0xf4/0x6f0
[ 47.171527] validate_chain.isra.19+0xa6c/0xc80
[ 47.171530] __lock_acquire+0x2dc/0x758
[ 47.171534] lock_acquire+0xd4/0x2d0
[ 47.171537] _mutex_lock+0x3c/0x50
[ 47.171540] blk_mq_queue_reinit_work+0x2c/0x128
[ 47.171543] blk_mq_queue_reinit_dead+0x24/0x30
[ 47.171546] cpuhp_invoke_callback+0xe4/0xd10
[ 47.171549] cpuhp_down_callbacks+0x60/0xb0
[ 47.171552] _cpu_down+0x298/0x3b0
[ 47.171555] do_cpu_down+0x88/0x410
[ 47.171557] cpu_down+0x24/0x30
[ 47.171561] cpu_subsys_offline+0x20/0x30
[ 47.171564] device_offline+0x84/0xd8
[ 47.171568] online_store+0x4c/0xa0
[ 47.171571] dev_attr_store+0x44/0x60
[ 47.171578] sysfs_kf_write+0x54/0x78
[ 47.171580] kernfs_fop_write+0xc0/0x1d8
[ 47.171584] __vfs_write+0x48/0x118
[ 47.171587] vfs_write+0xac/0x1b0
[ 47.171590] SyS_write+0x5c/0xc8
[ 47.171593] el0_svc_naked+0x34/0x38
[ 47.171594]
[ 47.171594] other info that might help us debug this:
[ 47.171594]
[ 47.171600] Chain exists of:
[ 47.171600] all_q_mutex --> cpu_hotplug.lock --> &hp->lock
[ 47.171600]
[ 47.171601] Possible unsafe locking scenario:
[ 47.171601]
[ 47.171602] CPU0 CPU1
[ 47.171603] ---- ----
[ 47.171607] lock(&hp->lock);
[ 47.171609] lock(cpu_hotplug.lock);
[ 47.171612] lock(&hp->lock);
[ 47.171615] lock(all_q_mutex);
[ 47.171616]
[ 47.171616] *** DEADLOCK ***
[ 47.171616]
[ 47.171618] 9 locks held by nvpmodel/3863:
[ 47.171626] #0: (sb_writers#6){.+.+.+}, at: [<ffffff800829e024>] vfs_write+0x19c/0x1b0
[ 47.171633] #1: (&of->mutex){+.+.+.}, at: [<ffffff8008347350>] kernfs_fop_write+0x90/0x1d8
[ 47.171641] #2: (s_active#213){.+.+.+}, at: [<ffffff8008345b7c>] kernfs_get_active+0x24/0x40
[ 47.171648] #3: (device_hotplug_lock){+.+.+.}, at: [<ffffff8008885100>] lock_device_hotplug_sysfs+0x20/0x50
[ 47.171655] #4: (&dev->mutex){......}, at: [<ffffff800888657c>] device_offline+0x5c/0xd8
[ 47.171662] #5: (cpu_add_remove_lock){+.+.+.}, at: [<ffffff80080b4874>] do_cpu_down+0x6c/0x410
[ 47.171669] #6: (cpu_hotplug.dep_map){++++++}, at: [<ffffff80080b46e8>] cpu_hotplug_begin+0x0/0xc0
[ 47.171675] #7: (cpu_hotplug.lock){+.+.+.}, at: [<ffffff80080b4760>] cpu_hotplug_begin+0x78/0xc0
[ 47.171683] #8: (&hp->lock){+.+...}, at: [<ffffff80080b1720>] takedown_cpu+0x98/0x1b0
[ 47.171684]
[ 47.171684] stack backtrace:
[ 47.171692] CPU: 0 PID: 3863 Comm: nvpmodel Tainted: G W 4.9.253-rt168 #2
[ 47.171693] Hardware name: Jetson-AGX (DT)
[ 47.171695] Call trace:
[ 47.171700] [<ffffff800808c2b8>] dump_backtrace+0x0/0x1a8
[ 47.171704] [<ffffff800808c964>] show_stack+0x24/0x30
[ 47.171707] [<ffffff80090ea3c0>] dump_stack+0xa0/0xd0
[ 47.171710] [<ffffff8008129dac>] print_circular_bug+0x1cc/0x2d8
[ 47.171713] [<ffffff800812ad7c>] check_prev_add+0xf4/0x6f0
[ 47.171717] [<ffffff800812bde4>] validate_chain.isra.19+0xa6c/0xc80
[ 47.171720] [<ffffff800812d044>] __lock_acquire+0x2dc/0x758
[ 47.171723] [<ffffff800812d9bc>] lock_acquire+0xd4/0x2d0
[ 47.171726] [<ffffff80090f6a14>] _mutex_lock+0x3c/0x50
[ 47.171729] [<ffffff80084ac56c>] blk_mq_queue_reinit_work+0x2c/0x128
[ 47.171732] [<ffffff80084ac68c>] blk_mq_queue_reinit_dead+0x24/0x30
[ 47.171735] [<ffffff80080b1d2c>] cpuhp_invoke_callback+0xe4/0xd10
[ 47.171738] [<ffffff80080b3408>] cpuhp_down_callbacks+0x60/0xb0
[ 47.171741] [<ffffff80090eda70>] _cpu_down+0x298/0x3b0
[ 47.171744] [<ffffff80080b4890>] do_cpu_down+0x88/0x410
[ 47.171747] [<ffffff80080b4c3c>] cpu_down+0x24/0x30
[ 47.171750] [<ffffff800888d488>] cpu_subsys_offline+0x20/0x30
[ 47.171753] [<ffffff80088865a4>] device_offline+0x84/0xd8
[ 47.171757] [<ffffff80088866fc>] online_store+0x4c/0xa0
[ 47.171760] [<ffffff8008883254>] dev_attr_store+0x44/0x60
[ 47.171763] [<ffffff8008348404>] sysfs_kf_write+0x54/0x78
[ 47.171765] [<ffffff8008347380>] kernfs_fop_write+0xc0/0x1d8
[ 47.171768] [<ffffff800829cf40>] __vfs_write+0x48/0x118
[ 47.171771] [<ffffff800829df34>] vfs_write+0xac/0x1b0
[ 47.171774] [<ffffff800829f5bc>] SyS_write+0x5c/0xc8
[ 47.171777] [<ffffff8008083140>] el0_svc_naked+0x34/0x38