R32.7.1 / 4.9.253-rt168 : INFO: possible circular locking dependency detected (nvpmodel: all_q_mutex + &hp->lock)

Hello,

I am seeing the following INFO message on boot when running a preemptible kernel with LOCKDEP and related debugging options enabled.
Could you please let me know whether this is a real issue and, if so, whether there is a fix for it?
Thank you

My configuration:

  • Jetson AGX Xavier
  • JetPack 4.6.1
  • custom-built r32.7.1 kernel with the rt patches applied (4.9.253-rt168)
  • CONFIG_PREEMPT_RT_FULL=y
  • CONFIG_DEBUG_KERNEL=y
  • CONFIG_LOCKDEP=y
  • CONFIG_PROVE_LOCKING=y
  • CONFIG_DEBUG_SPINLOCK=y
[   47.171157] [ INFO: possible circular locking dependency detected ]
[   47.171171] 4.9.253-rt168 #2 Tainted: G        W      
[   47.171178] -------------------------------------------------------
[   47.171185] nvpmodel/3863 is trying to acquire lock:
[   47.171243]  (all_q_mutex){+.+...}, at: [<ffffff80084ac56c>] blk_mq_queue_reinit_work+0x2c/0x128
[   47.171248] 
[   47.171248] but task is already holding lock:
[   47.171268]  (&hp->lock){+.+...}, at: [<ffffff80080b1720>] takedown_cpu+0x98/0x1b0
[   47.171271] 
[   47.171271] which lock already depends on the new lock.
[   47.171271] 
[   47.171286] 
[   47.171286] the existing dependency chain (in reverse order) is:
[   47.171316] 
[   47.171316] -> #3 (&hp->lock){+.+...}:
[   47.171323]        check_prev_add+0x558/0x6f0
[   47.171326]        validate_chain.isra.19+0xa6c/0xc80
[   47.171329]        __lock_acquire+0x2dc/0x758
[   47.171332]        lock_acquire+0xd4/0x2d0
[   47.171337]        rt_spin_lock__no_mg+0x5c/0x68
[   47.171340]        takedown_cpu+0x98/0x1b0
[   47.171343]        cpuhp_invoke_callback+0xe4/0xd10
[   47.171346]        cpuhp_down_callbacks+0x60/0xb0
[   47.171351]        _cpu_down+0x298/0x3b0
[   47.171354]        do_cpu_down+0x88/0x410
[   47.171357]        cpu_down+0x24/0x30
[   47.171362]        cpu_subsys_offline+0x20/0x30
[   47.171367]        device_offline+0x84/0xd8
[   47.171370]        online_store+0x4c/0xa0
[   47.171374]        dev_attr_store+0x44/0x60
[   47.171378]        sysfs_kf_write+0x54/0x78
[   47.171381]        kernfs_fop_write+0xc0/0x1d8
[   47.171389]        __vfs_write+0x48/0x118
[   47.171392]        vfs_write+0xac/0x1b0
[   47.171395]        SyS_write+0x5c/0xc8
[   47.171400]        el0_svc_naked+0x34/0x38
[   47.171405] 
[   47.171405] -> #2 (cpu_hotplug.lock){+.+.+.}:
[   47.171408]        check_prev_add+0x558/0x6f0
[   47.171412]        validate_chain.isra.19+0xa6c/0xc80
[   47.171414]        __lock_acquire+0x2dc/0x758
[   47.171417]        lock_acquire+0xd4/0x2d0
[   47.171420]        _mutex_lock+0x3c/0x50
[   47.171423]        get_online_cpus+0x78/0xa0
[   47.171426]        __cpuhp_state_add_instance+0x64/0x1b0
[   47.171430]        blk_mq_realloc_hw_ctxs+0x214/0x420
[   47.171433]        blk_mq_init_allocated_queue+0x7c/0x458
[   47.171436]        blk_mq_init_queue+0x3c/0x78
[   47.171440]        loop_add+0xf8/0x2c0
[   47.171446]        loop_init+0x104/0x14c
[   47.171449]        do_one_initcall+0x44/0x130
[   47.171454]        kernel_init_freeable+0x224/0x2c8
[   47.171457]        kernel_init+0x18/0x108
[   47.171460]        ret_from_fork+0x10/0x50
[   47.171464] 
[   47.171464] -> #1 (&q->sysfs_lock){+.+.+.}:
[   47.171468]        check_prev_add+0x558/0x6f0
[   47.171472]        validate_chain.isra.19+0xa6c/0xc80
[   47.171475]        __lock_acquire+0x2dc/0x758
[   47.171478]        lock_acquire+0xd4/0x2d0
[   47.171481]        _mutex_lock+0x3c/0x50
[   47.171484]        blk_mq_map_swqueue+0x3c/0x208
[   47.171487]        blk_mq_init_allocated_queue+0x3f0/0x458
[   47.171489]        blk_mq_init_queue+0x3c/0x78
[   47.171492]        loop_add+0xf8/0x2c0
[   47.171495]        loop_init+0x104/0x14c
[   47.171498]        do_one_initcall+0x44/0x130
[   47.171501]        kernel_init_freeable+0x224/0x2c8
[   47.171504]        kernel_init+0x18/0x108
[   47.171507]        ret_from_fork+0x10/0x50
[   47.171512] 
[   47.171512] -> #0 (all_q_mutex){+.+...}:
[   47.171521]        print_circular_bug+0x78/0x2d8
[   47.171524]        check_prev_add+0xf4/0x6f0
[   47.171527]        validate_chain.isra.19+0xa6c/0xc80
[   47.171530]        __lock_acquire+0x2dc/0x758
[   47.171534]        lock_acquire+0xd4/0x2d0
[   47.171537]        _mutex_lock+0x3c/0x50
[   47.171540]        blk_mq_queue_reinit_work+0x2c/0x128
[   47.171543]        blk_mq_queue_reinit_dead+0x24/0x30
[   47.171546]        cpuhp_invoke_callback+0xe4/0xd10
[   47.171549]        cpuhp_down_callbacks+0x60/0xb0
[   47.171552]        _cpu_down+0x298/0x3b0
[   47.171555]        do_cpu_down+0x88/0x410
[   47.171557]        cpu_down+0x24/0x30
[   47.171561]        cpu_subsys_offline+0x20/0x30
[   47.171564]        device_offline+0x84/0xd8
[   47.171568]        online_store+0x4c/0xa0
[   47.171571]        dev_attr_store+0x44/0x60
[   47.171578]        sysfs_kf_write+0x54/0x78
[   47.171580]        kernfs_fop_write+0xc0/0x1d8
[   47.171584]        __vfs_write+0x48/0x118
[   47.171587]        vfs_write+0xac/0x1b0
[   47.171590]        SyS_write+0x5c/0xc8
[   47.171593]        el0_svc_naked+0x34/0x38
[   47.171594] 
[   47.171594] other info that might help us debug this:
[   47.171594] 
[   47.171600] Chain exists of:
[   47.171600]   all_q_mutex --> cpu_hotplug.lock --> &hp->lock
[   47.171600] 
[   47.171601]  Possible unsafe locking scenario:
[   47.171601] 
[   47.171602]        CPU0                    CPU1
[   47.171603]        ----                    ----
[   47.171607]   lock(&hp->lock);
[   47.171609]                                lock(cpu_hotplug.lock);
[   47.171612]                                lock(&hp->lock);
[   47.171615]   lock(all_q_mutex);
[   47.171616] 
[   47.171616]  *** DEADLOCK ***
[   47.171616] 
[   47.171618] 9 locks held by nvpmodel/3863:
[   47.171626]  #0:  (sb_writers#6){.+.+.+}, at: [<ffffff800829e024>] vfs_write+0x19c/0x1b0
[   47.171633]  #1:  (&of->mutex){+.+.+.}, at: [<ffffff8008347350>] kernfs_fop_write+0x90/0x1d8
[   47.171641]  #2:  (s_active#213){.+.+.+}, at: [<ffffff8008345b7c>] kernfs_get_active+0x24/0x40
[   47.171648]  #3:  (device_hotplug_lock){+.+.+.}, at: [<ffffff8008885100>] lock_device_hotplug_sysfs+0x20/0x50
[   47.171655]  #4:  (&dev->mutex){......}, at: [<ffffff800888657c>] device_offline+0x5c/0xd8
[   47.171662]  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffff80080b4874>] do_cpu_down+0x6c/0x410
[   47.171669]  #6:  (cpu_hotplug.dep_map){++++++}, at: [<ffffff80080b46e8>] cpu_hotplug_begin+0x0/0xc0
[   47.171675]  #7:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffff80080b4760>] cpu_hotplug_begin+0x78/0xc0
[   47.171683]  #8:  (&hp->lock){+.+...}, at: [<ffffff80080b1720>] takedown_cpu+0x98/0x1b0
[   47.171684] 
[   47.171684] stack backtrace:
[   47.171692] CPU: 0 PID: 3863 Comm: nvpmodel Tainted: G        W       4.9.253-rt168 #2
[   47.171693] Hardware name: Jetson-AGX (DT)
[   47.171695] Call trace:
[   47.171700] [<ffffff800808c2b8>] dump_backtrace+0x0/0x1a8
[   47.171704] [<ffffff800808c964>] show_stack+0x24/0x30
[   47.171707] [<ffffff80090ea3c0>] dump_stack+0xa0/0xd0
[   47.171710] [<ffffff8008129dac>] print_circular_bug+0x1cc/0x2d8
[   47.171713] [<ffffff800812ad7c>] check_prev_add+0xf4/0x6f0
[   47.171717] [<ffffff800812bde4>] validate_chain.isra.19+0xa6c/0xc80
[   47.171720] [<ffffff800812d044>] __lock_acquire+0x2dc/0x758
[   47.171723] [<ffffff800812d9bc>] lock_acquire+0xd4/0x2d0
[   47.171726] [<ffffff80090f6a14>] _mutex_lock+0x3c/0x50
[   47.171729] [<ffffff80084ac56c>] blk_mq_queue_reinit_work+0x2c/0x128
[   47.171732] [<ffffff80084ac68c>] blk_mq_queue_reinit_dead+0x24/0x30
[   47.171735] [<ffffff80080b1d2c>] cpuhp_invoke_callback+0xe4/0xd10
[   47.171738] [<ffffff80080b3408>] cpuhp_down_callbacks+0x60/0xb0
[   47.171741] [<ffffff80090eda70>] _cpu_down+0x298/0x3b0
[   47.171744] [<ffffff80080b4890>] do_cpu_down+0x88/0x410
[   47.171747] [<ffffff80080b4c3c>] cpu_down+0x24/0x30
[   47.171750] [<ffffff800888d488>] cpu_subsys_offline+0x20/0x30
[   47.171753] [<ffffff80088865a4>] device_offline+0x84/0xd8
[   47.171757] [<ffffff80088866fc>] online_store+0x4c/0xa0
[   47.171760] [<ffffff8008883254>] dev_attr_store+0x44/0x60
[   47.171763] [<ffffff8008348404>] sysfs_kf_write+0x54/0x78
[   47.171765] [<ffffff8008347380>] kernfs_fop_write+0xc0/0x1d8
[   47.171768] [<ffffff800829cf40>] __vfs_write+0x48/0x118
[   47.171771] [<ffffff800829df34>] vfs_write+0xac/0x1b0
[   47.171774] [<ffffff800829f5bc>] SyS_write+0x5c/0xc8
[   47.171777] [<ffffff8008083140>] el0_svc_naked+0x34/0x38

Sorry for the late response; our team will investigate and provide suggestions soon. Thanks.

Could you try disabling the nvpmodel service and share whether you still get this issue? One way to disable it is to rename the binary and then reboot:
/home/ubuntu# mv /usr/sbin/nvpmodel /usr/sbin/nvpmodel_bkp
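To undo the change later, the rename is simply reversed. The commands can be tried safely against a scratch file first; the mktemp stand-in below is only for illustration, and on the device the path would be /usr/sbin/nvpmodel (root required):

```shell
bin=$(mktemp)                  # stand-in for /usr/sbin/nvpmodel
mv "$bin" "${bin}_bkp"         # "disable": binary no longer at its usual path
test ! -e "$bin" && echo disabled
mv "${bin}_bkp" "$bin"         # "re-enable": restore the original name
test -e "$bin" && echo restored
rm -f "$bin"                   # clean up the scratch file
```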

After renaming the nvpmodel binary, the original message no longer shows up (note, though, that this does not resolve the potential deadlock in the kernel).

However, another one took its place:

It looks like it just alternates between these two issues, depending on which one shows up first.
Please see the attached logs and the full config.

Thank you.

20220802-JP461-bootlog.tgz (59.4 KB)
config-4.9.253-rt168.gz (37.1 KB)

The nvhost recursive locking report is a different issue and might be a false positive. Could you share whether you still get the original issue reported in this thread after disabling nvpmodel?

As stated in my previous post above:

After renaming the nvpmodel binary, the original message no longer shows up (note, though, that this does not resolve the potential deadlock in the kernel).

CPU hotplug in 4.9-rt is not stable and can result in deadlocks.
‘nvpmodel’ powers off some of the cores via hotplug at the end of boot, which is where the possible lockup is being reported.
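For context, the backtrace above goes through online_store() and kernfs_fop_write(), i.e. a write to a CPU’s sysfs "online" node; that is the hotplug path nvpmodel takes at the end of boot. The same operation can be driven by hand for testing (cpu7 is only an example core):

```shell
# Show which CPUs are currently online (e.g. "0-7" on AGX Xavier).
cat /sys/devices/system/cpu/online

# Offlining a core takes the same online_store() path seen in the
# backtrace above (root required):
#   echo 0 > /sys/devices/system/cpu/cpu7/online
# Bringing it back online afterwards:
#   echo 1 > /sys/devices/system/cpu/cpu7/online
```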

Some of the possible deadlocks have already been reported; please refer to: [v4.9-rt][report] stress-ng: possible circular locking dependency detected (Linux Real Time)

Would it be best to disable CPU hotplug in the .config and build the kernel without it?

Thank you.

Not sure if we should disable it in the default defconfig for Jetson; we will check on it.
Could you disable ‘CONFIG_HOTPLUG_CPU’? Hopefully the issue should no longer appear.
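If you want to try that, one way is the scripts/config helper that ships in the kernel source tree (a sketch; adapt to your build flow):

```shell
# In the kernel source tree: switch CONFIG_HOTPLUG_CPU off and let
# Kconfig resolve the remaining dependencies.
scripts/config --file .config --disable HOTPLUG_CPU
make olddefconfig

# Verify the result; Kconfig may silently flip it back on if some
# other enabled option still selects HOTPLUG_CPU.
grep HOTPLUG_CPU .config
```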

It turns out that PM_SLEEP_SMP selects HOTPLUG_CPU=y, so we would have to disable more than just HOTPLUG_CPU, which could cause issues of its own in our specific case.
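For reference, the select comes from kernel/power/Kconfig; in the 4.9 tree the definition reads roughly as follows (quoted from memory, so please verify against your sources):

```
config PM_SLEEP_SMP
        def_bool y
        depends on SMP
        depends on ARCH_SUSPEND_POSSIBLE || ARCH_HIBERNATION_POSSIBLE
        depends on PM_SLEEP
        select HOTPLUG_CPU
```

So HOTPLUG_CPU can only stay off if PM_SLEEP (i.e. suspend support) is disabled as well, which is probably not acceptable on a product image.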

Thank you.