Kernel deadlock caused by nvgpu

Platform: Xavier, L4T R32.1. The kernel log is below:
[76533.414214] (NULL device *): nvhost_channelctl: invalid cmd 0x80685600
[76540.679206] bpmp: mrq 27 took 1072000 us
[76546.713134] nvgpu: 17000000.gv11b __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ pmu_wait_message_cond+0x68/0x120 [nvgpu]
[76558.760848] INFO: rcu_preempt detected stalls on CPUs/tasks:
[76558.761003] Tasks blocked on level-0 rcu_node (CPUs 0-7): P8721 P8669 P9071
[76558.761132] (detected by 3, t=5252 jiffies, g=6600487, c=6600486, q=3020)
[76558.761254] test_hal R running task 0 8721 8591 0x00000008
[76558.761396] Call trace:
[76558.761458] [] __switch_to+0x94/0xb8
[76558.761547] [] __schedule+0x260/0x770
[76558.761648] [] preempt_schedule_irq+0x40/0x70
[76558.761742] [] el1_preempt+0x8/0x14
[76558.761850] test_log R running task 0 8669 8634 0x00000008
[76558.761970] Call trace:
[76558.762030] [] __switch_to+0x94/0xb8
[76558.762115] [] __schedule+0x260/0x770
[76558.762309] [] preempt_schedule_irq+0x40/0x70
[76558.762753] [] el1_preempt+0x8/0x14
[76558.763163] test_gpu_service R running task 0 9071 8944 0x00000008
[76558.763718] Call trace:
[76558.763907] [] __switch_to+0x94/0xb8
[76558.766695] [] __schedule+0x260/0x770
[76558.771861] [] preempt_schedule_irq+0x40/0x70
[76558.777899] [] el1_preempt+0x8/0x14
[76558.783061] test_hal R running task 0 8721 8591 0x00000008
[76558.790143] Call trace:
[76558.792513] [] __switch_to+0x94/0xb8
[76558.797669] [] __schedule+0x260/0x770
[76558.802834] [] preempt_schedule_irq+0x40/0x70
[76558.808871] [] el1_preempt+0x8/0x14
[76558.814032] test_log R running task 0 8669 8634 0x00000008
[76558.820861] Call trace:
[76558.823744] [] __switch_to+0x94/0xb8
[76558.828649] [] __schedule+0x260/0x770
[76558.833555] [] preempt_schedule_irq+0x40/0x70
[76558.839850] [] el1_preempt+0x8/0x14
[76558.845011] test_gpu_service R running task 0 9071 8944 0x00000008
[76558.852093] Call trace:
[76558.854720] [] __switch_to+0x94/0xb8
[76558.859619] [] __schedule+0x260/0x770
[76558.864783] [] preempt_schedule_irq+0x40/0x70
[76558.870823] [] el1_preempt+0x8/0x14
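
To make a stall report like the one above easier to scan, here is a minimal sketch that maps the PIDs on the "Tasks blocked on ... rcu_node" line to the task names printed in the headers that follow. It is not an NVIDIA or kernel tool; the function name `summarize_stall` and both regular expressions are assumptions written against the line layout seen in this particular log, and they may need adjusting for other kernel versions.

```python
import re

# Assumed layout: "Tasks blocked on level-0 rcu_node (CPUs 0-7): P8721 P8669 ..."
BLOCKED_RE = re.compile(
    r"Tasks blocked on level-\d+ rcu_node \(CPUs [^)]*\):((?:\s+P\d+)+)"
)
# Assumed layout: "[ts] <name> <state> running task <stack> <pid> <ppid> <flags>"
TASK_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\S+)\s+\S\s+running task\s+\d+\s+(\d+)\s+(\d+)\s+0x[0-9a-fA-F]+"
)


def summarize_stall(log_text):
    """Return {pid: task_name} for every PID listed as blocked in the log."""
    blocked = []  # PIDs in the order the kernel reported them
    names = {}    # pid -> task name, taken from the task header lines
    for raw in log_text.splitlines():
        line = raw.strip()
        m = BLOCKED_RE.search(line)
        if m:
            for pid in re.findall(r"P(\d+)", m.group(1)):
                if int(pid) not in blocked:
                    blocked.append(int(pid))
            continue
        m = TASK_RE.match(line)
        if m:
            names[int(m.group(2))] = m.group(1)
    # "?" marks a blocked PID whose header line was not found in the log
    return {pid: names.get(pid, "?") for pid in blocked}
```

Run against the log in this post, it reports three blocked tasks: 8721 (test_hal), 8669 (test_log), and 9071 (test_gpu_service). All three are user-space test processes preempted in kernel context.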

Is there any other information about this issue?

Are you using the devkit?
Is it an unmodified package from JetPack, or have you modified the kernel?

How can this issue be reproduced?

No, we are on a self-designed hardware platform: several Xaviers interconnected through a PCIe switch.
We made some modifications to the kernel, but did not touch the nvgpu driver.
Currently we cannot reproduce the issue; we are trying to determine the conditions that trigger it.
Thanks.

I have the same problem on Xavier.

Hi collpym,

Please file a new topic for your issue with more details, such as the reproduction steps, BSP info, etc.

Thanks