Hi @DaveYYY
Using the NVIDIA distro, I reverted commit b3fb2b5173662,
enabled more lock debugging flags in the kernel config (.config, 163.3 KB), rebuilt, and flashed the device.
I am back at the boot loop, but now I can see more information.
There are 3 lock errors detected, see kernel_lock_console.txt (69.6 KB).
For example:
[ 5.254801] BUG: sleeping function called from invalid context at ../kernel/locking/rtmutex.c:987
[ 5.254804] in_atomic(): 1, irqs_disabled(): 128, pid: 5, name: kworker/u12:0
[ 5.254805] INFO: lockdep is turned off.
[ 5.254808] irq event stamp: 0
[ 5.254810] hardirqs last enabled at (0): [< (null)>] (null)
[ 5.254820] hardirqs last disabled at (0): [< (ptrval)>] copy_process.isra.7.part.8+0x2a8/0x18d0
[ 5.254823] softirqs last enabled at (0): [< (ptrval)>] copy_process.isra.7.part.8+0x2a8/0x18d0
[ 5.254825] softirqs last disabled at (0): [< (null)>] (null)
[ 5.254831] Preemption disabled at:
[ 5.254832] [< (ptrval)>] tegra_i2c_xfer_msg+0x1d8/0xc38
[ 5.254838] CPU: 0 PID: 5 Comm: kworker/u12:0 Tainted: G W 4.9.337-rt197-d-tegra-rt #4
[ 5.254840] Hardware name: quill (DT)
[ 5.254852] Workqueue: events_unbound async_run_entry_fn
[ 5.254854] Call trace:
[ 5.254858] [< (ptrval)>] dump_backtrace+0x0/0x1a0
[ 5.254861] [< (ptrval)>] show_stack+0x24/0x30
[ 5.254865] [< (ptrval)>] dump_stack+0xa4/0xd4
[ 5.254870] [< (ptrval)>] ___might_sleep+0x15c/0x230
[ 5.254873] [< (ptrval)>] rt_spin_lock+0x30/0x70
[ 5.254878] [< (ptrval)>] devres_add+0x2c/0x70
[ 5.254881] [< (ptrval)>] devm_kmalloc+0x5c/0x88
[ 5.254886] [< (ptrval)>] tegra_dma_sg_req_get+0xa4/0xc8
[ 5.254889] [< (ptrval)>] tegra_dma_prep_slave_sg+0x190/0x498
[ 5.254892] [< (ptrval)>] tegra_i2c_xfer_msg+0x238/0xc38
[ 5.254894] [< (ptrval)>] tegra_i2c_xfer+0x5e4/0x7a8
[ 5.254897] [< (ptrval)>] __i2c_transfer+0x128/0x9e0
[ 5.254900] [< (ptrval)>] i2c_transfer+0x98/0x108
[ 5.254905] [< (ptrval)>] tegra_hdmi_ddc_i2c_xfer+0x70/0x158
[ 5.254909] [< (ptrval)>] tegra_edid_read_block+0xc0/0x260
[ 5.254913] [< (ptrval)>] tegra_edid_get_monspecs+0x6c/0xbd8
[ 5.254917] [< (ptrval)>] tegra_hdmi_controller_enable+0x2e0/0xf80
[ 5.254920] [< (ptrval)>] tegra_dc_hdmi_enable+0x48/0xb8
[ 5.254924] [< (ptrval)>] tegra_nvdisp_head_enable+0x4a0/0x13b8
[ 5.254927] [< (ptrval)>] _tegra_dc_enable+0xf0/0x110
[ 5.254930] [< (ptrval)>] tegra_dc_probe+0x1204/0x1ac0
[ 5.254933] [< (ptrval)>] platform_drv_probe+0x60/0xc0
[ 5.254936] [< (ptrval)>] driver_probe_device+0x298/0x448
[ 5.254938] [< (ptrval)>] __driver_attach+0x110/0x138
[ 5.254941] [< (ptrval)>] bus_for_each_dev+0x5c/0xa8
[ 5.254944] [< (ptrval)>] driver_attach+0x30/0x40
[ 5.254946] [< (ptrval)>] driver_attach_async+0x20/0x60
[ 5.254949] [< (ptrval)>] async_run_entry_fn+0x48/0x150
[ 5.254953] [< (ptrval)>] process_one_work+0x288/0x7d8
[ 5.254956] [< (ptrval)>] worker_thread+0x50/0x4d0
[ 5.254958] [< (ptrval)>] kthread+0xf4/0xf8
[ 5.254961] [< (ptrval)>] ret_from_fork+0x10/0x50
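In case it helps with reading the full log, here is a small Python sketch that lists each "BUG:" report together with the function named under "Preemption disabled at:" and the first few call-trace frames. The function name summarize_bug_reports and the regular expressions are my own assumptions, based only on the line format in the excerpt above, so they may need adjusting for other console output.

```python
import re

# Assumed line shape: "[ 5.254801] message", as in the excerpt above.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")
# Stack frame, e.g. "[< (ptrval)>] rt_spin_lock+0x30/0x70".
FRAME_RE = re.compile(r"^\[<.*?>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_bug_reports(log_text, max_frames=8):
    """Return one dict per "BUG:" report found in a kernel console log."""
    reports = []
    current = None
    in_trace = False
    expect_preempt_site = False
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines without a timestamp prefix
        timestamp, msg = float(m.group(1)), m.group(2).strip()
        if msg.startswith("BUG:"):
            # A new report starts here.
            current = {"time": timestamp, "bug": msg,
                       "preempt_disabled_at": None, "frames": []}
            reports.append(current)
            in_trace = False
            expect_preempt_site = False
            continue
        if current is None:
            continue
        if msg.startswith("Preemption disabled at"):
            # The symbol is printed on the following line.
            expect_preempt_site = True
            continue
        if msg.startswith("Call trace"):
            in_trace = True
            continue
        frame = FRAME_RE.match(msg)
        if frame:
            if expect_preempt_site:
                current["preempt_disabled_at"] = frame.group(1)
                expect_preempt_site = False
            elif in_trace and len(current["frames"]) < max_frames:
                current["frames"].append(frame.group(1))
        else:
            expect_preempt_site = False
    return reports


# Example use (file name taken from the attachment mentioned above):
# with open("kernel_lock_console.txt") as f:
#     for r in summarize_bug_reports(f.read()):
#         print(r["time"], r["preempt_disabled_at"], r["frames"])
```

For the report above, this would show tegra_i2c_xfer_msg as the place where preemption was disabled, with rt_spin_lock reached through devm_kmalloc further down the same call path.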
I found a similar bug report: R32.7.1 / 4.9.253-rt168 : BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:987
The patch mentioned there as the solution does not apply cleanly and produces a few build errors:
../init/do_mounts_rd.c: In function 'rd_load_image':
../init/do_mounts_rd.c:272:3: error: implicit declaration of function 'sys_write'; did you mean 'sys_writev'? [-Werror=implicit-function-declaration]
sys_write(out_fd, buf, BLOCK_SIZE);
^~~~~~~~~
sys_writev
CC arch/arm64/kernel/return_address.o
LD firmware/built-in.o
../init/initramfs.c: In function 'xwrite':
../init/initramfs.c:30:16: error: implicit declaration of function 'sys_write'; did you mean 'sys_writev'? [-Werror=implicit-function-declaration]
ssize_t rv = sys_write(fd, p, count);
^~~~~~~~~
sys_writev
CC arch/arm64/kernel/cpuinfo.o
AS arch/arm64/lib/bitops.o
CC sound/core/sound.o
CC arch/arm64/kernel/cpu_errata.o
CC arch/arm64/kernel/cpufeature.o
CC virt/lib/irqbypass.o
../include/uapi/asm-generic/unistd.h:206:23: error: 'sys_write' undeclared here (not in a function); did you mean 'sys_writev'?
__SYSCALL(__NR_write, sys_write)
^
../arch/arm64/kernel/sys.c:56:35: note: in definition of macro '__SYSCALL'
#define __SYSCALL(nr, sym) [nr] = sym,
Can you reproduce this on your side?