TX1 crash

Hi everyone,

I am using a TX1 with BSP version R28.2. During I2C stress testing I observed the crash below. How can I fix it? Thank you.

[ 7014.369186] Unable to handle kernel paging request at virtual address dead000000000108
[ 7014.369189] pgd = ffffffc06b3f9000
[ 7014.369193] [dead000000000108] *pgd=0000000161d69003, *pud=0000000161d69003, *pmd=0000000000000000
[ 7014.369196] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[ 7014.369201] Modules linked in: tn40xx(O) camera_control_ox2_282
[ 7014.369205] CPU: 0 PID: 2049 Comm: czcmti Tainted: G O 4.4.38 #51
[ 7014.369207] Hardware name: jetson_tx1 (DT)
[ 7014.369209] task: ffffffc070d70000 ti: ffffffc0623d4000 task.ti: ffffffc0623d4000
[ 7014.369215] PC is at tegra_dma_tasklet+0x44/0xcc
[ 7014.369216] LR is at tegra_dma_tasklet+0x84/0xcc
[ 7014.369218] pc : [] lr : [] pstate: 200001c5
[ 7014.369219] sp : ffffffc0623d7550
[ 7014.369222] x29: ffffffc0623d7550 x28: ffffffc0623d4000
[ 7014.369224] x27: ffffffc0011d0ab8 x26: ffffffc0011fa000
[ 7014.369226] x25: 0000000000000101 x24: ffffffc0302cd250
[ 7014.369229] x23: ffffffc0302cd180 x22: ffffffc0302cd208
[ 7014.369231] x21: ffffffc0f9cba018 x20: ffffffc000762e14
[ 7014.369233] x19: 0000000000000000 x18: 0000000000000014
[ 7014.369235] x17: 0000007f9e8a5a40 x16: 0000007f9a4b6048
[ 7014.369238] x15: 00000000ffffdc00 x14: 6572646461206d6f
[ 7014.369240] x13: 726620656764656c x12: 776f6e6b6361206f
[ 7014.369242] x11: 6e203a6332692e30 x10: 3031643030303720
[ 7014.369244] x9 : ffffffc000000000 x8 : 000000002d1d6c98
[ 7014.369246] x7 : 0000000000000010 x6 : dead000000000100
[ 7014.369249] x5 : dead000000000200 x4 : 0000000000000100
[ 7014.369251] x3 : 0000000000000000 x2 : ffffffc0620e5870
[ 7014.369253] x1 : 0000000000000140 x0 : 0000000000000140

[ 7014.369256] Call trace:
[ 7014.369259] [] tegra_dma_tasklet+0x44/0xcc
[ 7014.369265] [] tasklet_action+0x8c/0xfc
[ 7014.369267] [] __do_softirq+0x1a0/0x3f4
[ 7014.369270] [] irq_exit+0x74/0xe0
[ 7014.369274] [] __handle_domain_irq+0x90/0xb0
[ 7014.369278] [] gic_handle_irq+0x68/0xbc
[ 7014.369282] [] el1_irq+0x84/0x100
[ 7014.369284] [] vprintk_emit+0x4d4/0x518
[ 7014.369291] [] dev_vprintk_emit+0x1c4/0x1e8
[ 7014.369294] [] dev_printk_emit+0x54/0x5c
[ 7014.369297] [] __dev_printk+0x64/0x80
[ 7014.369299] [] dev_warn+0x64/0x6c
[ 7014.369306] [] tegra_i2c_xfer_msg+0x7f4/0x9e0
[ 7014.369309] [] tegra_i2c_xfer+0x3ac/0x4d8
[ 7014.369315] [] __i2c_transfer+0x308/0x59c
[ 7014.369318] [] i2c_transfer+0x7c/0xcc
[ 7014.369321] [] i2c_master_send+0x40/0x54

Could you share the details of the stress test?

[ 7014.369186] Unable to handle kernel paging request at virtual address dead000000000108
[ 7014.369189] pgd = ffffffc06b3f9000
[ 7014.369193] [dead000000000108] *pgd=0000000161d69003, *pud=0000000161d69003, *pmd=0000000000000000
[ 7014.369196] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[ 7014.369205] CPU: 0 PID: 2049 Comm: \ Tainted: G O 4.4.38 #519
[ 7014.369207] Hardware name: jetson_tx1 (DT)
[ 7014.369209] task: ffffffc070d70000 ti: ffffffc0623d4000 task.ti: ffffffc0623d4000
[ 7014.369215] PC is at tegra_dma_tasklet+0x44/0xcc
[ 7014.369216] LR is at tegra_dma_tasklet+0x84/0xcc
[ 7014.369218] pc : [] lr : [] pstate: 200001c5
[ 7014.369219] sp : ffffffc0623d7550
[ 7014.369222] x29: ffffffc0623d7550 x28: ffffffc0623d4000
[ 7014.369224] x27: ffffffc0011d0ab8 x26: ffffffc0011fa000
[ 7014.369226] x25: 0000000000000101 x24: ffffffc0302cd250
[ 7014.369229] x23: ffffffc0302cd180 x22: ffffffc0302cd208
[ 7014.369231] x21: ffffffc0f9cba018 x20: ffffffc000762e14
[ 7014.369233] x19: 0000000000000000 x18: 0000000000000014
[ 7014.369235] x17: 0000007f9e8a5a40 x16: 0000007f9a4b6048
[ 7014.369238] x15: 00000000ffffdc00 x14: 6572646461206d6f
[ 7014.369240] x13: 726620656764656c x12: 776f6e6b6361206f
[ 7014.369242] x11: 6e203a6332692e30 x10: 3031643030303720
[ 7014.369244] x9 : ffffffc000000000 x8 : 000000002d1d6c98
[ 7014.369246] x7 : 0000000000000010 x6 : dead000000000100
[ 7014.369249] x5 : dead000000000200 x4 : 0000000000000100
[ 7014.369251] x3 : 0000000000000000 x2 : ffffffc0620e5870
[ 7014.369253] x1 : 0000000000000140 x0 : 0000000000000140
[ 7014.369253]
[ 7014.369255] Process i2ctest (pid: 2049, stack limit = 0xffffffc0623d4020)
[ 7014.369256] Call trace:
[ 7014.369259] [] tegra_dma_tasklet+0x44/0xcc
[ 7014.369265] [] tasklet_action+0x8c/0xfc
[ 7014.369267] [] __do_softirq+0x1a0/0x3f4
[ 7014.369270] [] irq_exit+0x74/0xe0
[ 7014.369274] [] __handle_domain_irq+0x90/0xb0
[ 7014.369278] [] gic_handle_irq+0x68/0xbc
[ 7014.369282] [] el1_irq+0x84/0x100
[ 7014.369284] [] vprintk_emit+0x4d4/0x518
[ 7014.369291] [] dev_vprintk_emit+0x1c4/0x1e8
[ 7014.369294] [] dev_printk_emit+0x54/0x5c
[ 7014.369297] [] __dev_printk+0x64/0x80
[ 7014.369299] [] dev_warn+0x64/0x6c
[ 7014.369306] [] tegra_i2c_xfer_msg+0x7f4/0x9e0
[ 7014.369309] [] tegra_i2c_xfer+0x3ac/0x4d8
[ 7014.369315] [] __i2c_transfer+0x308/0x59c
[ 7014.369318] [] i2c_transfer+0x7c/0xcc
[ 7014.369321] [] i2c_master_send+0x40/0x54
[ 7014.369358] [] I2cWriteBlockRegisters+0xf4/0x1a8 [i2c_control]
[ 7014.369376] [] sensorCtlIoctl+0x7ec/0x1120 [i2c_control]
[ 7014.369382] [] do_vfs_ioctl+0x620/0x658
[ 7014.369385] [] SyS_ioctl+0x5c/0x8c
[ 7014.369389] [] el0_svc_naked+0x24/0x28
[ 7014.369661] ---[ end trace c08f51d4c5656c4c ]---
[ 7014.371291] Kernel panic - not syncing: Fatal exception in interrupt
[ 7014.371298] CPU1: stopping
[ 7014.371303] CPU: 1 PID: 1827 Comm: compiz Tainted: G D O 4.4.38 #51
[ 7014.371304] Hardware name: jetson_tx1 (DT)
[ 7014.371306] Call trace:
[ 7014.371314] [] dump_backtrace+0x0/0xf4
[ 7014.371318] [] show_stack+0x14/0x1c
[ 7014.371323] [] dump_stack+0xac/0xe4
[ 7014.371326] [] handle_IPI+0x16c/0x328
[ 7014.371329] [] gic_handle_irq+0x94/0xbc
[ 7014.371331] [] el1_irq+0x84/0x100
[ 7014.371334] [] add_wait_queue+0x50/0x60
[ 7014.371339] [] __pollwait+0xec/0xfc
[ 7014.371342] [] eventfd_poll+0x2c/0x60
[ 7014.371345] [] do_sys_poll+0x210/0x45c
[ 7014.371347] [] SyS_ppoll+0x168/0x1bc
[ 7014.371350] [] el0_svc_naked+0x24/0x28
[ 7014.371352] CPU2: stopping
[ 7014.371355] CPU: 2 PID: 1770 Comm: indicator-datet Tainted: G D O 4.4.38 #51
[ 7014.371356] Hardware name: jetson_tx1 (DT)
[ 7014.371357] Call trace:
[ 7014.371361] [] dump_backtrace+0x0/0xf4
[ 7014.371364] [] show_stack+0x14/0x1c
[ 7014.371367] [] dump_stack+0xac/0xe4
[ 7014.371369] [] handle_IPI+0x16c/0x328
[ 7014.371371] [] gic_handle_irq+0x94/0xbc
[ 7014.371374] [] el0_irq_naked+0x20/0x34
[ 7014.371376] CPU3: stopping
[ 7014.371378] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G D O 4.4.38 #51
[ 7014.371379] Hardware name: jetson_tx1 (DT)
[ 7014.371380] Call trace:
[ 7014.371384] [] dump_backtrace+0x0/0xf4
[ 7014.371386] [] show_stack+0x14/0x1c
[ 7014.371389] [] dump_stack+0xac/0xe4
[ 7014.371391] [] handle_IPI+0x16c/0x328
[ 7014.371393] [] gic_handle_irq+0x94/0xbc
[ 7014.371394] [] el1_irq+0x84/0x100
[ 7014.371398] [] cpuidle_enter+0x18/0x20
[ 7014.371401] [] call_cpuidle+0x48/0x54
[ 7014.371403] [] cpu_startup_entry+0x2c8/0x394
[ 7014.371405] [] secondary_start_kernel+0x15c/0x168
[ 7014.371406] [<000000008008126c>] 0x8008126c
[ 7014.871527] Rebooting in 5 seconds…[0000.204] [TegraBoot] (version 00.00.2014.50-mobile-6987b70e)

I mean, how do you run the stress test, and what does it do?

Our platform has an EEPROM. I write some data to it over I2C, using the i2c_master_send() API.

Have you tried accessing any other buses or devices?

I have seen this exact error on another occasion.

Unable to handle kernel paging request at virtual address

and

tegra_dma_tasklet+0x44/0xcc

I can reproduce this consistently when using a custom SPI driver I wrote, which streams the data received by SPI out.
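The faulting address itself is a useful hint. Here is a minimal sketch in Python, assuming the list-poison constants of a 4.4 arm64 kernel (LIST_POISON1 = 0xdead000000000100 and LIST_POISON2 = 0xdead000000000200, which match x6 and x5 in the register dump). It shows that dead000000000108 is the address a list_del() would write to if it ran on an entry that was already deleted, and it pulls the write bit out of the Oops code. The helper names explain_fault and decode_esr are made up for illustration.

```python
# Assumed values from include/linux/poison.h on a 4.4 arm64 kernel.
# They match x6 and x5 in the register dump of the oops.
POISON_POINTER_DELTA = 0xDEAD000000000000
LIST_POISON1 = POISON_POINTER_DELTA + 0x100  # list_del() stores this in entry->next
LIST_POISON2 = POISON_POINTER_DELTA + 0x200  # list_del() stores this in entry->prev

# struct list_head is { next, prev }; with 8-byte pointers, prev sits at offset 8.
OFFSET_OF_PREV = 8


def explain_fault(addr):
    """Return a hint if addr looks like a store through a poisoned list pointer."""
    if addr == LIST_POISON1 + OFFSET_OF_PREV:
        return "next->prev store with next == LIST_POISON1 (list_del on a deleted entry)"
    if addr == LIST_POISON2:
        return "prev->next store with prev == LIST_POISON2 (list_del on a deleted entry)"
    return "no list-poison match"


def decode_esr(esr):
    """Split an arm64 data-abort syndrome value into the fields of interest here."""
    return {
        "ec": (esr >> 26) & 0x3F,        # exception class
        "write": bool(esr & (1 << 6)),   # WnR bit: True means the access was a write
        "dfsc": esr & 0x3F,              # data fault status code
    }


print(explain_fault(0xDEAD000000000108))
print(decode_esr(0x96000044))
```

For this oops the decoded fields are exception class 0x25 (a data abort taken from kernel mode), a write access, and fault status 0x04 (a level 0 translation fault), which is consistent with a store through a poisoned pointer.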

After some digging, this looks like an issue in the DMA driver, related to the one discussed here: https://lore.kernel.org/patchwork/patch/675349/

@ShaneCCC Do you know if this patch was ever merged into L4T Kernels?

I just checked: this patch has not been merged yet. It looks like it targets the UART DMA issue, though.

Thanks ShaneCCC.

The patch seems to fix a case where a driver calls tegra_dma_terminate_all, which cleans up the DMA channel before the tasklet has finished running and leaves the tasklet holding references to the wrong callback node structure.

I’d think this would apply to all drivers using DMA, not only the UART driver.

But it might be harder to reproduce since it seems like an edge case timing bug.
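To make the suspected sequence concrete, here is a toy model in Python. It is not the driver code; it only imitates a kernel-style list with poisoning, and the interleaving (tasklet takes a node, terminate_all empties the list, tasklet then deletes its stale node) is my assumption about how the race plays out. The poison values are the assumed 4.4 arm64 ones, and Node, list_add_tail, list_del and run_race are names invented for this sketch.

```python
LIST_POISON1 = 0xDEAD000000000100  # assumed 4.4 arm64 value for a deleted entry's next
LIST_POISON2 = 0xDEAD000000000200  # assumed 4.4 arm64 value for a deleted entry's prev


class Node:
    """Stand-in for a struct list_head embedded in a DMA descriptor."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


def list_add_tail(new, head):
    """Insert new just before head, as the kernel helper of the same name does."""
    prev = head.prev
    new.prev, new.next = prev, head
    prev.next = new
    head.prev = new


def list_del(entry):
    """Unlink entry and poison its pointers; report the store a second delete would attempt."""
    nxt, prv = entry.next, entry.prev
    if nxt == LIST_POISON1:
        # The real kernel would execute next->prev = prev here, i.e. write to poison + 8.
        raise RuntimeError(f"write to {LIST_POISON1 + 8:#x}")
    nxt.prev = prv
    prv.next = nxt
    entry.next, entry.prev = LIST_POISON1, LIST_POISON2


def run_race():
    cb_desc = Node("cb_desc head")
    list_add_tail(Node("dma_desc"), cb_desc)

    stale = cb_desc.next  # tasklet: picks the first callback node

    while cb_desc.next is not cb_desc:  # terminate_all: tears the list down meanwhile
        list_del(cb_desc.next)

    try:
        list_del(stale)  # tasklet: deletes a node that is already gone
    except RuntimeError as err:
        return str(err)
    return "no fault"


print(run_race())
```

The address reported by the model is the same one the oops complains about, which fits the idea that the tasklet touched a callback node that had already been removed. Because the two paths have to interleave in exactly this order, it would also explain why the crash only shows up under stress.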

@notthetup
Thanks for your finding.