On boot of my tx2i, I get the error in the topic.
The dump says this is in:
drivers/mmc/host/cmdq_hci.c, line 681
It seems like the kernel failed to load the rootfs.
A hard reset fixes this.
You’d need to post a serial console boot log. That actual error message is far too little to debug from. See:
You’d also want to post which L4T release is being used ("head -n 1 /etc/nv_tegra_release").
I don’t have the entire log; however, I do have the kernel panic itself.
I’m using L4T R32.4.
Switching to rootfs on /dev/mmcblk0p3
mmc0: RED error 4 !!!
sdhci: ========== REGISTER DUMP (mmc0)==========
sdhci: Sys addr: 0x00000000 | Version: 0x00000404
sdhci: Blk side: 0x00007200 | Blk cnt: 0x00000000
sdhci: Argument: 0x00001800 | Trn mode: 0x00000033
sdhci: Present: 0x11fb00f0 | Host ctl: 0x0000003d
sdhci: Power: 0x00000001 | Blk gap: 0x00000000
sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
sdhci: Int enab: 0xffff4000 | Sig enab: 0xfffc4000
sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000
sdhci: Caps: 0x3f6cd08c | Caps_1: 0x18006f77
sdhci: Host ctl2: 0x0000300d
sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000ffee0010
sdhci: ==============================================
------------[ cut here ]----------------
kernel BUG at kernel-source/drivers/mmc/host/cmdq_hci.c:681!
Internal error: Oops - BUG: 9 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 not tainted 4.9.140-l4t-r32.4 #1
Hardware name: storm (DT)
task: ffffff80a151700 task.stack: ffffff90a140000
PC is at cmdq_irq+0x22c/0x238
LR is at cmdq_irq+0x22c/0x238
pc : [<ffffff8008b9db64>] lr : [<ffffff8008b9db64>] pstate: 604001c5
sp: ffffffc1b676ddb0
x29: ffffffc1b676ddb0 x28: 0000000000004000
x27: ffffff800a377000 x26: 0000000000000000
x25: 0000000000040000 x24: 00000000000001c0
x23: ffffffc1aa855140 x22: 0000000000000000
x21: 0000000000000001 x20: 0000000000000004
x19: ffffffc1aa8550c0 x18: 0000000000000000
x17: 0000000000002697 x16: 0000000000000000
x15: 0000000000000010 x14: 3d3d3d3d3d3d3d3d
x13: 3d3d3d3d3d3d3d3d x14: 3d3d3d3d3d3d3d3d
x11: 00000000000002a9 x14: 3d3d3d3d3d3d3d3d
x9: 3d3d203a69636864 x8: ffffff80083c8258
x7: ffffff8008a1944d8 x6: ffffff80083c76b8
x5: 0000000000000000 x4: 0000000000000000
x3: ffffffffffffffff x2: ffffff800a164450
x1: ffffff800a151700 x0: 0000000000000032
Process swapper/0 (pid: 0, stack limit = 0xffffff800a140000)
Call trace:
[ffffff8008b9db64] cmdq_irq+0x22c/0x238
[ffffff8008b964c8] sdhci_irq+0x388/0xd80
[ffffff800811fd18] __handle_irq_event_percpu+0x60/0x280
[ffffff800811ff60] handle_irq_event_percpi+0x28/0x70
[ffffff800811fff8] handle_irq_event+0x50/0x80
[ffffff8008123bcc] habdle_fasteoi_irq+0xc4/0x1a0
[ffffff800811ed4c] generic_handle_irq+0x34/0x50
[ffffff800811f41c] __handle_domain_irq+0x6c/0xc0
[ffffff8008080d2c] gic_handle_irq+0x54/0xa8
[ffffff8008082c28] el1_irq+0xe8/0x194
[ffffff8008b740c8] cpuidle_enter_state+0xb8/0x380
[ffffff8008b74404] cpuidle_enter+0x34/0x48
[ffffff800810febc] call_cpuidle+0x44/0x68
[ffffff80081101fc] cpu_startup_entry+0x18c/0x210
[ffffff8008f1e114] rest_init+0x84/0x90
[ffffff80098e0b38] start_kernel+0x370/0x388
[ffffff80098e0204] __primary_switched+0x80/0x94
---[ end trace 4c5ca253279e0ecb ]---
Kernel panic - not syncing: Fatal exception in interrupt
SMP: stopping secondary CPUs
Kernel Offset: disabled
Memory Limit: none
It looks like the microprocessor managing the eMMC experienced some sort of error, which in turn caused the kernel to fail to load the rootfs and resulted in a kernel panic.
The Jetson then went into a boot loop with the same error each time until a power-on reset, which probably brought the eMMC microprocessor out of its error state.
I would also like to understand why the eMMC got into this faulty state.
The above basically confirms what you said originally, though it shows the failure happening while the eMMC driver was servicing an interrupt. You do need the full boot log. The problem isn’t that we don’t know what the error is; the problem is that we don’t know what leads up to the error. Content prior to the failure is important.
I’ve lost the logs prior to the error, and after the power-on reset I haven’t encountered the error again.
Without knowing more about the error, I read up on eMMC and found that cmdq (command queue) is an optimization feature introduced in version 5.1 of the eMMC spec.
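As a rough aid for reading the "RED error 4" line, here is a minimal sketch that decodes a command-queue interrupt status value. It assumes the logged "4" is the CQIS (command queue interrupt status) value and that the bit layout matches the one used by Linux command-queue host drivers (HAC, TCC, RED, TCL in bits 0 to 3); neither assumption is confirmed by the log itself, and `decode_cqis` is just an illustrative helper name.

```python
# Assumed CQIS bit layout (bits 0..3), as used by Linux command-queue
# host drivers. Treating the logged "4" as a CQIS value is an assumption.
CQIS_BITS = {
    0: "HAC (halt complete)",
    1: "TCC (task complete)",
    2: "RED (response error detected)",
    3: "TCL (task cleared)",
}


def decode_cqis(status):
    """Return the names of the CQIS bits that are set in `status`."""
    return [name for bit, name in CQIS_BITS.items() if status & (1 << bit)]


print(decode_cqis(4))
```

Under those assumptions, a value of 4 has only bit 2 set, which would correspond to the device returning a response with an error flag while a queued task was in flight.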
If I had the Technical Reference Manual for the Jetson’s internal eMMC device I would have looked into the register values, but I don’t have that datasheet.
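Some of the dumped values are host-controller registers rather than eMMC-internal ones, so they can be read without the eMMC datasheet. The sketch below decodes the "Trn mode" and "Blk side" (block size) values from the dump, assuming the standard SDHCI bit definitions as found in the Linux `sdhci.h` header; the Tegra controller may add vendor-specific behavior that this does not cover, and the helper names are only illustrative.

```python
# Transfer Mode register flags, assuming the standard SDHCI layout.
TRNS_FLAGS = {
    0x01: "DMA enable",
    0x02: "block count enable",
    0x04: "auto CMD12",
    0x08: "auto CMD23",
    0x10: "read (card to host)",
    0x20: "multi-block",
}


def decode_trn_mode(value):
    """List the Transfer Mode flags that are set in `value`."""
    return [name for mask, name in TRNS_FLAGS.items() if value & mask]


def decode_blk_size(value):
    """Split the Block Size register into block length and SDMA boundary code."""
    return {
        "block_bytes": value & 0xFFF,             # bits 0..11
        "sdma_boundary_code": (value >> 12) & 0x7,  # bits 12..14
    }


print(decode_trn_mode(0x33))    # "Trn mode" value from the dump
print(decode_blk_size(0x7200))  # "Blk side" value from the dump
```

Read this way, the controller was set up for a multi-block DMA read with 512-byte blocks when the error fired, which fits a read from the rootfs partition but does not by itself explain the cause.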
Since the error occurs during cmdq_irq, I’ve decided to completely disable that feature (CONFIG_MMC_CQ_HCI=n in the kernel menuconfig).
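To confirm that the rebuilt kernel really has the option off, one approach is to inspect the kernel configuration text. The following is a minimal sketch, assuming the usual kconfig output format where an enabled option appears as `NAME=value` and a disabled one as `# NAME is not set`; `kconfig_state` and the sample text are hypothetical, not taken from the actual L4T config.

```python
import re


def kconfig_state(config_text, option):
    """Return the option's value ('y', 'm', ...) or 'n' if it is unset or absent."""
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else "n"


# Hypothetical config excerpt used only to exercise the function.
sample = (
    "CONFIG_MMC=y\n"
    "# CONFIG_MMC_CQ_HCI is not set\n"
    "CONFIG_MMC_SDHCI=y\n"
)
print(kconfig_state(sample, "CONFIG_MMC_CQ_HCI"))
```

On a running system the text could come from `gzip.open("/proc/config.gz", "rt").read()`, but that file only exists if the kernel was built with CONFIG_IKCONFIG_PROC; otherwise the `.config` file from the build tree can be read instead.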
The TRM for the chip itself (the SoC) does not cover external memory, so if it is the memory device itself you are interested in, you might be out of luck unless you can get its data sheet from the memory vendor. However, you can get the TX2 SoC TRM here:
Someone else would have to tell you where to find the eMMC information; I have no knowledge of this.