Kernel panic on boot - mmc0: RED error 4!

When booting my TX2i, I get the error in the title.
The dump says it is in:
drivers/mmc/host/cmdq_hci.c, line 681
It seems the kernel failed to load the rootfs.
Any help?
A hard reset fixes this.

You’d need to post a serial console boot log. That actual error message is far too little to debug from. See:
http://www.jetsonhacks.com/2017/03/24/serial-console-nvidia-jetson-tx2/

You’d also want to post which L4T release is being used ("head -n 1 /etc/nv_tegra_release").
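
If it helps to report the release in the short "R32.4" form, here is a minimal Python sketch that turns the first line of /etc/nv_tegra_release into that form. It assumes the line has the usual "# R32 (release), REVISION: 4.x, ..." layout; the function name parse_l4t_release and the sample line are only illustrative and not taken from this thread.

```python
import re


def parse_l4t_release(first_line):
    """Return e.g. 'R32.4.3' from the first line of /etc/nv_tegra_release.

    Assumes the usual '# R<major> (release), REVISION: <minor>, ...' layout;
    returns None if the line does not match.
    """
    m = re.search(r"R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)", first_line)
    if m is None:
        return None
    return f"R{m.group(1)}.{m.group(2)}"


# Illustrative sample line (not from this thread).
sample = "# R32 (release), REVISION: 4.3, BOARD: t186ref, EABI: aarch64"
print(parse_l4t_release(sample))
```

On the Jetson itself you would pass in the real first line, for example open("/etc/nv_tegra_release").readline().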

I don’t have the entire log; however, I do have the kernel panic itself.
I’m using L4T R32.4.

Switching to rootfs on /dev/mmcblk0p3
mmc0: RED error 4 !!!
sdhci: ========== REGISTER DUMP (mmc0)==========
sdhci: Sys addr:    0x00000000 | Version:  0x00000404
sdhci: Blk size:    0x00007200 | Blk cnt:  0x00000000
sdhci: Argument:    0x00001800 | Trn mode: 0x00000033
sdhci: Present:     0x11fb00f0 | Host ctl: 0x0000003d
sdhci: Power:       0x00000001 | Blk gap:  0x00000000
sdhci: Wake-up:     0x00000000 | Clock:    0x00000007
sdhci: Int enab:    0xffff4000 | Sig enab: 0xfffc4000
sdhci: AC12 err:    0x00000000 | Slot int: 0x00000000
sdhci: Caps:        0x3f6cd08c | Caps_1:   0x18006f77
sdhci: Host ctl2:    0x0000300d
sdhci: ADMA Err:    0x00000000 | ADMA Ptr: 0x00000000ffee0010
sdhci: ==============================================
------------[ cut here ]----------------
kernel BUG at kernel-source/drivers/mmc/host/cmdq_hci.c:681!
Internal error: Oops - BUG: 9 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 not tainted 4.9.140-l4t-r32.4 #1
Hardware name: storm (DT)
task: ffffff80a151700 task.stack: ffffff90a140000
PC is at cmdq_irq+0x22c/0x238
LR is at cmdq_irq+0x22c/0x238
pc : [<ffffff8008b9db64>] lr : [<ffffff8008b9db64>] pstate: 604001c5
sp: ffffffc1b676ddb0
x29: ffffffc1b676ddb0 x28: 0000000000004000
x27: ffffff800a377000 x26: 0000000000000000
x25: 0000000000040000 x24: 00000000000001c0
x23: ffffffc1aa855140 x22: 0000000000000000
x21: 0000000000000001 x20: 0000000000000004
x19: ffffffc1aa8550c0 x18: 0000000000000000
x17: 0000000000002697 x16: 0000000000000000
x15: 0000000000000010 x14: 3d3d3d3d3d3d3d3d
x13: 3d3d3d3d3d3d3d3d x12: 3d3d3d3d3d3d3d3d
x11: 00000000000002a9 x10: 3d3d3d3d3d3d3d3d
x9: 3d3d203a69636864 x8: ffffff80083c8258
x7: ffffff8008a1944d8 x6: ffffff80083c76b8
x5: 0000000000000000 x4: 0000000000000000
x3: ffffffffffffffff x2: ffffff800a164450
x1: ffffff800a151700 x0: 0000000000000032
Process swapper/0 (pid: 0, stack limit = 0xffffff800a140000)
Call trace:
[ffffff8008b9db64] cmdq_irq+0x22c/0x238
[ffffff8008b964c8] sdhci_irq+0x388/0xd80
[ffffff800811fd18] __handle_irq_event_percpu+0x60/0x280
[ffffff800811ff60] handle_irq_event_percpu+0x28/0x70
[ffffff800811fff8] handle_irq_event+0x50/0x80
[ffffff8008123bcc] handle_fasteoi_irq+0xc4/0x1a0
[ffffff800811ed4c] generic_handle_irq+0x34/0x50
[ffffff800811f41c] __handle_domain_irq+0x6c/0xc0
[ffffff8008080d2c] gic_handle_irq+0x54/0xa8
[ffffff8008082c28] el1_irq+0xe8/0x194
[ffffff8008b740c8] cpuidle_enter_state+0xb8/0x380
[ffffff8008b74404] cpuidle_enter+0x34/0x48
[ffffff800810febc] call_cpuidle+0x44/0x68
[ffffff80081101fc] cpu_startup_entry+0x18c/0x210
[ffffff8008f1e114] rest_init+0x84/0x90
[ffffff80098e0b38] start_kernel+0x370/0x388
[ffffff80098e0204] __primary_switched+0x80/0x94
---[ end trace 4c5ca253279e0ecb ]---
Kernel panic - not syncing: Fatal exception in interrupt
SMP: stopping secondary CPUs
Kernel Offset: disabled
Memory Limit: none

It looks like the microprocessor managing the eMMC experienced some sort of error, which in turn prevented the kernel from loading the rootfs properly and resulted in a kernel panic.
The Jetson went into a boot loop with the same error each time until a power-on reset, which probably brought the eMMC microprocessor out of its error state.
I would also like to understand why the eMMC got into this faulty state.
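
For what it is worth, the "4" in "RED error 4" can be read as a bitmask. The Python sketch below assumes the driver prints the raw command queue interrupt status (CQIS) and that the bits follow the eMMC 5.1 command queue host interface layout (HAC, TCC, RED, TCL in bits 0 to 3). Neither assumption has been checked against the R32.4 cmdq_hci.c source, and decode_cqis is just an illustrative name.

```python
# Assumed CQIS bit layout (eMMC 5.1 command queue host interface).
CQIS_BITS = {
    0: "HAC (halt complete)",
    1: "TCC (task complete)",
    2: "RED (response error detected)",
    3: "TCL (task cleared)",
}


def decode_cqis(status):
    """Return the names of the CQIS bits that are set in status."""
    return [name for bit, name in CQIS_BITS.items() if status & (1 << bit)]


# The value printed in the panic above, assuming it is the raw status.
print(decode_cqis(4))
```

Under those assumptions only the RED bit is set, meaning the controller flagged an error in the card's response to a queued task. That says what was detected, not why the card responded that way.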

The above basically says what you said originally, but it also names the driver servicing the eMMC. You do need the full boot log. The problem isn’t that we don’t know what the error is…the problem is that we don’t know what leads up to the error. Content prior to the failure is important.
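
One way to avoid losing the lead-up next time is to leave a logger attached to the serial console. The Python sketch below is only a sketch: it assumes the third-party pyserial package, a USB serial adapter that shows up as /dev/ttyUSB0, and a 115200 baud console. The names capture and capture_serial are hypothetical.

```python
import time


def capture(stream, out, max_idle_reads=60):
    """Copy lines from a binary stream to a text file with host timestamps.

    Stops after max_idle_reads consecutive empty reads. With a 1 s read
    timeout on the serial port, that is roughly that many seconds of silence.
    Returns the number of lines written.
    """
    idle = 0
    count = 0
    while idle < max_idle_reads:
        raw = stream.readline()
        if not raw:
            idle += 1
            continue
        idle = 0
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        out.write(f"[{time.strftime('%H:%M:%S')}] {text}\n")
        out.flush()  # keep the file current even if the host is interrupted
        count += 1
    return count


def capture_serial(port="/dev/ttyUSB0", logfile="boot.log"):
    """Log the serial console to logfile (port name and baud rate are assumptions)."""
    import serial  # pyserial, third-party

    with serial.Serial(port, baudrate=115200, timeout=1) as ser, \
            open(logfile, "a", encoding="utf-8") as log:
        return capture(ser, log)
```

Calling capture_serial() before powering on the board appends everything the console prints to boot.log, so the messages leading up to a panic survive a reset.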

I’ve lost the logs prior to the error… & after the power-on reset I haven’t encountered the error again.

Without knowing more about the error, I read up on eMMC and found that cmdq (command queue) is an optimization feature introduced in version 5.1 of the eMMC spec.
If I had the Technical Reference Manual for the Jetson’s internal eMMC device I would have looked into the register values, but I don’t have such a datasheet.
Since the error occurs in cmdq_irq, I’ve decided to disable that feature completely (CONFIG_MMC_CQ_HCI=n in the kernel menuconfig).
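
To confirm that the rebuilt kernel really has the option off, something like the following Python sketch could be used. It assumes the running kernel exposes its configuration at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC); the helper names option_state and running_kernel_option are hypothetical.

```python
import gzip


def option_state(config_text, option):
    """Return the option's value ('y', 'm', ...) or 'n' if unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return "n"


def running_kernel_option(option, path="/proc/config.gz"):
    """Look the option up in the running kernel's config (path is an assumption)."""
    with gzip.open(path, "rt") as f:
        return option_state(f.read(), option)
```

After flashing the new kernel, running_kernel_option("CONFIG_MMC_CQ_HCI") should return "n" if the change took effect.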

The TRM for the chip itself (the SoC) does not cover the external memory, so if it is the eMMC device you are interested in, you might be out of luck unless you can get the data sheet from the memory vendor. However, you can get the TX2 SoC TRM here:
https://developer.nvidia.com/embedded/downloads#?search=trm&tx=$product,jetson_tx2

Someone else would have to tell you where to find the eMMC information; I have no knowledge of this.