mmc0 error causes system to freeze

Module: Jetson AGX Orin 32G, with eMMC DG4064

Software: JetPack 6.0

We occasionally hit an mmc0 error that causes the system to freeze; the system then reboots automatically some time later.

Error log:

[59749.465257] mmc0: Timeout waiting for hardware interrupt.
[59749.465265] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[59749.465268] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00000505
[59749.465272] mmc0: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
[59749.465275] mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x00000000
[59749.465278] mmc0: sdhci: Present:   0x01fb00f0 | Host ctl: 0x00000001
[59749.465280] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[59749.465283] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000190f
[59749.465286] mmc0: sdhci: Timeout:   0x0000000e | Int stat: 0x00000000
[59749.465289] mmc0: sdhci: Int enab:  0x00ff0003 | Sig enab: 0x00fc0003
[59749.465292] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[59749.465295] mmc0: sdhci: Caps:      0x3f6cd08c | Caps_1:   0x18002f73
[59749.465298] mmc0: sdhci: Cmd:       0x00000000 | Max curr: 0x00000000
[59749.465301] mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
[59749.465304] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[59749.465305] mmc0: sdhci: Host ctl2: 0x00001000
[59749.465309] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x0000007ffffe8890
[59749.465310] mmc0: sdhci: ============================================
[59749.465516] blk_update_request: I/O error, dev mmcblk0, sector 32510056 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
[59749.465532] EXT4-fs error (device mmcblk0p1): __ext4_find_entry:1682: inode #961688: comm kworker/u16:12: reading directory lblock 0
[59749.466592] CPU:0, Error: cbb-fabric@0x13a00000, irq=191
[59749.466596] **************************************
[59749.466597] CPU:0, Error:cbb-fabric, Errmon:2
[59749.466602] 	  Error Code		: SLAVE_ERR
[59749.466603] 	  Overflow		: Multiple SLAVE_ERR
[59749.466610] 
[59749.466611] 	  Error Code		: SLAVE_ERR
[59749.466612] 	  MASTER_ID		: CCPLEX
[59749.466612] 	  Address		: 0x3460008
[59749.466613] 	  Cache			: 0x1 -- Bufferable 
[59749.466614] 	  Protection		: 0x2 -- Unprivileged, Non-Secure, Data Access
[59749.466615] 	  Access_Type		: Write
[59749.466616] 	  Access_ID		: 0x0
[59749.466617] 	  Fabric		: cbb-fabric
[59749.466617] 	  Slave_Id		: 0x3a
[59749.466618] 	  Burst_length		: 0x0
[59749.466619] 	  Burst_type		: 0x1
[59749.466619] 	  Beat_size		: 0x2
[59749.466620] 	  VQC			: 0x0
[59749.466620] 	  GRPSEC		: 0x7e
[59749.466621] 	  FALCONSEC		: 0x0
[59749.466622] 	  Slave			: AXI2APB_8
[59749.466623] 	**************************************
[59749.466644] WARNING: CPU: 0 PID: 21042 at drivers/soc/tegra/cbb/tegra234-cbb.c:608 tegra234_cbb_isr+0x144/0x190
[59749.466867] ---[ end trace fc92b6e3dad6471c ]---

console-ramoops-0.txt (512.0 KB)
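For anyone triaging a similar capture, here is a minimal sketch of how the mmc timeout and the block I/O error that follows it could be pulled out of a saved kernel log. The two regular expressions are written against the exact messages quoted above; the function name `scan_kernel_log` and the `SAMPLE` text are illustrative assumptions, not part of any NVIDIA tool, and other kernel versions may word these messages differently.

```python
import re

# Patterns modelled on the two log lines quoted in this post (an assumption:
# other kernels may format these messages differently).
TIMEOUT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+(mmc\d+): Timeout waiting for hardware interrupt"
)
IO_ERROR_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+blk_update_request: I/O error, dev (\w+), sector (\d+)"
)


def scan_kernel_log(text):
    """Return (timeouts, io_errors) found in kernel log text.

    timeouts:  list of (timestamp_seconds, host) tuples
    io_errors: list of (timestamp_seconds, device, sector) tuples
    """
    timeouts = []
    io_errors = []
    for line in text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts.append((float(m.group(1)), m.group(2)))
            continue
        m = IO_ERROR_RE.search(line)
        if m:
            io_errors.append((float(m.group(1)), m.group(2), int(m.group(3))))
    return timeouts, io_errors


# Two lines copied from the log above, used only to exercise the function.
SAMPLE = """\
[59749.465257] mmc0: Timeout waiting for hardware interrupt.
[59749.465516] blk_update_request: I/O error, dev mmcblk0, sector 32510056 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
"""

timeouts, io_errors = scan_kernel_log(SAMPLE)
print("timeouts:", timeouts)
print("io_errors:", io_errors)
```

To run it on a real capture, read the file into a string first, for example `scan_kernel_log(open("console-ramoops-0.txt", errors="replace").read())`. Counting how often the timeout appears, and at what uptime, may help show whether the failure correlates with a particular workload.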

This seems to be the same problem as in the topic "AGX Orin 64G (DG4064 eMMC model): mmc0 fault alarm during use causes the system to hang", which mentioned that updating to JetPack 6.1 resolves it. I would like to know which patch between JetPack 6.0 and JetPack 6.1 contains the fix.

Is this on a custom board or the NV devkit?

Hi Wayne,

It is a custom board.

Is it possible to reproduce this issue on the NV devkit? Actually, I don't see anything that was fixed on our side regarding this issue.

Hi Wayne,

This issue only occurs occasionally. We have only one devkit and have not seen the issue on it.

I tried rebooting the devkit repeatedly to reproduce the issue, but the system became corrupted after 3 reboots; see "Repeated reboot at slot A in jetson orin with rootfs A/B=1, but boot to slot B unexpectedly, so sad :(".

Dear Wayne:

https://forums.developer.nvidia.com/t/mmc0-error-cause-system-to-freeze/349780

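This customer ran into the issue on their own board. Could you give some suggestions, for example what additional information they should provide or which tests they could run?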

Looking forward to your reply, thank you!

Alice

Hi,
We would suggest upgrading to the latest JetPack 6.2.1 (r36.4.4) to get the latest software for the Orin series.