AGX Orin 64G reports "mmc0: Timeout waiting for hardware interrupt" during stress testing

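We are stress testing several AGX Orin 64G units running JetPack 6.2 (version 36.4.3). Some units hit the error after about 200 reboots, others only after about 1000. No SD card was connected during the test.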

What should we check? Thanks!

mmc0: running CQE recovery
[ 20.414863] mmc0: cqhci: Failed to halt
[ 20.415943] CPU:0, Error: cbb-fabric@0x13a00000, irq=192
[ 20.415956] **************************************
[ 20.415958] CPU:0, Error:cbb-fabric, Errmon:2
[ 20.415966] Error Code : SLAVE_ERR
[ 20.415969] Overflow : Multiple SLAVE_ERR
[ 20.415978]
[ 20.415980] Error Code : SLAVE_ERR
[ 20.415982] MASTER_ID : CCPLEX
[ 20.415984] Address : 0x3460008
[ 20.415987] Cache : 0x1 -- Bufferable
[ 20.415991] Protection : 0x2 -- Unprivileged, Non-Secure, Data Access
[ 20.415996] Access_Type : Write
[ 20.415998] Access_ID : 0x0
[ 20.416000] Fabric : cbb-fabric
[ 20.416002] Slave_Id : 0x3a
[ 20.416004] Burst_length : 0x0
[ 20.416006] Burst_type : 0x1
[ 20.416008] Beat_size : 0x2
[ 20.416010] VQC : 0x0
[ 20.416011] GRPSEC : 0x7e
[ 20.416013] FALCONSEC : 0x0
[ 20.416016] Slave : AXI2APB_8
[ 20.416018] **************************************
[ 20.416048] WARNING: CPU: 0 PID: 76 at drivers/soc/tegra/cbb/tegra234-cbb.c:608 tegra234_cbb_isr+0x144/0x190
[ 20.416567] ---[ end trace a1738f59a1542131 ]---
[ 30.430842] mmc0: Timeout waiting for hardware interrupt.
[ 30.430857] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 30.430863] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00000505
[ 30.430870] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x000000f8
[ 30.430876] mmc0: sdhci: Argument: 0x400e0008 | Trn mode: 0x00000033
[ 30.430881] mmc0: sdhci: Present: 0x11fb00f0 | Host ctl: 0x00000039
[ 30.430885] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000000
[ 30.430889] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000000f
[ 30.430894] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00000000
[ 30.430898] mmc0: sdhci: Int enab: 0x00ff0003 | Sig enab: 0x00fc0003
[ 30.430902] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[ 30.430906] mmc0: sdhci: Caps: 0x3f6cd08c | Caps_1: 0x18002f73
[ 30.430911] mmc0: sdhci: Cmd: 0x00002c1e | Max curr: 0x00000000
[ 30.430915] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0x40001fae
[ 30.430920] mmc0: sdhci: Resp[2]: 0x30468000 | Resp[3]: 0x00000240
[ 30.430923] mmc0: sdhci: Host ctl2: 0x0000300d
[ 30.430927] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000007ffffe7840
[ 30.430930] mmc0: sdhci: ============================================
[ 30.432738] CPU:0, Error: cbb-fabric@0x13a00000, irq=192

boot_failed.txt (476.7 KB)

*** Please note that this reply is generated by LLM automatically ***

The log comes from a boot failure on a Jetson AGX Orin 64G running JetPack 6.2 (version 36.4.3). It shows a chain of errors on the mmc0 controller and on the cbb-fabric bus that ends in the failed boot.

Here’s a breakdown of the key errors and warnings:

  1. mmc0: running CQE recovery: The mmc0 host controller is trying to recover its Command Queue Engine (CQE) after an earlier error. The original post says no SD card was connected, so this is unlikely to be an SD card problem.
  2. mmc0: cqhci: Failed to halt: The command queue host controller interface (cqhci) did not halt when the driver asked it to during recovery, so the recovery could not complete cleanly.
  3. CPU:0, Error: cbb-fabric@0x13a00000, irq=192: CPU 0 handled interrupt 192, raised by the error monitor of the cbb-fabric (control backbone) bus.
  4. Error Code: SLAVE_ERR: The target of a bus transaction returned an error response. The "Overflow: Multiple SLAVE_ERR" line shows that more than one such error was recorded.
  5. MASTER_ID: CCPLEX: The failing transaction was issued by the CPU complex (CCPLEX).
  6. Address: 0x3460008: The address the failing transaction targeted.
  7. Cache: 0x1 (Bufferable): The transaction was marked bufferable. This is an attribute of the access, not a cache fault.
  8. Protection: 0x2 (Unprivileged, Non-Secure, Data Access): The protection attributes of the access. The log does not say that these attributes caused the error.
  9. Access_Type: Write: The failing transaction was a write.
  10. Slave_Id: 0x3a: The ID of the target that returned the error.
  11. Burst_length: 0x0: The burst length field of the transaction. If it follows the AXI encoding, 0x0 means a single-beat transfer.
  12. Burst_type: 0x1: The burst type field. If it follows the AXI encoding, 0x1 is an incrementing burst.
  13. Beat_size: 0x2: The beat size field. If it follows the AXI encoding, 0x2 means 4 bytes per beat.
  14. VQC: 0x0: The VQC field of the error record. The log does not explain its meaning.
  15. GRPSEC: 0x7e: The GRPSEC field of the error record. The log does not explain its meaning.
  16. FALCONSEC: 0x0: The FALCONSEC field of the error record. The log does not explain its meaning.
  17. Slave: AXI2APB_8: The error response came through the AXI2APB_8 bridge.
  18. WARNING: CPU: 0 PID: 76 at drivers/...: A kernel warning emitted by the interrupt handler tegra234_cbb_isr while it reports the bus error. It is part of the same event, not a separate fault.
  19. mmc0: Timeout waiting for hardware interrupt: About 10 seconds later (timestamp 30.43 versus 20.41), the mmc0 host controller gave up waiting for an interrupt that never arrived.
  20. mmc0: sdhci: SDHCI REGISTER DUMP: The driver then dumped the SDHCI (Secure Digital Host Controller Interface) registers for debugging.
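
The breakdown above can also be reproduced mechanically. The following is a minimal Python sketch, not an NVIDIA tool: it pulls the bus-error fields out of a log excerpt and decodes two values. It assumes the burst fields use the standard AXI encoding and that the `Cmd` register follows the standard SDHCI layout (command index in bits 13:8). The function names are made up for illustration.

```python
import re

# Excerpt copied from the log in the first post.
LOG = """\
[ 20.415980] Error Code : SLAVE_ERR
[ 20.415982] MASTER_ID : CCPLEX
[ 20.415984] Address : 0x3460008
[ 20.415996] Access_Type : Write
[ 20.416004] Burst_length : 0x0
[ 20.416006] Burst_type : 0x1
[ 20.416008] Beat_size : 0x2
[ 20.416016] Slave : AXI2APB_8
[ 30.430911] mmc0: sdhci: Cmd: 0x00002c1e | Max curr: 0x00000000
"""

# "[ timestamp] Name : value" lines; names start with a capital letter,
# which skips the "mmc0: ..." driver lines.
FIELD_RE = re.compile(r"^\[\s*[\d.]+\]\s+([A-Z][A-Za-z_ ]*?)\s*:\s*(\S+)", re.M)
SDHCI_CMD_RE = re.compile(r"sdhci: Cmd:\s+(0x[0-9a-fA-F]+)")


def parse_cbb_fields(log):
    """Return the bus-error fields as a name -> value dict (last value wins)."""
    return dict(FIELD_RE.findall(log))


def describe_axi_burst(fields):
    """Decode burst fields, ASSUMING the AXI AxLEN/AxSIZE/AxBURST encoding."""
    beats = int(fields["Burst_length"], 16) + 1          # AxLEN is beats - 1
    bytes_per_beat = 1 << int(fields["Beat_size"], 16)   # AxSIZE is log2(bytes)
    kinds = {0: "FIXED", 1: "INCR", 2: "WRAP"}
    kind = kinds.get(int(fields["Burst_type"], 16), "reserved")
    return beats, bytes_per_beat, kind


def decode_sdhci_cmd(log):
    """Split the SDHCI Command register into index and flag bits."""
    raw = int(SDHCI_CMD_RE.search(log).group(1), 16)
    return {
        "index": (raw >> 8) & 0x3F,      # command index, bits 13:8
        "data_present": bool(raw & 0x20),
        "index_check": bool(raw & 0x10),
        "crc_check": bool(raw & 0x08),
        "resp_type": raw & 0x03,
    }


if __name__ == "__main__":
    fields = parse_cbb_fields(LOG)
    print(fields["MASTER_ID"], fields["Access_Type"], fields["Address"])
    print(describe_axi_burst(fields))
    print(decode_sdhci_cmd(LOG))
```

Under these assumptions the failing access is a single 4-byte write, and the last command in the register dump has index 44. In the eMMC command set that index is a command-queue command, which would be consistent with the CQE recovery messages, but this is an inference from the register value rather than something the log states.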

Taken together, the log points to a problem on the mmc0 storage path or on the bus access to it: CQE recovery fails to halt the controller, a write from the CPU complex gets a SLAVE_ERR through AXI2APB_8, and the controller then times out waiting for an interrupt. Because the failure appears only after hundreds of reboots, it looks intermittent rather than a fixed misconfiguration.

To troubleshoot this issue, it may be helpful to:

  • Check the storage device behind mmc0; the test ran with no SD card connected, so the SD card itself can be ruled out
  • Note whether every failure shows the same cbb-fabric address (0x3460008) and slave (AXI2APB_8); a consistent pair points at one peripheral rather than at the fabric as a whole
  • Check that the device's hardware configuration matches the hardware actually in use
  • Repeat the reboot test on a different unit or configuration to see whether the failure rate changes
  • Consult the device's documentation and support resources for further guidance on troubleshooting and resolving the issue.
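
To compare failures across units and reboot counts, the boot logs can be scanned for the messages seen in this thread. The following is a rough Python sketch. It assumes one captured log per reboot cycle, already loaded into a dict keyed by cycle number. The capture mechanism, the function names, and the sample data are hypothetical; only the message strings come from the posted log.

```python
import re

# Message signatures taken from the log posted in this thread.
SIGNATURES = {
    "cqe_recovery": re.compile(r"mmc0: running CQE recovery"),
    "cqhci_halt_failed": re.compile(r"mmc0: cqhci: Failed to halt"),
    "cbb_slave_err": re.compile(r"Error Code\s*:\s*SLAVE_ERR"),
    "sdhci_timeout": re.compile(r"mmc0: Timeout waiting for hardware interrupt"),
}


def classify_boot(log_text):
    """Return the set of signature names found in one boot log."""
    return {name for name, pattern in SIGNATURES.items() if pattern.search(log_text)}


def failing_cycles(logs_by_cycle):
    """Map each reboot cycle that shows any signature to its sorted hits."""
    result = {}
    for cycle in sorted(logs_by_cycle):
        hits = classify_boot(logs_by_cycle[cycle])
        if hits:
            result[cycle] = sorted(hits)
    return result


if __name__ == "__main__":
    # Hypothetical sample data: cycle numbers and the clean log are made up.
    sample = {
        1: "[ 5.000000] boot completed\n",
        2: "mmc0: running CQE recovery\n"
           "[ 20.414863] mmc0: cqhci: Failed to halt\n"
           "[ 20.415966] Error Code : SLAVE_ERR\n"
           "[ 30.430842] mmc0: Timeout waiting for hardware interrupt.\n",
    }
    for cycle, hits in failing_cycles(sample).items():
        print(cycle, hits)
```

Running this over the logs from each unit would show whether all four messages always appear together and at which cycle they first occur, which is useful information to attach when reporting the issue.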

Could you move the test to an NVIDIA devkit and run it again?