System not fully booting and becoming unresponsive after boot — mmc0 CQE recovery errors

The system intermittently fails to boot or becomes unresponsive after boot. When it fails, the serial console still outputs logs but no command-line access is possible. When it does boot, the system eventually hangs, and the console shows repeated mmc0 CQE recovery and CBB-fabric SLAVE_ERR traces.

Serial log excerpt (failure starts around line 2461, timestamp 32.242269):

[   32.242269] mmc0: running CQE recovery
[   32.247088] mmc0: cache flush error -110
[   32.328244] arm-smmu 8000000.iommu: Unhandled context fault: fsr=0x402, iova=0x00000000, ...
...
[  462.529645] mmc0: cqhci: CQE failed to exit halt state
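
For a log this size, it may help to count each error type and note when it first appears, so that a failing boot can be compared against a good one. The following is a minimal sketch, not an NVIDIA tool: the message substrings are taken from the excerpt above, and the names `PATTERNS`, `summarize`, and `summarize_file` are hypothetical.

```python
import re

# Matches a kernel log line such as "[   32.242269] mmc0: running CQE recovery".
# search() is used so that any serial-console prefix before "[" is tolerated.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")

# Substrings copied from the excerpt above; the labels are illustrative only.
PATTERNS = {
    "cqe_recovery": "running CQE recovery",
    "cache_flush_error": "cache flush error",
    "cqe_halt_failure": "CQE failed to exit halt state",
    "smmu_context_fault": "Unhandled context fault",
}


def summarize(lines):
    """Return {label: (count, first_timestamp)} for each pattern that occurs."""
    summary = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a timestamped kernel message
        ts, msg = float(m.group(1)), m.group(2)
        for label, needle in PATTERNS.items():
            if needle in msg:
                count, first = summary.get(label, (0, ts))
                summary[label] = (count + 1, first)
    return summary


def summarize_file(path):
    """Convenience wrapper: summarize a saved serial log file."""
    with open(path, errors="replace") as f:
        return summarize(f)
```

Calling `summarize_file("error_log.txt")` on the attached log would return one entry per error type that occurs, which makes it easier to see whether the SMMU fault precedes or follows the first CQE recovery.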

Hardware: Jetson AGX Orin

Software: JetPack 5.1.2 (L4T 35.4.1)

Issue frequency: Frequent

Attachments: error_log.txt (186.1 KB)

Any guidance would be greatly appreciated.

Thank you.

*** Please note that this reply is generated by LLM automatically ***

Based on the provided context, it appears that the issue you’re experiencing with the Jetson AGX Orin may be related to a combination of hardware and software problems. Here is an analysis of the possible causes and potential solutions:

Hardware-related issues:

  1. Thermal issues: The repeated CQE recovery and CBB-fabric SLAVE_ERR traces in the console logs suggest that the system might be experiencing thermal-related problems. Check the thermtrip of ORIN NX (mentioned in the context) to see if it’s related to your issue.
  2. Power and reset sequence: Ensure that the power and reset sequence is correct, as mentioned in the Bring-up Hardware Validation section. Verify that the power rails are stable and within the recommended voltage range.
  3. JTAG connection: Check the JTAG connection, as mentioned in the Bring-up Hardware Validation section, to ensure that it’s properly connected and configured.

Software-related issues:

  1. Boot validation: Verify that the boot validation process is successful, as mentioned in the Boot Validation section. Check the kernel device tree, boot scripts (bootcmd), and FS support to ensure that they’re correctly configured.
  2. Secure Boot: As you’re using JetPack 5.1.2 and L4T R35.4.1, ensure that Secure Boot is properly configured. Check the Xavier doesn’t boot after Secure Boot flash (JetPack 5.1.2 / L4T R35.4.1) issue in the context to see if it’s related to your problem.
  3. OTA updates: If you’ve recently performed an OTA update, check the Can’t Image-Based OTA update after I patch my tos-img in bootloader issue in the context to see if it’s related to your problem.

Comparison with Jetson AGX Orin:

While the context provides information on both Jetson Orin NX and Jetson AGX Orin, the issues mentioned seem to be specific to the Jetson Orin NX. However, it’s worth noting that the Jetson AGX Orin has similar issues, such as the A/B Rootfs Redundancy: No fallback to other slot when one gets corrupted issue, which might be related to the boot problems you’re experiencing.

Guidance and recommendations:

  1. Check the thermal design: Ensure that the thermal design of your system is adequate to prevent overheating.
  2. Verify the power and reset sequence: Double-check the power and reset sequence to ensure that it’s correct.
  3. Check the JTAG connection: Verify that the JTAG connection is properly connected and configured.
  4. Boot validation: Verify that the boot validation process is successful and that the kernel device tree, boot scripts (bootcmd), and FS support are correctly configured.
  5. Secure Boot: Ensure that Secure Boot is properly configured and that the Xavier doesn’t boot after Secure Boot flash (JetPack 5.1.2 / L4T R35.4.1) issue is not related to your problem.
  6. OTA updates: If you’ve recently performed an OTA update, check the Can’t Image-Based OTA update after I patch my tos-img in bootloader issue to see if it’s related to your problem.

By following these steps and recommendations, you should be able to identify and potentially resolve the issues causing the intermittent boot failures and system hangs on your Jetson AGX Orin.


Hi,

Please install a newer release, such as JetPack 5.1.5 or 6.2.1, and check whether the issue still occurs.

Thanks

Hi @DavidDDD

We have been running JetPack 5.1.2 on multiple AGX Orin SoMs. This issue started occurring recently and is now observed on several of them. After we re-imaged one affected SoM, the issue has not reappeared on that unit. However, another SoM began showing similar behavior shortly afterward: it fails to boot about 90% of the time, though it operates normally when it does boot. We use DeepStream to run edge detection algorithms on the AGX Orin.

This issue has not been observed on any SoMs running JetPack 6.1.
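
When comparing units across a fleet, it can be useful to confirm which L4T release each SoM actually runs. Below is a rough sketch that parses the first line of `/etc/nv_tegra_release`. The line format in the comment is an assumption based on typical L4T images, and `parse_l4t_version` is a hypothetical helper, not part of any NVIDIA package.

```python
import re

# Assumed format of the first line of /etc/nv_tegra_release, for example:
#   "# R35 (release), REVISION: 4.1, GCID: ..., BOARD: ..., EABI: aarch64, DATE: ..."
RELEASE_RE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)")


def parse_l4t_version(first_line):
    """Return a string such as '35.4.1', or None if the line does not match."""
    m = RELEASE_RE.search(first_line)
    if not m:
        return None
    return f"{m.group(1)}.{m.group(2)}"
```

On a device, this could be called with the first line read from `/etc/nv_tegra_release`. A `None` result means the file does not follow the assumed format and should be inspected by hand.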

Hi @DavidDDD

Is there any update on this issue? We will be continuing with JetPack 5.1.2 (L4T 35.4.1) for our use. The problem is critical since multiple AGX Orin SoMs on JP 5.1.2 are showing this behaviour, and resolving it is essential for us.

Hi,

Is your AGX Orin module on a custom carrier board or a developer kit?

Could it also be reproduced using the developer kit?

Thanks

Hi

We are using a Jetson AGX Orin module with a custom carrier board.

Hi,

Please replace it with our developer kit and verify whether the issue still exists.

Thanks