Inquiry regarding [NvmExpressDxe] Assertion Error and Read-only File System

Hi all,

Our team recently encountered an assertion error during boot:

ASSERT [NvmExpressDxe] /out/nvidia/bootloader/uefi/Jetson_RELEASE/edk2/MdeModulePkg/Bus/Pci/NvmExpressDxe/NvmExpressHci.c(772): (Private->Cap.Mpsmin + 12) <= 12

Prior to this error, the system experienced an issue where it began flooding the logs with “Read-only file system” errors, as shown below:

[ 1199.554765] systemd-journald[367]: Failed to write entry (25 items, 670 bytes), ignoring: Read-only file system
[ 1199.554851] systemd-journald[367]: Failed to write entry (26 items, 703 bytes), ignoring: Read-only file system

After these logs appeared, we rebooted the device and the aforementioned assertion error occurred. Following that, the system returned to normal operation. Aside from these two logs, there are no other traces or error indicators.

We are unsure if we can reproduce this phenomenon, but we would like to understand what this specific assertion error represents and its potential relationship with the file system turning read-only.
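
As background for the question above: in the NVMe specification, MPSMIN is a 4-bit field (bits 51:48) of the controller’s 64-bit CAP register, and the minimum memory page size the controller supports is 2^(12 + MPSMIN) bytes. The assertion text therefore says that the driver expects the controller to accept 4 KiB pages, which means MPSMIN must be 0. The Python sketch below mirrors that check. Both CAP values are hypothetical and chosen only for illustration; the all-ones value models a register read from a device that is not responding on the bus, which is an assumption about one possible cause and not something the logs here confirm.

```python
EFI_PAGE_SHIFT = 12  # UEFI uses 4 KiB pages (2**12 bytes)


def mpsmin_from_cap(cap: int) -> int:
    """Extract CAP.MPSMIN (bits 51:48 of the 64-bit NVMe CAP register)."""
    return (cap >> 48) & 0xF


def min_page_size_bytes(cap: int) -> int:
    """Minimum memory page size the controller supports: 2 ** (12 + MPSMIN)."""
    return 1 << (12 + mpsmin_from_cap(cap))


def assertion_holds(cap: int) -> bool:
    """Mirror of the failing check: (MPSMIN + 12) <= EFI_PAGE_SHIFT."""
    return mpsmin_from_cap(cap) + 12 <= EFI_PAGE_SHIFT


# Hypothetical CAP values, for illustration only.
HEALTHY_CAP = 0x0000_0020_3C03_3FFF   # MPSMIN = 0, so 4 KiB minimum page size
ALL_ONES_CAP = 0xFFFF_FFFF_FFFF_FFFF  # assumed result of reading an unresponsive device

for name, cap in (("healthy", HEALTHY_CAP), ("all ones", ALL_ONES_CAP)):
    print(name, mpsmin_from_cap(cap), min_page_size_bytes(cap), assertion_holds(cap))
```

If the drive had dropped off the PCIe bus before the reboot, that could in principle explain both symptoms (a root file system forced read-only, then an implausible CAP value at the next boot), but this is only a hypothesis until a UART log confirms it.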

Best regards,

*** Please note that this reply is generated by LLM automatically ***

Based on the provided context, the assertion error during boot on the Jetson AGX Orin appears to be related to the read-only file system errors that preceded it. However, without more specific information about the assertion error, it’s difficult to pinpoint the exact cause.

That said, a few potential issues could be contributing to this problem. One possibility is that the read-only file system errors are causing the system to become unstable, leading to the assertion error during boot.

In the context, there is a known issue (4617111) related to TSEC CBB errors on Jetson AGX Orin during bootup, shutdown, suspend, and resume when the target is connected to a display with HDCP 2.2 support. This issue is caused by missing synchronization between the TSEC and the DCE/display engines. However, it’s unclear whether this issue is directly related to the read-only file system errors and the assertion error.

Another possibility is that the read-only file system errors are caused by a problem with the file system or storage device. In the context, there is a known issue (4201479) related to flashing the Jetson Linux BSP to multiple boot media such as USB and NVMe. If the media hold different versions of the BSP, this can lead to a system crash in UEFI. However, this issue seems unrelated to the read-only file system errors.

For troubleshooting, it would help to investigate the read-only file system errors further to determine their cause: check the system logs for any other error messages, run file system checks, and verify the integrity of the storage device.
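
As one concrete starting point for that investigation, the Python sketch below lists file systems that are currently mounted read-only by parsing `/proc/mounts`-style text. It is a minimal sketch, not a diagnostic tool: the `SAMPLE` mount table is hypothetical, and the script falls back to it only when `/proc/mounts` cannot be read. An ext4 root that shows up as `ro` after having been mounted `rw` often means the kernel remounted it after an I/O or metadata error, so the kernel log just before the first “Read-only file system” message is the place to look next.

```python
def read_only_mounts(mounts_text: str) -> list[tuple[str, str]]:
    """Return (device, mount_point) pairs whose mount options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        if "ro" in options.split(","):
            result.append((device, mount_point))
    return result


# Hypothetical mount table, for illustration only.
SAMPLE = """\
/dev/nvme0n1p1 / ext4 ro,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the sample
    for device, mount_point in read_only_mounts(text):
        print(f"{device} is mounted read-only at {mount_point}")
```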

It’s also worth noting that the GPIO sysfs node has been deprecated, and /sys/class/gpio cannot be used to control the GPIO state. Instead, the new GPIO character device API (e.g., libgpiod) should be used. However, it’s unclear whether this is related to the read-only file system errors and the assertion error.

Compared with the other issues mentioned in the context, the read-only file system errors and the assertion error seem to be a distinct problem. However, there are some similarities with other issues, such as the TSEC CBB errors, that could be worth investigating further.

In summary, the assertion error during boot on the Jetson AGX Orin is likely related to the read-only file system errors, but the exact cause is unclear. Further investigation is needed to determine the root cause and to develop a solution.

Recommendations:

  1. Investigate the read-only file system errors further to determine their cause.
  2. Check the system logs for any other error messages.
  3. Run file system checks on the affected partitions.
  4. Verify the integrity of the storage device.
  5. Consider using the new GPIO character device API (e.g., libgpiod) instead of the deprecated GPIO sysfs node.
  6. Investigate the TSEC CBB errors and determine whether they are related to the read-only file system errors and the assertion error.

By following these recommendations, you may be able to determine the cause of the assertion error and develop a solution to prevent it from occurring in the future.


Is this still an issue that needs support? Are there any results you can share?

Yes, this issue remains unresolved. However, since there is no urgent pressure from our client at the moment, it is currently a lower priority for us. That said, we are still very much looking forward to any helpful insights or solutions from anyone who might have encountered this.

Hello there,

Do you use an A/B slot system? We have seen this exact same assertion being triggered, and when it occurs, a boot slot swap (A->B or B->A) follows. Have you seen this behavior?

For ref: Orinnx reboot repeatly but it switch to B unexpectly - #102 by sebastien.schertenleib

Yes, I read your post, and it looks very similar to the issue we are experiencing. The machine currently having this problem is an older model, but we definitely can’t rule this out. We just received the new machines recently, and since the previous units had stabilized, stress testing wasn’t initially in our plans. However, we will be conducting stress tests on them soon. Thanks again for spotting this ticking time bomb!

Hi @yihang.li, do you use JetPack 6.2.2 r36.5?

Please also share the following:

  1. Full UART log
  2. Output of $ sudo i2cdump -f -y 0 0x50
  3. Whether the NVMe SSD is connected to the PCIe C4 interface
  4. Whether the rootfs is on eMMC or the NVMe SSD
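
For the last item, a quick way to answer is to look at which block device is mounted at `/`. The Python sketch below does this on `/proc/mounts`-style text and guesses the storage type from the usual Linux device naming (`nvme*` for NVMe, `mmcblk*` for eMMC or SD). The function names and the `SAMPLE` mount table are hypothetical, and the name-prefix rule is an assumption; on some systems the root entry may appear as `/dev/root`, in which case the result is “unknown” and the kernel command line needs to be checked instead.

```python
import os
from typing import Optional


def root_device(mounts_text: str) -> Optional[str]:
    """Return the device mounted at "/" in /proc/mounts-style text, if any."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "/":
            return fields[0]
    return None


def classify_block_device(device: str) -> str:
    """Guess the storage type from the Linux block-device naming convention."""
    name = os.path.basename(device)
    if name.startswith("nvme"):
        return "NVMe SSD"
    if name.startswith("mmcblk"):
        return "eMMC or SD card"
    return "unknown"


# Hypothetical mount table, for illustration only.
SAMPLE = (
    "/dev/nvme0n1p1 / ext4 rw,relatime 0 0\n"
    "/dev/mmcblk0p10 /boot/efi vfat rw 0 0\n"
)

device = root_device(SAMPLE)
print(device, "->", classify_block_device(device))
```

On a real device, pass the contents of `/proc/mounts` instead of `SAMPLE`.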

Apologies for the silence; I’ve been away on long-term leave recently.

Regarding this issue, we can consider it closed for now. The customer only encountered this once, and it hasn’t reoccurred since, so I unfortunately cannot provide any further error logs at this moment.

I will continue to track the thread “Orinnx reboot repeatly but it switch to B unexpectly” (#102 by sebastien.schertenleib) and work on potential preventative measures on our side. Thanks for your help.