NVMe sometimes lost on reboot - pcie_aspm=off influence

Dear NVIDIA Team

We have a custom Orin NX carrier board on which we sometimes see that, during initrd execution after a reboot, nvme0n1p1 is not found and the system drops into a bash shell. Here are the log files for the non-working and the working case:
dmesg_not_working.txt (41.3 KB)
dmesg_working.txt (66.6 KB)
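
From the initrd bash shell in the non-working case, the missing device can be confirmed with standard commands (assuming the usual busybox tools are available in the initrd):

ls /dev/nvme*
cat /proc/partitions

If the controller was not enumerated, nvme0n1 and its partitions are missing from both outputs.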

We see the following error in the non-working case:

[ 9.023576] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x00000000, fsynr=0x11, cbfrsynra=0x1013, cb=2
[ 9.023831] tegra-mc 2c00000.memory-controller: pcie4w: secure write @0x00000003ffffff00: VPR violation ((null))

After removing the kernel argument “pcie_aspm=off”, which we had been using so far, the NVMe is always recognized.
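
For context, on JetPack the argument typically ends up on the kernel command line via the APPEND line in /boot/extlinux/extlinux.conf; the entry below only illustrates the layout and is not our exact configuration:

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} root=/dev/nvme0n1p1 rw rootwait pcie_aspm=off

Dropping pcie_aspm=off from that APPEND line and rebooting is the change that makes the NVMe come up reliably.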
We are testing JetPack 6.0 GA with the standard kernel.
Any idea why this kernel argument leads to this issue?
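
In case it helps with the analysis: the effective ASPM policy and the per-link ASPM state can be checked on a running system, for example with

cat /sys/module/pcie_aspm/parameters/policy
sudo lspci -vv | grep -i aspm

where lspci reports for each link whether ASPM is enabled or disabled (LnkCap/LnkCtl lines).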
Thank you.

Kind regards

I suspect the NVMe drive itself may be related to this issue.

Could you also try another brand of NVMe SSD? And please share which model you are using now.

Also, does this issue happen on the NVIDIA devkit?

Dear WayneWWW

The SSD is an Apacer PV920-M280. We will check on the DevKit and also with different SSDs, and come back to you.
Thank you.


Is this still an issue that needs support? Are there any results you can share?

Hi kayccc

Sorry for the late reply.
We checked today with the same SSD on the DevKit and see the same issue there.
Using a different brand of NVMe SSD, we do not see this behavior.

Best regards