M.2 NVMe sometimes not detected - removing NVMe from boot order helps

Dear NVIDIA team

We already opened another topic about this issue, which was automatically closed:

We see NVMe errors on our Xavier NX and Xavier AGX systems.

We have now changed the boot configuration on our Xavier NX carriers to remove the NVMe, and so far the SSD has always been detected correctly. Do you have any idea what the cause is?
We also have some questions about this workaround:

  1. Flashing only the CPUBL-CFG partition (with the “-k” option) does not seem to work; the cbo.dtb only takes effect when we flash the whole system. Why is that? Is there a way to change the boot order without reflashing the whole device?

  2. If we make this change, our systems can no longer boot from the NVMe, which we might need. Do you have any other ideas we could try? What is the mentioned TX2-NX patch supposed to resolve?

In the meantime, we will try adding a delay before PCIe probing to see if that helps.
Thank you for your help.

Best regards

Sorry, I just want to briefly summarize here…

Are you saying that if cboot does not touch your NVMe drive, the kernel is able to detect it?

If so, could you tell us which JetPack version you are using? It does not seem to be the latest one.

Hi WayneWWW

Thank you for your answer.
Exactly. If we remove the NVMe from the boot configuration, it is always detected. If the bootloader includes it as a boot device, the NVMe is sometimes not detected and we see a “probe failure”.
We mainly test with JetPack 4.6, but we have also seen probe failures with JetPack 5.0.2. Adding a delay to the kernel's Tegra PCIe driver did not help.

Hi @sevm89

Please test with JetPack 4.6.1 (L4T 32.7.1) first, because it includes some PCIe patches to the bootloader.

As for JetPack 5, the bootloader is completely different, so this needs more investigation.

@sevm89

Could you also share the boot log for your JetPack 5.0.2 case?

Also, what happens if you boot from the NVMe drive but it fails to probe in the kernel? Does it drop into the initrd (bash shell)? Is that the problem you are describing?

Hi WayneWWW

For our customers, it is not easy to switch from JetPack 4.6 to JetPack 4.6.1. Would it be possible to get patches for JP4.6?
So far we have tested booting from eMMC and mounting the NVMe via an entry in fstab. When a probe failure occurs, the NVMe cannot be mounted and the system drops into the console.
We will try to get the boot log for JetPack 5.0.2.
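Because the detection failure is intermittent, it can help to record after every boot whether the drive actually enumerated. The following is a minimal sketch, not an NVIDIA tool: the function name check_nvme and the default device path /dev/nvme0n1 are assumptions; pass another path as the first argument if your drive enumerates differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of L4T/JetPack): reports whether the NVMe
# block device node exists after boot.
# Assumption: the drive enumerates as /dev/nvme0n1 unless a path is given.
check_nvme() {
    dev="${1:-/dev/nvme0n1}"
    # -b is true only if the path exists and is a block device
    if [ -b "$dev" ]; then
        echo "detected: $dev"
        return 0
    fi
    echo "not detected: $dev"
    return 1
}
```

For example, running `check_nvme || dmesg | grep -i -e nvme -e pcie` after each boot captures the related kernel messages whenever the drive is missing (reading dmesg may require root).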

Hi,

Are they able to test this and reproduce the issue on a devkit?

Alternatively, you can copy the cboot binary from JP4.6.1, flash it to your system, and check whether the issue persists.

Hi WayneWWW

It seems that the bootloader from JP4.6.1 solves the issue.
Should we expect any compatibility problems if we install only the bootloader from JP4.6.1 while all other components remain on JP4.6? We flashed the bootloader with:

./flash.sh -r -k cpu-bootloader jetson-xavier-nx-emmc mmcblk0p1

Can we also update the bootloader from the running system itself?

We are now testing further with JP5.0.2 and will see if we can capture the boot log when the error occurs.

Hello,

I believe this should be resolved by these two patches in rel-32.

1b5b11a.diff (1).zip (2.8 KB)
9eeaf45.diff (1).zip (1.5 KB)

Could you share which NVMe drive you are using to reproduce this issue? And the issue is reproducible on a devkit, right?
