Intermittent boot crashes/issues - NVME/squashfs

Hello.

We are having intermittent boot crashes/issues.
Most of the time the system boots just fine. However, every once in a while (roughly one in every 20 to 30 boot attempts) the boot fails and the system reboots.

The hardware is a custom carrier board with a PCIe/M.2 NVMe disk.

The OS is on the NVMe disk, and the system boots from it.

Similar to unanswered topic: Kernel panic on boot [TX2i PREEMPT]

See files for boot logs:
AGX - NVME Boot crash squashfs.txt (17.9 KB)
AGX - NVME Boot crash squashfs - Reset.txt (37.7 KB)

If you have any idea what the culprit could be, or can assist in any way, we would be thankful.

// Tobias

Hi tobias.johansson1,

[   61.521176] Hardware name: Jetson-AGXi (DT)

Are you using AGX Xavier Industrial?

Which JetPack version are you using?

Is this issue happening when you are booting from internal eMMC?

Have you tried re-flashing the board or using another NVMe disk, and do you still hit the same issue?

Hello. Thanks for the response.

Yes, AGX Xavier Industrial is correct. Should have added that information in the initial post, sorry.

L4T version 32.7.X (JetPack 4.6.x).

Not that we have seen. We have run more than 3500 boot/shutdown cycles from internal eMMC in a test rig without seeing this issue.
However, that setup was a bit different: it did not use LUKS, squashfs, etc.
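
As a rough sanity check on those numbers: assuming each boot fails independently with a fixed probability somewhere between 1/30 and 1/20 (the estimate from the first post; independence is an assumption, not something we have measured), the chance of 3500 clean boots in a row would be negligible. A minimal sketch of the arithmetic, with function names that are purely illustrative:

```python
import math


def prob_all_clean(boots: int, fail_rate: float) -> float:
    """Probability of `boots` consecutive clean boots, assuming each boot
    fails independently with probability `fail_rate`."""
    return (1.0 - fail_rate) ** boots


def boots_for_confidence(fail_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of consecutive clean boots after which a failure rate
    of `fail_rate` can be ruled out at the given confidence level."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


if __name__ == "__main__":
    for rate in (1 / 20, 1 / 30):
        print(
            f"fail rate {rate:.4f}: "
            f"P(3500 clean boots) = {prob_all_clean(3500, rate):.3e}, "
            f"clean boots needed for 95% confidence = {boots_for_confidence(rate)}"
        )
```

Under the same assumptions, a candidate fix on the NVMe setup would need on the order of 60 to 90 clean boots in a row before it says much.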

Compliance tests for all PCIe lanes to the NVMe disk have been performed, with more than satisfactory results.

Do you mean that you use LUKS/squashfs for NVMe but not for internal eMMC?

Have you tried with the same setup for NVMe as internal eMMC?
It could help us narrow down the issue.

Could you also share the steps you used to set up the NVMe, and check whether the issue can be reproduced on the devkit?

Do you mean that you use LUKS/squashfs for NVMe but not for internal eMMC?

Correct.

Have you tried with the same setup for NVMe as internal eMMC?
It could help us narrow down the issue.

No. Sadly, I don’t see how we could pull that off.

Could you also share the steps you used to set up the NVMe, and check whether the issue can be reproduced on the devkit?

Also no, as we do not know the specifics of how it is set up (encryption, volumes, partitions, squashfs, EXT4 etc).

So I guess all these answers were no good, but they are the best I can give you.

Just as a thought: we know that the current setup makes heavy use of the hardware encryption/decryption on the AGX. Are there any known limitations or bugs regarding it?

Thanks.

So, how do you get this encrypted NVMe? Is that done by your customer?

We have not tested or verified this squashfs use case, which is why I am asking for the setup or reproduction steps. It does not appear to be an officially supported feature.
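
If the build steps are not available, part of the layout can usually be read back from a running system. A minimal sketch, assuming a standard Linux `/proc/mounts` and assuming that LUKS volumes show up as `/dev/mapper/...` devices (an assumption about this setup, not something confirmed in this thread); the function name is illustrative only:

```python
from pathlib import Path


def summarize_mounts(mounts_text: str):
    """Return (device, mountpoint, fstype) tuples for squashfs, ext4, and
    device-mapper backed mounts found in /proc/mounts style text."""
    rows = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype = fields[:3]
        if fstype in ("squashfs", "ext4") or device.startswith("/dev/mapper/"):
            rows.append((device, mountpoint, fstype))
    return rows


if __name__ == "__main__":
    mounts_file = Path("/proc/mounts")
    if mounts_file.exists():
        for device, mountpoint, fstype in summarize_mounts(mounts_file.read_text()):
            print(f"{device:30} {mountpoint:25} {fstype}")
    else:
        print("/proc/mounts not found; run this on the target system")
```

The output of this, together with the boot logs, would already narrow down which filesystem sits on which device.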

Could you also provide the dtb in use so that we can check further?
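
One way to capture the device tree that is actually loaded is to dump it from the running system with `dtc`. A minimal sketch, assuming the `dtc` tool (device-tree-compiler package) is installed on the target and that `/proc/device-tree` is available; the output file name `running.dtb` and the function names are placeholders:

```python
import shutil
import subprocess


def dtc_dump_command(output_path: str, source: str = "/proc/device-tree"):
    """Build the dtc command that packs the live device tree
    (filesystem form) into a .dtb file."""
    return ["dtc", "-I", "fs", "-O", "dtb", "-o", output_path, source]


def dump_running_dtb(output_path: str = "running.dtb") -> bool:
    """Run dtc and report whether it succeeded. Returns False when dtc is
    not installed or exits with an error."""
    if shutil.which("dtc") is None:
        return False
    result = subprocess.run(dtc_dump_command(output_path), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    if dump_running_dtb():
        print("Wrote running.dtb")
    else:
        print("Could not dump the device tree; is dtc installed?")
```

Attaching the resulting file, or the original dtb that was flashed, would both work.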
