Multiple Boot Driver Corruption

We are developing a commercial device on the Jetson Dev Kit, and during prototyping and small-scale field testing we are noticing that several units are experiencing various corruption events at the Jetson driver level.

Logs from three such units are attached.

We fixed the units using SDKManager, but we need to find out why this is occurring so that it does not happen in the field.

Is this a power issue? When does it occur?

jetson03-ttl.log (4.4 KB)
jetson02-ttl.log (7.8 KB)
jetson01-ttl.log (6.9 KB)

Cases 1 and 2 are corrupted partitions, which can only be fixed by reflashing.

https://docs.nvidia.com/jetson/archives/l4t-archived/l4t-3261/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/bootloader_update_nano_tx1.html#

Case 3 is something unknown. Do you remember how to reproduce this error?

Wayne - I think there may be a misunderstanding.

I’m aware we need to ‘reflash’ to save the boards - but how do Case 1 and Case 2 happen? What causes this?

Case 1, 2, and 3 all happened without us touching the drivers/boot code. This happened during normal development.

What does the dev team think causes these issues?

So are these purely dev kits? All SD card models? If so, was there any kind of experimentation with backup partitions? I ask because SD card models don’t support this, and some of the logs mention QSPI (which is not necessarily an error if the software just tests for QSPI and isn’t really using it). Regardless, whatever is going on, it is at least looking for failover partitions (which is not necessarily a problem: if failover was never set up, a failed failover is expected and is not part of the problem). Even case 3 might be related through wanting a secure setup in boot (keep in mind signatures must be valid even if it is the default NULL signature used when custom signing was not used). Basically I’m just wondering if these are purely dev kits, and if there was anything custom about boot setup.

Hi LinuxDev,

These are purely dev kits - all SD card models - with no experimentation at all on the boot side. All we were doing was playing around on the SD card side.

The only experimental part of our setup was that we had them on UPS battery packs. Outside of that, they are purely the dev kits.

What kind of playing around with the SD card? Were any partitions involved other than rootfs?

The first few Jetsons we set up had micro SD cards flashed with the stock NVIDIA image using Etcher on a Mac. As we figured out the packages we needed installed, one of our devs made our own image. Our custom image wasn’t bootable if it was flashed with Etcher on a Mac (OS boot issues), but it booted successfully if it was flashed with Etcher on a Windows machine. The system we made the custom image from has not run into any bootloader issues, while the boards that have had bootloader issues were a mix of cards flashed with the stock NVIDIA image and our custom image. We had a couple of iterations of custom images before we realized it was an issue with flashing on a Mac, but to my knowledge we haven’t changed anything about the partitions.

Directly copying an SD card image to an SD card should be ok regardless of it being a Mac, Linux, or Windows (a clone is a bit-for-bit exact copy). What comes to mind as a possible issue is if the partitioning tools or filesystem tools were used from another platform. Even if those things seem to be a match there may be options which differ for defaults. Also, swapping GPT partition methods for old style BIOS tools (e.g., gdisk is GPT, fdisk is old style BIOS) could alter function in unexpected ways.
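One way to test the "bit-for-bit exact copy" claim on a suspect card is to hash the raw image and the same number of bytes read back from the card. This is only a minimal sketch: the function names are made up for illustration, the image is assumed to be an uncompressed raw image, and the card should be checked right after writing, since any later boot or mount can change bytes on it.

```python
import hashlib
import os


def sha256_prefix(path, length, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # file or device is shorter than `length`
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def clone_matches(image_path, device_path):
    """True if the device starts with exactly the bytes of the image."""
    image_size = os.path.getsize(image_path)
    return sha256_prefix(image_path, image_size) == sha256_prefix(device_path, image_size)
```

On Linux the device path would be something like /dev/sdX for a USB card reader (a placeholder, not a real name), and reading it normally needs root. A mismatch right after flashing would point at the flashing step or the card itself rather than at anything the Jetson did later.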

When you say a “mix”, do you mean some SD cards purely with the NVIDIA image, and others purely with a custom image, or do you mean both unmodified and modified partitions on the same SD card?

Narrowing it down, are the cards which fail ones which have had both custom and standard images on them at different times? Does the error follow cards with a history of both images? Any details on what has “followed” the problem might help, including which tools have touched the card.

To answer one of your questions - the SD cards have never been dual-partitioned.

We can start tracking exactly what we are doing on these dev kits so we can provide more helpful insight going forward if necessary. What would you like us to track? Things like partitioning, image use, driver installs, etc?

Our custom image is mostly application code, plus a few device-level drivers for other components we control with the Jetson (the Intel camera is the major one).

Best,

Yes, any kind of custom partitioning information is useful, and what tool is used for the partitioning. Creating a filesystem only indirectly matters, and I doubt this is the problem since it is showing up on binary partitions. Tools like gdisk (or the incompatible old style fdisk) might have issues since they deal with partitions and not the data inside of the partition.
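To see whether a card or image has ended up with a mix of GPT and old-style MBR structures, the first two sectors can be inspected directly. The sketch below assumes 512-byte sectors; the function name and the returned labels are made up for illustration. It relies only on the standard on-disk markers: the 0x55 0xAA boot signature at byte 510 of sector 0, partition type 0xEE for a protective MBR entry, and the "EFI PART" signature at the start of sector 1.

```python
SECTOR = 512  # assumed logical sector size


def partition_table_type(path):
    """Classify the partition table found at the start of a disk or image."""
    with open(path, "rb") as f:
        head = f.read(2 * SECTOR)
    if len(head) < 2 * SECTOR:
        return "unknown"
    lba0, lba1 = head[:SECTOR], head[SECTOR:]
    has_boot_sig = lba0[510:512] == b"\x55\xaa"
    has_gpt_header = lba1[:8] == b"EFI PART"
    # Four 16-byte MBR entries start at offset 446; the type byte is at +4.
    mbr_types = [lba0[446 + 16 * i + 4] for i in range(4)]
    if has_gpt_header:
        if has_boot_sig and 0xEE in mbr_types:
            return "gpt"
        # A GPT header is present but sector 0 is not a protective MBR.
        return "gpt-header-with-non-protective-mbr"
    if has_boot_sig:
        return "mbr"
    return "unknown"
```

A result other than "gpt" or "mbr" would be worth mentioning here, since it could suggest that tools of both styles have touched the card at some point.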

For custom images it would be good if you could give a detailed example of how this is done. It doesn’t need to show all changes, but, for example, you could include whether a package is added with apt via QEMU, or natively and then cloned, and so on. Also note the exact size of every partition on the disk (use “sudo blockdev --getsize64 /dev/WhateverPartition” on each one), and note if you modified any XML file before a flash to reflect size changes.
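If running blockdev by hand on every partition gets tedious, the same sizes can be collected in one pass from sysfs on a Linux host. This is a sketch under assumptions: the function name is made up, "mmcblk0" is only an example device name, and the results should agree with what “blockdev --getsize64” reports. The sysfs "size" file is in 512-byte units regardless of the device's sector size.

```python
import os


def partition_sizes(disk, sysfs_root="/sys/class/block"):
    """Map each partition of `disk` (e.g. 'mmcblk0') to its size in bytes."""
    sizes = {}
    disk_dir = os.path.join(sysfs_root, disk)
    for name in sorted(os.listdir(disk_dir)):
        part_dir = os.path.join(disk_dir, name)
        # Partition subdirectories contain a 'partition' file; skip everything else.
        if not os.path.isfile(os.path.join(part_dir, "partition")):
            continue
        with open(os.path.join(part_dir, "size")) as f:
            sectors = int(f.read().strip())
        sizes[name] = sectors * 512  # sysfs 'size' is in 512-byte units
    return sizes


if __name__ == "__main__":
    # 'mmcblk0' is a placeholder; substitute the SD card's actual device name.
    for part, size in partition_sizes("mmcblk0").items():
        print(part, size)
```

Saving this output for each card when it is first flashed, and again after a failure, would give exact before and after sizes to post.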

In the case of device drivers I doubt it would matter unless they are related to direct disk access (e.g., some high performance databases like Oracle will access a disk directly without going through the filesystem). You might mention the nature of those drivers if you think any might be related in some way.

I don’t know of any way to clone non-rootfs/non-APP partitions, but if NVIDIA can offer a way to do so, then it might help since there is no way to see partition sizes or content in a non-bootable Jetson (e.g., QSPI content for the SD card version, or eMMC non-rootfs partitions in the eMMC models).