We have a custom AGX Xavier system with JetPack 4.6 on the eMMC that fails to boot. Here is the boot log: log_boot_loop.txt (49.1 KB)
Booting from an external USB, we checked the mmc health: mmc_log.txt (10.3 KB)
In similar cases, you have suggested an RMA for the device:
Before doing so, we would like to know whether there is anything further we can check to find out why the system got into this state.
Also, if we reflash the device and it boots from the eMMC, how can we make sure that the eMMC is okay?
Thank you.
I can’t answer that, but I can give you a very important test: Try to clone. The carrier board shouldn’t matter during a clone. If it succeeds, then the eMMC is at least readable across the rootfs partition. Success also means you have a backup of exactly what the rootfs contains at the time of failure. A clone produces both a “raw” image (a bit-for-bit exact copy of the partition) and a “sparse” image (omits empty space, so it is faster to flash, but it is not useful for examination). As the filesystem fills, the size of the sparse clone approaches the size of the raw clone.
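To tell the two clone outputs apart before examining one, a minimal Python sketch like the following could help. It assumes the sparse clone uses the Android sparse image format (magic 0xED26FF3A in the first four bytes) and that the rootfs is an ext2/3/4 filesystem (superblock magic 0xEF53 at byte offset 1080). The function name classify_image and the file paths are hypothetical, not part of any NVIDIA tool.

```python
import struct
import sys

SPARSE_MAGIC = 0xED26FF3A      # Android sparse image magic, little-endian u32 at offset 0
EXT_MAGIC = 0xEF53             # ext2/3/4 superblock magic, little-endian u16
EXT_MAGIC_OFFSET = 1024 + 56   # superblock starts at 1024, magic field at +56


def classify_image(path):
    """Return 'sparse', 'raw-ext', or 'unknown' based on header magic numbers."""
    with open(path, "rb") as f:
        head = f.read(EXT_MAGIC_OFFSET + 2)
    # Check the sparse magic first: a sparse image never starts with an ext superblock.
    if len(head) >= 4 and struct.unpack_from("<I", head, 0)[0] == SPARSE_MAGIC:
        return "sparse"
    if (len(head) >= EXT_MAGIC_OFFSET + 2
            and struct.unpack_from("<H", head, EXT_MAGIC_OFFSET)[0] == EXT_MAGIC):
        return "raw-ext"
    return "unknown"


if __name__ == "__main__":
    # Usage (paths are placeholders): python classify_image.py clone.img clone.img.raw
    for image_path in sys.argv[1:]:
        print(image_path, "->", classify_image(image_path))
```

An image reported as raw-ext can normally be loop-mounted read-only on the host for inspection. An "unknown" result for the raw clone would itself be a hint that the start of the partition did not read back as a valid filesystem.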
Whether the clone succeeds or fails, it tells you a lot about the health of the eMMC across at least part of its address range. Just be sure to monitor “dmesg --follow” on the host PC during the clone, since any errors that show up there might add clues.
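If watching the full log by eye is impractical, a small Python sketch like this could filter the “dmesg --follow” output on the host down to lines that look like trouble. The keyword list is only a guess at what might be relevant (it includes the kernel's hung-task wording, “blocked for more than”), and the names SUSPICIOUS, is_suspicious, and follow_dmesg are made up for illustration. On some hosts dmesg needs root.

```python
import re
import subprocess

# Keywords are an assumption; adjust them to whatever your host actually logs.
SUSPICIOUS = re.compile(
    r"error|timed out|timeout|blocked for more than|disconnect",
    re.IGNORECASE,
)


def is_suspicious(line):
    """Return True if a kernel log line matches one of the keywords."""
    return bool(SUSPICIOUS.search(line))


def follow_dmesg():
    """Stream 'dmesg --follow' and print only the suspicious lines."""
    proc = subprocess.Popen(
        ["dmesg", "--follow"],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        for line in proc.stdout:
            if is_suspicious(line):
                print(line, end="", flush=True)
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the monitoring session
    finally:
        proc.terminate()


if __name__ == "__main__":
    follow_dmesg()
```

Start it in a second terminal before launching the clone or flash and stop it with Ctrl+C afterwards. It is still worth saving the unfiltered log as well, since the filter may miss something.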
Thank you for your answer.
We tried to clone the image. Reading the eMMC was successful, although it felt really slow. But when we try to flash the cloned image to another device, it gets stuck at 0% while writing the APP partition. In dmesg we only get the message that the tegraflash process was blocked for more than 120 seconds.
Incidentally, a cloned image should flash to any Jetson regardless of whether the image is valid or not (though the size of the image can cause errors if a different size is expected). It could be random bits, all NULL bytes, and so on, and the flash would still work (it sure would not boot though). The fact that it won’t flash to another unit says something is wrong with the hardware or firmware. I don’t think device trees matter a lot during flash, so I think odds are that it is a hardware design error shared across the different Jetsons. About the only other possibility is that your flash software is corrupt, or perhaps that you are using a VM.
From the same host we flash many other AGX Xavier modules with the same carrier board without any issues. OK WayneWWW, we will take out the module and test it with the DevKit, even though the result will probably be the same.