Hello. I’ve been experiencing corruptions on multiple types and brands of SD cards on the Orin Nano developer kit. The corruptions happen after safe shutdowns or reboots, without unplugging the power cable. My workflow involves copying a new kernel and modules via SSH (scp and rsync), running the sync command, and then the reboot command. I get corruptions every so often and it’s very frustrating having to reflash the board each time.
I can see similar topics on this forum such as:
@linuxdev mentioned grabbing some logs in that topic, so here they are.
The point here is to clarify the error situation first:
Check whether the pure image hits this issue or not.
Confirm whether this issue is related to your patch.
This is filesystem corruption, but your patch is related to v4l2, so the two sound a little unrelated.
That is why we need to clarify whether this is really caused by your kernel or whether even the pure image can hit the issue.
There are a few more topics that report this issue. Do you really think the problem is on my side? Checking with a pure image is something someone from NVIDIA could do too, since there have been multiple reports of this happening.
The issue probably only happens if there is some file system activity. My guess is that you stress-tested reboots, but didn’t do any file system operations in between.
You could probably replicate this by writing a few hundred megabytes to the SD card between reboots.
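A rough sketch of what I mean by writing data between reboots, in case it helps with reproducing this. The helper names, the test directory, and the 300 MB size are my own assumptions for illustration; nothing here is specific to the Jetson.

```shell
# Hypothetical helpers for a write-then-reboot test; names, paths, and sizes
# are illustrative assumptions.

# Write SIZE_MB of random data into DIR, record its checksum, flush to disk.
write_test_data() {
    dir=$1
    size_mb=$2
    mkdir -p "$dir" || return 1
    dd if=/dev/urandom of="$dir/blob.bin" bs=1M count="$size_mb" 2>/dev/null || return 1
    (cd "$dir" && sha256sum blob.bin > blob.sha256) || return 1
    sync
}

# After the reboot: returns 0 only if the data still matches its checksum.
verify_test_data() {
    dir=$1
    (cd "$dir" && sha256sum -c blob.sha256 >/dev/null 2>&1)
}

# Example use on the board (commented out so nothing reboots by accident):
#   write_test_data "$HOME/sdtest" 300 && sudo reboot
#   ...after the board is back up...
#   verify_test_data "$HOME/sdtest" && echo "data intact"
```

A failed checksum after a clean reboot would at least show that data written shortly before the reboot is what gets damaged.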
copy kernel Image, copy modules, copy dtbs → reboot → open qv4l2 app, check if camera has image, rinse and repeat. All disk operations involved are just copying over SSH. If you want the exact source code I can link it to you but I don’t think it makes a difference.
In addition to a full serial console boot log (which is what @WayneWWW is asking for), I saw something of interest in one of the logs, and I have a question for you about it:
Aug 31 23:57:51 ubuntu kernel: [ 171.394958] EXT4-fs (mmcblk1p1): resizing filesystem from 14417920 to 31016960 blocks
Aug 31 23:57:51 ubuntu dhcpd[9136]: DHCPDISCOVER from 9e:3b:1b:13:2d:3a via l4tbr0: network 192.168.55.0/24: no free leases
Aug 31 23:57:52 ubuntu kernel: [ 172.325350] EXT4-fs (mmcblk1p1): resized filesystem to 31016960
Aug 31 23:57:53 ubuntu nv-late-init.sh[9529]: Filesystem at /dev/mmcblk1p1 is mounted on /; on-line resizing required
Aug 31 23:57:53 ubuntu nv-late-init.sh[9529]: old_desc_blocks = 7, new_desc_blocks = 15
Aug 31 23:57:53 ubuntu nv-late-init.sh[9529]: The filesystem on /dev/mmcblk1p1 is now 31016960 (4k) blocks long.
This log would occur upon the first boot after an installation. The filesystem is resized only on the first boot, when the SD card has more space than the installed image uses. Is the corruption always right after a flash? Or have a few reboots occurred prior to the corruption? I suspect this log excerpt is simply left over from an earlier boot, in which case it isn’t relevant, but if it is from a recent flash, then it is relevant.
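For reference, the block counts in that log can be turned into sizes. This small sketch assumes 4096-byte blocks (the log says "(4k) blocks") and reports whole gigabytes in powers of 1000; `blocks_to_gb` is just an illustrative helper name, not an existing tool.

```shell
# Convert an ext4 block count to whole gigabytes (10^9 bytes).
# Assumes 4096-byte blocks, as the "(4k) blocks" log line indicates.
blocks_to_gb() {
    echo $(( $1 * 4096 / 1000000000 ))
}

blocks_to_gb 14417920   # filesystem size before the resize
blocks_to_gb 31016960   # filesystem size after the resize
```

That works out to roughly 59 GB before and 127 GB after, which would be consistent with the filesystem growing to fill a 128 GB card.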
The other question is how the SD card was prepared. I’m assuming this is the rootfs (o/s) running on the SD card (there wouldn’t be any eMMC on an Orin Nano dev kit), but you could have generated it with the flash software, or it could have been taken from a preexisting image which was written to the SD card. If the source image is itself corrupt, then the SD card and boot wouldn’t actually be the cause of the corruption (one can loopback test an image to see if it is corrupt). If the image was generated, then it should still be on the Linux PC, and that copy can be loopback tested. Should the source images show as not corrupt, then the corruption has to come from the Jetson. If the Jetson is the cause, it matters whether the corruption is tied to the first boot (which is what the log excerpt is from, though it might not have been a recent boot), because that would point to a different cause than corruption appearing on a later boot.
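A loopback test on the Linux PC could look something like the sketch below. It assumes the image is a plain raw SD card image (not compressed or sparse) and that the rootfs is partition 1, as it is for mmcblk1p1 in the log above; `check_sd_image` is a made-up helper name. The image is attached read-only and nothing is repaired.

```shell
# Read-only filesystem check of partition 1 inside a raw SD card image.
# Assumptions: raw (uncompressed, non-sparse) image, rootfs on partition 1.
check_sd_image() {
    img=$1
    [ -f "$img" ] || { echo "no such image: $img" >&2; return 2; }
    # Attach read-only and scan for partitions; prints e.g. /dev/loop0.
    loopdev=$(sudo losetup --find --show --partscan --read-only "$img") || return 2
    # -f forces a full check, -n answers "no" to every repair prompt.
    sudo e2fsck -f -n "${loopdev}p1"
    status=$?
    sudo losetup --detach "$loopdev"
    return "$status"
}

# Example (the path is a placeholder):
#   check_sd_image /path/to/sd-card.img
```

An exit status of 0 from e2fsck means the filesystem in the image is clean; a nonzero status means it reported a problem.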
The goal of a serial console log is to catch what causes the corruption as well as when the boot detects corruption. If for example there is a software failure during a normal shutdown, that means the serial console will have to contain at least the shutdown content from the previous boot. One could boot up normally, if there is no corruption yet, start the serial console log, and reboot (which should then catch the corruption in boot stages which dmesg won’t show).
Incidentally, when things are working, what is the output of “df -H -T” and “lsblk -f”? If resizing failed I would expect a different result than if resizing has succeeded.