Xavier FUBAR after editing fstab

It is beyond my comprehension how such a small problem can be impossible to repair. I really need help editing the fstab to get the board back up.

  • Installing an NVMe SSD was easy. It was recognized right away, so I could finally get past the irritation of the disk being full all the time.
  • After editing /etc/fstab to mount /opt on the new SSD, I ran a check to see if all was OK. It gave warnings on the existing lines, so I thought I could safely ignore them.
  • It is very weird that I see Microsoft (???) filesystems, but that is the standard stuff from sdkmanager.
  • Unfortunately it now boots into UEFI mode and does not want to boot from the eMMC.
  • Going to FS3: and then trying to edit /etc/fstab results in an error suggesting the drive is not writable. The added line shows up without any tabs (edit: that was a minicom problem, as picocom shows them).
  • Trying to copy files to the new SSD (ext4) also gives errors, so I assume it is write-protected or there is no support for ext4. Same for moving files to a USB drive with exFAT. But if the base system cannot be edited in recovery mode, what then?
  • Pressing exit and then trying to boot from the eMMC goes to a black screen and returns immediately.

Conclusion: Nvidia designed a system that cannot be fixed once fstab has been edited??? The default terminal colors are even blue on black, so I have trouble reading the screen. Even that was not thought through.

After a night of sleep, thinking it through and reading many entries on this forum, I decided to give up and start over.

  • I have to install sdkmanager on this laptop, and I notice that all the irritating stuff from a year ago is still there. Just one example: it took me 5 minutes to set the location to where I wanted it, while it created directories in all the wrong spots even though I said ‘no’.
  • But the real problem is: the board simply does not go into recovery mode. Long-press, short press, via reset or power - nothing makes the nvidia-entry show up in lsusb. Seemingly a working install is needed? I can’t remember what I did last time, but do remember it was very time-consuming. If I check the output with or without recovery, there is no output on ttyUSB3 with picocom - so something is different…

It is beyond my comprehension what is going on, and why it has to be so hard to recover from this.

These are the last log lines before it enters the boot menu.

[0002.677] I> Boot-device: eMMC
[0002.678] I> Boot_device: SDMMC_BOOT instance: 3
[0002.681] I> sdmmc-3 params source = boot args
[0002.684] W> No board IDs available
[0002.686] E> Failed to get board id info!
[0002.690] I> sdmmc bdev is already initialized
[0002.694] I> sdmmc-3 params source = boot args
[0002.701] I> Found 20 partitions in SDMMC_BOOT (instance 3)
[0002.706] I> Found 44 partitions in SDMMC_USER (instance 3)
[0002.729] I> enabling ‘vdd-hdmi-5v0’ regulator
[0002.736] I> regulator ‘vdd-hdmi-5v0’ already enabled
[0002.737] E> tegrabl_display_init_regulator: hdmi cable is not connected
[0002.737] E> tegrabl_display_get_pdata, failed to parse dtb settings
[0002.738] E> cannot find any other nvdisp nodes
[0002.739] E> no valid display unit config found in dtb
[0002.743] W> display init failed
[0002.744] initializing target
[0002.747] calling apps_init()
[0002.750] starting app kernel_boot_app
[0002.753] I> Kernel type = Normal

Jetson UEFI firmware (version 3.1-32827747 built on 2023-03-19T14:56:32+00:00)

The board is a P2972, so the Jetson AGX Xavier Developer Kit.

Any help to unbrick my expensive dev kit is welcome.

  • What was the fstab entry you added?
  • Was this using only eMMC prior to this?
  • Did you change anything in “/boot/extlinux/extlinux.conf”?
  • Can you attach two full serial console boot logs:
    • One with the NVMe attached.
    • One without the NVMe attached.

FYI, you can clone the eMMC, edit the clone, and flash the clone back (for example, with the fstab entry removed).

Currently not near the thing, as I have focused on getting a backup going. Seemingly it was more production-level than I thought :)

  • It was a UUID entry for ext4, but… it ended with “0 0”
  • Yes, no USB or anything
  • I did various things to get things moving, but fstab was the only system-file
  • I’ll check for the diff when it’s in reach again. I did skim for changes, but probably have missed something.

Yes, I’m aware. That’s why I was so irritated it did not get into recovery mode. “E> Failed to get board id info!” reminded me how hard it was to get it installed a year ago, and before - I had to force stuff and manually enter the board-data, as it did not want to recognize it. I’m therefore assuming it’ll be an RMA eventually, as something is probably broken since the beginning.

Incidentally, Jetsons don’t have a BIOS, which is why it is hard to build rescue environments. It is possible that an initrd could be modified to drop into a root shell and make some commands like mount available, but that would be entirely custom. When you have a BIOS you can create mini boots to very limited rescue environments. In the mainline UEFI (which currently requires Orin) it would be rather cool if NVIDIA added a rescue shell to boot (UEFI would make this easier so long as UEFI itself survives).

The “0 0” at the end are the dump and fsck fields, not mount options. The first 0 is the dump flag, which won’t change anything unless you use backup software that reads it. The second 0 is the fsck pass order, and 0 means the filesystem is not checked at boot. In theory these will not harm boot, but they could “perhaps” change boot time and change backup (the NVIDIA clone does not care about this).
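As a rough illustration of how those fields line up, the sketch below splits one fstab line into its six fields. The function name describe_fstab_line and the UUID in the sample line are made up for the example. Missing trailing fields are shown as 0, which is the default that fstab(5) documents.

```shell
# Split one fstab line into its six whitespace-separated fields.
# describe_fstab_line is a hypothetical helper, not part of any NVIDIA tool.
describe_fstab_line() (
    # read splits on spaces and tabs; fields 5 and 6 may be absent.
    read -r dev mnt fstype opts dump pass <<EOF
$1
EOF
    printf 'device=%s\n'     "$dev"
    printf 'mountpoint=%s\n' "$mnt"
    printf 'type=%s\n'       "$fstype"
    printf 'options=%s\n'    "$opts"
    printf 'dump=%s\n'       "${dump:-0}"   # 5th field: dump flag
    printf 'fsck_pass=%s\n'  "${pass:-0}"   # 6th field: fsck order, 0 = skip
)

# Sample line with a placeholder UUID (not the one from this thread).
describe_fstab_line 'UUID=1234-EXAMPLE /opt ext4 defaults 0 0'
```

The trailing “0 0” lands in the last two fields and leaves the options column untouched.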

UEFI began with L4T R35.x. This is why you have a VFAT partition: It is the UEFI firmware and not Linux. You don’t want to modify this. I sometimes clone this to multiple drives on my desktop PC so that I can point the BIOS at that boot configuration set even if I lose a disk, but Jetsons don’t have a BIOS, so this won’t help.

I suspect that your fstab entry created one of these two conditions:

  • A mount for something which cannot be found, and no option to “boot anyway without this device”.
  • A root “/” device in conflict with the actual root device.

As it is, I cannot think of a way to fix this without clone and flash. This is why I tend to first try mount commands with an option that allows the device to fail (or, when installing a new kernel, a second boot entry as a backup to the original kernel).
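A minimal sketch of a pre-reboot check for the two conditions above, assuming the standard nofail mount option is the “allow failure” option meant here. The function name check_fstab and its output labels are invented for this example. It reads fstab text on stdin, reports non-root entries that lack nofail, and reports when more than one entry claims “/”.

```shell
# Flag fstab entries that could block boot. Reads fstab text on stdin.
# check_fstab and its output labels are hypothetical, for illustration only.
check_fstab() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }                   # skip blanks and comments
        $2 == "/" { roots++; next }                     # count root entries
        $4 !~ /(^|,)nofail(,|$)/ { print "no-nofail " $2 }
        END { if (roots > 1) print "duplicate-root " roots }
    '
}

# Typical use on the Jetson before rebooting:
#   check_fstab < /etc/fstab
# An entry that tolerates a missing SSD would look like this (placeholder UUID):
#   UUID=1234-EXAMPLE  /opt  ext4  defaults,nofail  0  2
```

Anything it prints is something to review rather than a definite error, since some entries are meant to be mandatory.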

Recovery mode itself is quite reliable, and has very few dependencies. No matter what you’ve flashed this shouldn’t stop recovery mode. Usually there is some other problem in configuration (any kind of VM is a major headache and is expected to make flash fail, most often by not finding the Jetson or losing it in the middle of a flash). Failing to get board ID might be something else.
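Whether the host PC sees the board at all can be checked from lsusb output. The sketch below assumes NVIDIA’s USB vendor ID 0955, and the helper name has_nvidia_usb_device is made up. A normally booted Jetson may also show up with this vendor ID, so a match only shows that the board is visible on USB and does not prove it is in recovery mode.

```shell
# Succeed if lsusb output on stdin lists a device with USB vendor ID 0955 (NVIDIA).
# has_nvidia_usb_device is a hypothetical helper for this example.
has_nvidia_usb_device() {
    grep -qi 'ID 0955:'
}

# Typical use on the host PC (natively, not inside a VM):
#   lsusb | has_nvidia_usb_device && echo "Jetson visible" || echo "Jetson not visible"
```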

Do you have your flash software present from the same release it was originally flashed with? If so, you would have a “~/nvidia/nvidia_sdk/JetPack...version.../Linux_for_Tegra/” directory. If you are not using a VM you can easily use this to clone, and a clone is a good test of whether the Jetson has actually failed. Clones do take a lot of disk space on the host PC (if your rootfs partition is 32 GB, then I would suggest at least 100 GB of spare space on the host PC), and take a long time to complete. If the correct USB connector is connected (USB-C on the AGX on the same side as the 40-pin header), and if you powered up with the recovery button held down (there is no requirement for long hold time, you just hold it down at the moment of power on or power reset, then let go), then this should work from Linux_for_Tegra/:
sudo ./flash.sh -r -k APP -G my_clone.img jetson-agx-xavier-devkit mmcblk0p1

Success with that would produce my_clone.img.raw and my_clone.img. This would also demonstrate the Jetson still functions. Any use of a VM is expected to fail under most circumstances.
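Once a clone exists, removing the bad fstab entry from it can be sketched as follows. The helper disable_fstab_entry is invented for this example, and the mount point /mnt/clone and the /opt target are assumptions based on this thread. It comments the entry out instead of deleting it, so the original line stays visible.

```shell
# Comment out every active entry in fstab file $1 whose mount point is $2.
# disable_fstab_entry is a hypothetical helper written for this example.
disable_fstab_entry() {
    tmp=$(mktemp) || return 1
    awk -v mp="$2" '
        NF > 0 && $1 !~ /^#/ && $2 == mp { print "# disabled: " $0; next }
        { print }
    ' "$1" > "$tmp" && cat "$tmp" > "$1"   # rewrite in place, keep file mode
    status=$?
    rm -f "$tmp"
    return $status
}

# Assumed workflow on the host PC, from Linux_for_Tegra/, as root:
#   mkdir -p /mnt/clone
#   mount -o loop my_clone.img.raw /mnt/clone
#   disable_fstab_entry /mnt/clone/etc/fstab /opt
#   umount /mnt/clone
#   cp my_clone.img.raw bootloader/system.img
#   ./flash.sh -r jetson-agx-xavier-devkit mmcblk0p1   # -r reuses system.img
```

Flashing back needs the same recovery-mode setup as the clone itself, so this only helps once the clone command has succeeded.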
