Stuck at Nvidia logo after power outage

in_sympathy · December 31, 2022, 4:12pm

Hi there.
I have the following issue: because of the war here in Ukraine we have regular power outages. Every time my Jetson Nano shuts down improperly, it gets stuck at the Nvidia logo when I switch it back on. It doesn’t matter what I do - it won’t boot. After about half an hour, whether it sat stuck at the Nvidia logo or I shut it down completely, a reboot or power-on boots normally.

Important note: I boot from a USB SSD.

If I boot from a card - it’s ok.

It feels like after an improper shutdown it needs like 30 minutes for some cap to discharge or I don’t know.

Any ideas 💡?

linuxdev · December 31, 2022, 8:16pm

A Jetson is very similar to any desktop PC. The Linux ext4 filesystem originally (when it was ext2) did not have a journal. Any time ext2 lost power with unwritten changes, then the filesystem would have to be repaired. It is a tree structure, and depending on what was corrupt, it might end up missing files or even entire directories. Some more extensive failures would be uncorrectable. When a repair (fsck) actually does remove something, and the tool does not know where that “something” belongs, it’ll put it in the “lost+found/” directory of that partition. This at least gave a “repaired” filesystem a way to hint to the administrator what was lost (assuming fsck could figure it out).

Once the filesystem evolved beyond ext2 it gained a journal (first as ext3, and now ext4). That journal is synchronous, and it is written exactly at the time anything is told to write to it. It is very slow, but it is also very tiny, and so it doesn’t have a lot of impact. The rest of the filesystem is buffered or cached, and takes a moment to actually write once told to write. Once that write is complete the journal entry is no longer needed. When shut down improperly, if the amount of entries being written does not exceed the journal size, then the journal is simply replayed at the next mount: transactions which were completely recorded are applied, and incomplete ones are discarded. No damage, no removed nodes, nothing needs to go to “lost+found/”. However, if the amount of changes exceeds the size of the journal, then you are essentially back to the old days of not having a journal. fsck would be used to repair, and this might mean the tool having to guess, and also might end up putting some content in “lost+found/”.
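
For anyone curious whether a particular partition really has a journal, “dumpe2fs -h” (part of e2fsprogs) prints the superblock, including a “Filesystem features:” line. A minimal sketch, assuming the partition is “/dev/sdb1”; the helper name has_journal is made up for illustration:

```shell
#!/bin/sh
# has_journal: read `dumpe2fs -h` output on stdin and succeed if the
# "Filesystem features:" line lists the has_journal feature flag.
# (The helper name is hypothetical; the flag name comes from e2fsprogs.)
has_journal() {
    grep -E '^Filesystem features:' | grep -qw 'has_journal'
}

# Example (the device name is an assumption, adjust it to your drive):
#   sudo dumpe2fs -h /dev/sdb1 2>/dev/null | has_journal && echo "journal present"
```

This only reads the superblock, so it should be safe to run on an unmounted, damaged partition before deciding on a repair.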

Once any corruption exceeds journal size it is unsafe to perform any kind of write operation. Thus, only in cases where the amount of damage and nature of damage is simple, would the system be able to recover. In more extensive cases the system will demand to only work on that partition in read-only mode. The administrator would have to manually run fsck.ext4 on the partition (in your case, if you have a USB drive and can work on it from another Linux system, this would be the perfect way to do it…otherwise you need a rescue mode capable of running fsck.ext4 while the partition is read-only). There is no guarantee that whatever needs to be trimmed from the filesystem tree will leave the system bootable.

Do you have another Linux host PC system? If so, then:

Monitor “dmesg --follow” from the host PC.
Then insert the USB cable for the drive, and note which device it is (probably something like “/dev/sdb”).
This should not be mounted, and if there are enough errors, then the host PC won’t mount it. This is how you want things during a repair (unmounted).
Note that the device as a whole might have the name “/dev/sdb”. The first partition would be “/dev/sdb1”. You’ll need to know which partition is involved. If the device is sdb, then this command would provide some details about its layout:
lsblk -f /dev/sdb
Attempt automatic repair (I assume the name is “/dev/sdb”, and that the ext4 partition is “/dev/sdb1”, adjust for your case to name the ext4 partition):
sudo fsck.ext4 -p /dev/sdb1

If the above worked, then make sure it is not mounted when you unplug it. After a successful repair it is possible it might auto mount. You can use “lsblk -f /dev/sdb” to see (after done) if it mounted and needs “sudo umount /dev/sdb1”.

If damage was sufficient, then you might need to force check. Automatic repair might refuse. This is an indication that damage is more extensive. Any repair (once done) will have hints as to what was removed by examining the “lost+found/” subdirectory at whatever mount point the partition is mounted on.

If you must force check, then you might try this (assuming the “-p” option does not do the job; I am also assuming it is “/dev/sdb1”):
sudo fsck.ext4 -fvy /dev/sdb1
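
Both fsck.ext4 invocations above report what happened through their exit status, which the e2fsck man page documents as a sum of flag values (1 = errors corrected, 4 = errors left uncorrected, and so on). A small sketch for decoding it; the function name describe_fsck_status is made up for illustration:

```shell
#!/bin/sh
# describe_fsck_status: decode the bit-mask exit status documented in the
# e2fsck man page into readable text. The function name is hypothetical.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    if [ $((status & 1)) -ne 0 ]; then out="$out, errors corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then out="$out, reboot recommended"; fi
    if [ $((status & 4)) -ne 0 ]; then out="$out, errors left uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then out="$out, operational error"; fi
    if [ $((status & 16)) -ne 0 ]; then out="$out, usage error"; fi
    if [ $((status & 32)) -ne 0 ]; then out="$out, canceled by user"; fi
    if [ $((status & 128)) -ne 0 ]; then out="$out, shared library error"; fi
    # Strip the leading ", " separator before printing.
    echo "${out#, }"
}

# Example (assumes the ext4 partition is /dev/sdb1):
#   sudo fsck.ext4 -p /dev/sdb1; describe_fsck_status $?
```

If the “-p” run reports “errors left uncorrected”, that is the case where automatic repair refused and the forced check is the next thing to try.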

Usually what gets corrupted is the content which was being written. Damage is not necessarily limited to this though since writing to a file in a directory might update the directory entry, and the directory itself might have been caught mid-write during the power loss.

Incidentally, if your Jetson boots to SD card, then it works as a host PC during the above rescue since attaching it after boot, without mounting it, leaves the drive read-only when damaged. This is one case where the Jetson itself could be used as a host PC (though likely the drive would change to name “/dev/sda”…just watch “dmesg --follow” as you plug in the USB drive).

There are ways to back up a corrupt partition via “dd” if something valuable is on it and repair might damage something. One could work on the loopback mounted backup and leave the original drive untouched until you are certain what you want to do. Repair tools don’t know the difference between an actual partition and a loopback file (via dd backup) pretending to be a partition.
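
A minimal sketch of that backup-first approach, assuming the damaged partition is “/dev/sdb1” and that there is enough free space for an image the size of the whole partition. The function and file names are placeholders, not anything from the Jetson tooling:

```shell
#!/bin/sh
# backup_partition SRC DST: copy SRC byte-for-byte into the file DST using dd.
# (Hypothetical helper; run it with root privileges when SRC is a real device.)
backup_partition() {
    dd if="$1" of="$2" bs=4M 2>/dev/null
}

# Example (device and image names are assumptions):
#   sudo dd if=/dev/sdb1 of=sdb1-backup.img bs=4M
#   sudo fsck.ext4 -fn sdb1-backup.img    # read-only check of the copy
#   sudo fsck.ext4 -fvy sdb1-backup.img   # trial repair, on the copy only
#   sudo mount -o loop,ro sdb1-backup.img /mnt   # inspect the result
```

Since fsck.ext4 accepts a regular file in place of a device, the trial repair shows what would end up in “lost+found/” before the real drive is touched.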

Good luck, and just ask if you have more questions.

in_sympathy · January 5, 2023, 12:45pm

Hi @linuxdev
Thanks for a really extensive reply - a lot of useful information for me as I’m only starting my Linux dive (though it’s been quite a few exciting years already :) )

The thing is that in my particular case it doesn’t really matter whether there are any disk errors after an improper shutdown: the Nano stays stuck at the Nvidia boot logo for 20-30 minutes before it will boot into the system correctly.
I tried checking and successfully correcting errors on another computer - it still gets stuck at the Nvidia boot logo regardless of whether I check for errors or not.
I tried shutting it down or pressing reset to reboot it - it won’t boot past the Nvidia logo regardless of whether I check and correct errors or not.
It actually feels like there is some sort of capacitor that needs to discharge in order for something to reset before my Nano will boot correctly after an improper shutdown. After those 20-30 minutes it boots OK in 99% of cases - again no matter whether I check for errors.
The other 1% is when there are severe drive errors - then I have to correct them on another PC, and it boots OK, but only once 20-30 minutes have passed.
I tried disconnecting the power cord and pressing the power button and reset button to maybe discharge capacitors, if that’s the case - no luck so far.
That’s why I’m asking - there is something quite weird about it.

linuxdev · January 5, 2023, 3:20pm

I’m hoping there is no actual hardware error. I don’t normally use alternate boot for a Jetson (the SSD part). Does it always have that delay, regardless of boot method (for example, booting from the SSD versus booting from eMMC or SD card)? A repaired filesystem might be missing some of the boot content. The trouble with a repaired filesystem is that you never know which part is missing (technically, you could try to identify any content in “lost+found/”, but that only applies to ext4 in some circumstances and won’t show anything lost during journal recovery).

The capacitor idea might have some validity, so what happens if you unplug the power, hold the power button down for about 10 seconds to discharge, and only then add power and bring up the system? Note that normally holding the power button down for shutdown is a bad idea unless you’ve made sure your system is set to soft shutdown (versus holding the button down causing power to just cut). The carrier board itself has the power delivery components on it; it is rare for a module to fail, but in some cases a power component on the carrier board itself might fail after a power surge.

in_sympathy · January 5, 2023, 7:07pm

It always has that delay if I boot from the SSD, and no, it doesn’t matter whether I disconnect the drive to fix any errors - it will boot anyway, but only after 20-30 minutes, and only after an improper shutdown.

It will always boot right away from a microSD card though - no delay even after an improper shutdown.

As for the button - I tried holding both buttons and nothing worked.

UPDATE: now it won’t boot from USB SSD even after a proper shutdown and would again get stuck at NVIDIA logo (see a photo)

Probably the board for some reason can’t find what to boot from. Pressing reset button didn’t help as always.

UPDATE 2: holding the power button with the power cable unplugged did the trick - I did it for about 15-20 seconds instead of the 3-5 seconds I tried before. After holding the button for 15-20 seconds I replugged the power cable and it booted as it should, so I guess it’s a hardware related issue - I wonder if it’s just mine or whether anyone else has this problem with booting from USB?

in_sympathy · January 6, 2023, 12:09am

Forget everything I said. It’s something totally different because I failed to reproduce my success.

Even more - I have discovered that my SD card slot is broken - the locking mechanism is not working anymore because the spring failed somehow.
Since I use the Waveshare case with SD card adapter I didn’t notice it up until now.
What I have discovered is that, probably because of that, the Jetson failed to detect that there is no SD card present and would hesitate to boot from USB as a result.

I tweaked the SD card slot with tweezers a little and now the system does say at boot that the SD card is absent, but it still won’t boot from USB.

I’ll reset my Jetson completely with a fresh Jetpack and experiment with all this a little more.

in_sympathy · January 6, 2023, 2:27pm

So from what I see as of now, the uSD card slot is broken and malfunctioning. It has a broken spring mechanism that, as far as I understand, helps the board detect whether a card is present. For now it works because the uSD card extender I use with my Waveshare enclosure is locked in place by a screw - otherwise the card won’t stay in the slot because of the broken locking mechanism.

I even managed to update Jetpack and migrate the system to SSD and it boots. But! It won’t boot from an SSD without an SD card also being present.

I will order a replacement SD card slot and try to find a repair shop to do the job, because that is way too much for my level of skills, but for now that’s the way it is.

linuxdev · January 6, 2023, 7:29pm

I want to start by noting that the software on the website from NVIDIA is intended for a development kit. A third party carrier board will usually require a modified device tree, and so third parties usually provide their own board support package (which is usually the same as the dev kit, but with minor device tree edits). If the Waveshare carrier board is an exact electrical replica of the layout of the dev kit carrier board (which does happen sometimes, but it isn’t the “average” case), then it won’t need any modification; whatever parts are modified though might not work with the dev kit software.

The need for the SD card means this is where the initial flash has told the software to point to. This is normal. It likely means the “/boot” is coming from the SD card and that this chain loads or else names a rootfs on the other device by some means (meaning either by scripted detection or manually naming that device). Quite often one will flash on command line to name a different rootfs. However, it gets confusing as to what actually happens when you do name an alternate rootfs. It might be that the pointer to where to start boot still goes to SD card, and that the extlinux.conf of the SD card is then edited to point at the SSD for rootfs, or it might mean the pointer starts with the extlinux.conf of the other device. I don’t know for sure what your case is.

Earlier mention of lag during boot sounds like a software timeout. During boot most devices (not just Jetsons) will detect bootable devices, perhaps searching for them in a particular order. A device might take time to fail detection before boot moves on to the next device. If that is the case, then the boot device search order could be modified to speed things up. Or it might be that the extlinux.conf has to be changed (I don’t know for your case what the exact delay problem is). The delay you describe really sounds like a software timeout.

I am assuming at this point that flash is looking first at the SD card’s “/boot/extlinux/extlinux.conf” (along with device tree, but that is mostly irrelevant at this point) before proceeding (once we get past early boot stages…it is the early boot stage that looks for extlinux.conf). Can you post a copy of the extlinux.conf from both the SD card and the SSD (make sure to label so we know which is which)? Also, since a UUID or PARTUUID might be involved, then with the SSD connected (it doesn’t have to boot from it), you might include the output of “lsblk -f” or “blkid”.

So far as the mechanical part goes with the SD card latch, I hate to say it, but you probably need to RMA it (the module is likely good, but I don’t know anything about Waveshare’s policies). For the case of an NVIDIA dev kit you could just RMA this. I realize your situation might make this difficult though, so whatever information you can post go ahead and add (e.g., the exact model of carrier board and module). Perhaps something can be done to help.

in_sympathy · January 9, 2023, 8:42pm

My case is exactly this - SD card tells the system to look for a /boot on a USB SSD. Is there any way to tell the system to search for a boot on the SSD first so I could get rid of the SD card once and for all?

As for the extlinux.conf - here goes:

TIMEOUT 30
DEFAULT primary

MENU TITLE L4T boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} root=PARTUUID=62ab8ab4-b62f-4f3f-87bf-f4ba7a72e75a rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 nv-auto-config 

#LABEL sdcard
#      MENU LABEL primary kernel
#      LINUX /boot/Image
#      INITRD /boot/initrd
#      APPEND ${cbootargs} quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 #console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0

# When testing a custom kernel, it is recommended that you create a backup of
# the original kernel and add a new entry to this file so that the device can
# fallback to the original kernel. To do this:
#
# 1, Make a backup of the original kernel
#      sudo cp /boot/Image /boot/Image.backup
#
# 2, Copy your custom kernel into /boot/Image
#
# 3, Uncomment below menu setting lines for the original kernel
#
# 4, Reboot

# LABEL backup
#    MENU LABEL backup kernel
#    LINUX /boot/Image.backup
#    INITRD /boot/initrd
#    APPEND ${cbootargs}

And as for the additional Waveshare board - it’s just an extender, nothing too fancy here - I doubt it’s the root of the problem - #5 in the picture below:

linuxdev · January 9, 2023, 10:23pm

Which extlinux.conf is that? Is it the one on SD card? From what you’ve said this is likely the one which matters.

When booted to SD card (or actually any boot), the root filesystem is chosen (in this case) via partition UUID. What do you see (regardless of how booted, but with the SSD attached) from:

lsblk -f
blkid

However, so far as I know, it’ll always first point to an SD card in order to transfer to the SSD. Someone else may need to verify this (I’ve not experimented with SSD boot), but I do not think there is any way to boot from SSD without an SD card installed on the dev kit Nano due to boot software.

in_sympathy · January 10, 2023, 7:47pm

I have the same extlinux.conf file on both since I set up the system on SD card and then cloned it to SSD.

Here’s lsblk:

NAME      FSTYPE  LABEL      UUID                                 MOUNTPOINT
loop0     squashf                                                 /snap/jami/186
loop1     squashf                                                 /snap/snapd/17
loop2     squashf                                                 /snap/snapd/17
loop3     squashf                                                 /snap/core18/2
loop4     squashf                                                 /snap/fragment
loop5     squashf                                                 /snap/fragment
loop6     squashf                                                 /snap/gnome-3-
loop7     squashf                                                 /snap/core22/4
loop8     squashf                                                 /snap/bare/5
loop9     vfat    L4T-README 1234-ABCD                            
sda                                                               
└─sda1    ext4    APP        f1e08721-813e-401e-8c8d-4ea42e91517f /
mtdblock0                                                         
mmcblk0                                                           
└─mmcblk0p1
          ext4               cfef8f6b-16af-4c30-8bf8-af8f992e1c25 /media/in_symp
zram0                                                             [SWAP]
zram1                                                             [SWAP]
zram2                                                             [SWAP]
zram3

And here’s blkid:

/dev/loop0: TYPE="squashfs"
/dev/loop1: TYPE="squashfs"
/dev/loop2: TYPE="squashfs"
/dev/loop3: TYPE="squashfs"
/dev/loop4: TYPE="squashfs"
/dev/loop5: TYPE="squashfs"
/dev/loop6: TYPE="squashfs"
/dev/loop7: TYPE="squashfs"
/dev/mmcblk0: PTUUID="6eabb8f5-31f5-45ac-83c0-c146ac9e9c25" PTTYPE="gpt"
/dev/mmcblk0p1: UUID="cfef8f6b-16af-4c30-8bf8-af8f992e1c25" TYPE="ext4" PARTLABEL="APP" PARTUUID="33479907-5377-4ed3-ab99-820dcab54a48"
/dev/sda1: LABEL="APP" UUID="f1e08721-813e-401e-8c8d-4ea42e91517f" TYPE="ext4" PARTUUID="62ab8ab4-b62f-4f3f-87bf-f4ba7a72e75a"
/dev/loop8: TYPE="squashfs"
/dev/loop9: SEC_TYPE="msdos" LABEL="L4T-README" UUID="1234-ABCD" TYPE="vfat"
/dev/zram0: UUID="0f42d34c-85f7-43d7-a96d-fca9d7c7ab71" TYPE="swap"
/dev/zram1: UUID="8dc26bea-6ba3-433c-8b88-077774930cc0" TYPE="swap"
/dev/zram2: UUID="b5d55cd7-3502-49bd-ac62-cee57413ff28" TYPE="swap"
/dev/zram3: UUID="a2d483c4-8a16-4b6e-9e9b-e018a79575dc" TYPE="swap"

linuxdev · January 10, 2023, 9:17pm

I don’t have an extra drive to work with alternate boot media, but it looks like it should get past the logo. Your extlinux.conf uses PARTUUID 62ab8ab4-b62f-4f3f-87bf-f4ba7a72e75a, which does match “/dev/sda1” in your blkid output, although I’m wondering if the filesystem UUID might work instead:
f1e08721-813e-401e-8c8d-4ea42e91517f
(note that there is both a “PARTUUID” and a “UUID”; I can’t test to see which is preferred by the software since I don’t have a drive to test with)
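
As a quick cross-check, the root= argument in extlinux.conf can be pulled out and compared by eye against what lsblk reports for the drive. This is only a sketch: the helper name root_arg is made up, and it assumes the first uncommented APPEND line is the entry that actually boots.

```shell
#!/bin/sh
# root_arg FILE: print the root= argument from the first uncommented
# APPEND line of an extlinux.conf-style file. (Hypothetical helper.)
root_arg() {
    grep -E '^[[:space:]]*APPEND[[:space:]]' "$1" | head -n 1 |
        grep -oE 'root=[^[:space:]]+'
}

# Example (paths and device name are assumptions):
#   root_arg /boot/extlinux/extlinux.conf
#   lsblk -o NAME,FSTYPE,UUID,PARTUUID /dev/sda
```

The value after “root=PARTUUID=” should appear in the PARTUUID column, and a value after “root=UUID=” in the UUID column; the two are different identifiers and are not interchangeable.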

Before modifying any extlinux.conf, do you have serial console working? Probably you do, since I see a commented out SD card entry. This makes working on partitions and boot much safer. One can interrupt boot at the right place using serial console, and pick from multiple boot entries. You have one entry I’d like to leave alone since you can get into SD card, but you could add extra entries with modifications and safely test that way. Here is a suggested edit to the extlinux.conf, but it would require serial console to pick one of the added entries:

TIMEOUT 30
DEFAULT primary

MENU TITLE L4T boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} root=PARTUUID=62ab8ab4-b62f-4f3f-87bf-f4ba7a72e75a rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 nv-auto-config 

LABEL altuid
      MENU LABEL alt uid
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} root=UUID=f1e08721-813e-401e-8c8d-4ea42e91517f rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 nv-auto-config 

LABEL sda1
      MENU LABEL sda1
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} root=/dev/sda1 rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 nv-auto-config 

#LABEL sdcard
#      MENU LABEL primary kernel
#      LINUX /boot/Image
#      INITRD /boot/initrd
#      APPEND ${cbootargs} quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 #console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0

# When testing a custom kernel, it is recommended that you create a backup of
# the original kernel and add a new entry to this file so that the device can
# fallback to the original kernel. To do this:
#
# 1, Make a backup of the original kernel
#      sudo cp /boot/Image /boot/Image.backup
#
# 2, Copy your custom kernel into /boot/Image
#
# 3, Uncomment below menu setting lines for the original kernel
#
# 4, Reboot

# LABEL backup
#    MENU LABEL backup kernel
#    LINUX /boot/Image.backup
#    INITRD /boot/initrd
#    APPEND ${cbootargs}

This would allow you to pick via the alternate UID, or via “sda1”. If one of those works it would be a safety. You could then pick the main entry, and if you end up editing boot order, then one of those would be what you pick.

Note: Any content in the QSPI would require flashing, but you can edit extlinux.conf first to test things.

system · January 24, 2023, 9:18pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
