Orin NX unable to boot after days in fully functional operation

Yes, if some step made the kernel incompatible, the failure should show up right after the next reboot.

If you can still boot into the error situation, maybe I could share some steps and we can check what is wrong on your side. But I need some time before I can share what to check.

Hi WayneWWW

parg is on holiday, so I’m taking over. What do you think the next steps are to find the error? We can still boot into the error situation.

Any updates on how to find the problem?

Sorry, I missed this one. Will update.

Hi,

Just want to double-check: are you still able to use the console of the board in this situation?

No. The log says to press Enter to get a bash console, but this does not work.

However, we were able to determine why it no longer boots: somehow the initrd image file was changed. We extracted a working initrd and one that is affected by the problem and noticed that many files are missing from the affected one, for example in the lib/modules folder:

Working initrd image (extracted):

waep@waep-p360:~/Documents/l4t_initrd_decomp/initrd_working/lib/modules/5.15.148-tegra$ ls -l
total 2004
drwxrwxr-x 3 waep waep   4096 Mär 19 10:08 kernel
-rw-rw-r-- 1 waep waep 399006 Mär 19 10:08 modules.alias
-rw-rw-r-- 1 waep waep 386429 Mär 19 10:08 modules.alias.bin
-rw-rw-r-- 1 waep waep  38561 Mär 19 10:08 modules.builtin
-rw-rw-r-- 1 waep waep  74760 Mär 19 10:08 modules.builtin.alias.bin
-rw-rw-r-- 1 waep waep  43255 Mär 19 10:08 modules.builtin.bin
-rw-rw-r-- 1 waep waep 222974 Mär 19 10:08 modules.builtin.modinfo
-rw-rw-r-- 1 waep waep 111853 Mär 19 10:08 modules.dep
-rw-rw-r-- 1 waep waep 159888 Mär 19 10:08 modules.dep.bin
-rw-rw-r-- 1 waep waep    134 Mär 19 10:08 modules.devname
-rw-rw-r-- 1 waep waep  38949 Mär 19 10:08 modules.order
-rw-rw-r-- 1 waep waep    925 Mär 19 10:08 modules.softdep
-rw-rw-r-- 1 waep waep 243917 Mär 19 10:08 modules.symbols
-rw-rw-r-- 1 waep waep 287979 Mär 19 10:08 modules.symbols.bin
drwxrwxr-x 3 waep waep   4096 Mär 19 10:08 updates

Failing initrd image (extracted):

waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra$ ls -l
total 4
drwxrwxr-x 3 waep waep 4096 Mär 19 10:27 kernel
waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra$ cd kernel/
waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra/kernel$ ls -l
total 4
drwxrwxr-x 3 waep waep 4096 Mär 19 10:27 drivers
waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra/kernel$ cd drivers/
waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra/kernel/drivers$ ls -l
total 4
drwxrwxr-x 3 waep waep 4096 Mär 19 10:27 net
waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra/kernel/drivers$ cd net/
waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra/kernel/drivers/net$ ls -l
total 4
drwxrwxr-x 3 waep waep 4096 Mär 19 10:27 ethernet
waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra/kernel/drivers/net$ cd ethernet/
waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra/kernel/drivers/net/ethernet$ ls -l
total 4
drwxrwxr-x 4 waep waep 4096 Mär 19 10:27 realtek
waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra/kernel/drivers/net/ethernet$ cd realtek/
waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra/kernel/drivers/net/ethernet/realtek$ ls -l
total 8
drwxrwxr-x 2 waep waep 4096 Mär 19 10:27 r8126
drwxrwxr-x 2 waep waep 4096 Mär 19 10:27 r8168
waep@waep-p360:~/Documents/l4t_initrd_decomp/failing/lib/modules/5.15.148-tegra/kernel/drivers/net/ethernet/realtek$ 

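As you can see, the lib/modules folder of the extracted failing initrd image is almost empty: apart from the directory path down to the Realtek r8126 and r8168 drivers, nothing is left, and the modules.* index files are gone. So there is no PCIe driver either. However, we do not know when the initrd gets changed.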
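For anyone repeating this comparison, a minimal sketch that unpacks two initrd images and lists the files under lib/modules that exist only in the working one. It assumes the images are gzip-compressed cpio archives and that gzip, cpio and comm are installed; extract_initrd, list_modules and missing_modules are hypothetical helper names, not L4T tools.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two initrd images.
# Assumption: each image is a gzip-compressed cpio archive.

# extract_initrd IMAGE DESTDIR: unpack IMAGE into DESTDIR
extract_initrd() {
    mkdir -p "$2" || return 1
    # gzip reads IMAGE from the current directory; only cpio runs inside DESTDIR
    gzip -dc < "$1" | (cd "$2" && cpio -idm --quiet)
}

# list_modules ROOT: print every file below ROOT/lib/modules, sorted bytewise
list_modules() {
    (cd "$1/lib/modules" && find . -type f | LC_ALL=C sort)
}

# missing_modules GOOD_ROOT BAD_ROOT: print files present only in GOOD_ROOT
missing_modules() {
    good_list=$(mktemp) || return 1
    bad_list=$(mktemp) || return 1
    list_modules "$1" > "$good_list"
    list_modules "$2" > "$bad_list"
    # -23 hides lines unique to BAD_ROOT and lines common to both lists
    LC_ALL=C comm -23 "$good_list" "$bad_list"
    rm -f "$good_list" "$bad_list"
}
```

For example, after sourcing the script inside l4t_initrd_decomp, `missing_modules initrd_working failing` should print modules.dep, modules.alias and every other file that the failing image lacks.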

Hi,

All I can suggest is that you try to reproduce what you have done on your side.

It is unlikely that I can figure that out from my side.
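Since the open question is when the initrd changes, one low-effort way to narrow it down is to log a checksum of /boot/initrd on every boot and flag the first boot where it differs. A minimal sketch, assuming sha256sum is available; the log location and the check_initrd name are hypothetical and can be overridden through the INITRD and LOG variables.

```shell
#!/bin/sh
# Hypothetical initrd change detector. Both paths are assumptions.
INITRD=${INITRD:-/boot/initrd}
LOG=${LOG:-/var/log/initrd-checksum.log}

# check_initrd: append "<UTC time> <sha256>" to LOG and print
# "first", "unchanged" or "changed" compared with the previous entry
check_initrd() {
    [ -r "$INITRD" ] || { echo "cannot read $INITRD" >&2; return 1; }
    sum=$(sha256sum "$INITRD" | cut -d' ' -f1)
    # Second field of the last log line is the previous checksum
    last=$(tail -n 1 "$LOG" 2>/dev/null | cut -d' ' -f2)
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$sum" >> "$LOG"
    if [ -z "$last" ]; then
        echo "first"
    elif [ "$last" = "$sum" ]; then
        echo "unchanged"
    else
        echo "changed"
    fi
}
```

Running it at each boot (for example from a systemd unit or a cron @reboot entry) and after package installs would show between which two points the file was rewritten.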

Hi

Alright, I understand that.

Just to make something clear:

If we are trapped in this boot problem, this is how we get out of it:

  1. Power up the Jetson and enter the UEFI settings.
  2. In the UEFI settings, go to Device Manager -> NVIDIA Configuration -> L4T Configuration and change L4T Boot Mode to Kernel Partition.
  3. The system boots up, and we replace the non-working initrd image with a working one.
  4. Reboot and change the L4T Boot Mode setting in UEFI back to extlinux.
  5. Reboot again; the system runs as it did before the incident.
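Step 3 above can be sketched as follows. This assumes a known-good copy of the initrd was saved while the system still booted correctly (the path passed as the first argument is hypothetical, and restore_initrd is not an L4T tool). The broken file is kept with a timestamp suffix so it can still be analysed, and the copy is verified with cmp.

```shell
#!/bin/sh
# restore_initrd GOOD TARGET: back up TARGET, replace it with GOOD, verify.
# GOOD is a hypothetical, previously saved known-good initrd.
restore_initrd() {
    good=$1
    target=$2
    [ -s "$good" ] || { echo "no good initrd at $good" >&2; return 1; }
    # Keep the broken image for later comparison
    cp -p "$target" "$target.broken.$(date +%Y%m%d%H%M%S)" || return 1
    cp -p "$good" "$target" || return 1
    sync
    # Report success only if the installed file matches the good copy
    cmp -s "$good" "$target" && echo "restored"
}
```

On the Jetson this would be run as root, for example `restore_initrd /path/to/initrd.good /boot/initrd`, before switching the boot mode back to extlinux.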

What I don’t understand is how the new initrd gets into the QSPI. As I understand it, the initrd is loaded from QSPI because the NVMe cannot be accessed yet at this stage of the boot process. How does the initrd copied to the /boot directory get into QSPI? Sorry if that is a dumb question.

Hi,

The initrd is loaded from /boot/initrd during the boot process, and that file is on your boot disk, not in QSPI.
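In extlinux mode, the file that gets loaded is the one named on the INITRD line of /boot/extlinux/extlinux.conf; on a standard L4T install this is typically /boot/initrd, but check your own file. A small sketch to print it, assuming that default path; initrd_paths is a hypothetical helper name.

```shell
#!/bin/sh
# initrd_paths [CONF]: print the path on every INITRD line of an
# extlinux.conf (default location assumed to be /boot/extlinux/extlinux.conf)
initrd_paths() {
    # Match the keyword regardless of case; lines starting with "#" never match
    awk 'toupper($1) == "INITRD" { print $2 }' "${1:-/boot/extlinux/extlinux.conf}"
}
```

If the printed path points into /boot on the NVMe, then replacing that file is exactly what changes the initrd used at the next extlinux boot.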

Hi,

By boot disk, do you mean the NVMe disk in our case? But how is the initrd loaded from the NVMe if the PCIe driver can only be loaded after the initrd has been loaded, and without the PCIe driver the NVMe cannot be accessed? Again, sorry if this is a stupid question.

That is why UEFI has its own Tegra PCIe driver: UEFI can access the NVMe and read /boot/initrd before the Linux kernel and its modules are loaded.