Orin NX reboots repeatedly but switches to slot B unexpectedly

smileandcry2023 · February 2, 2026, 3:13am

  What is the possible reason for that?

sebastien.schertenleib · February 2, 2026, 7:17am

Hi,

Yes, we have reproduced the issue. In particular, over the weekend we witnessed another crash using a vanilla Nvidia BSP 35.6.2 and a nano devkit, with 0% code from us. As it seems Nvidia support does not have access to the hardware, we are going to dispatch a full repro, hardware and software stack, to one of their facilities so they can witness and hopefully investigate this matter.

Usually when it crashes, it ends up with an unhandled exception in EL3. We are working on a workaround, far from ideal but at least enough to mitigate the situation.

sebastien.schertenleib · February 4, 2026, 2:35pm

Hi,

We have now witnessed this problem several times on two distinct nano devkits, each with an Orin NX and an SSD, so it seems confined to some of the modules. Can you not make a request to retrieve some PCN modules?

I don’t understand how you can provide valuable support if you do not even have the required hardware.

Also, can you provide a way to interpret the revision of the module, so we can see whether a pattern can be extracted?

Thanks

smileandcry2023 · February 5, 2026, 1:24am

Hi

Is there still an engineer tracking this issue?

We have spent a month testing.

DaneLLL · February 5, 2026, 2:29am

Hi,
The issue is specific to certain modules. Please collect the modules and apply for RMA:
Jetson FAQ | NVIDIA Developer

Or contact your local distributor to swap the modules.

sebastien.schertenleib · February 5, 2026, 7:14am

So you confirm that some modules are faulty and need to be exchanged? Is it possible to have the SKU IDs that might be affected? Reproducing this issue at production level is time-consuming and also an expensive venture.

Thanks

DaneLLL · February 5, 2026, 7:19am

Hi,
We are checking it internally and have not concluded whether it is a HW or SW issue. The repro rate is low, so we need some time to investigate. It looks to be an issue occurring on specific modules, so if you can collect the modules, please apply for the RMA process.

sebastien.schertenleib · February 5, 2026, 7:58am

Hi,

We have more than a hundred modules that are affected, and possibly more that went undetected. So we really need to understand whether this problem is confined to certain revisions, so that we can quarantine them by reading the EEPROM and save valuable time. We are in the process of collecting SKUs for a batch of 20 faulty Orin NX modules.

From our experiments, this seems to be connected to an external abort. Can you share what your internal investigation has found so far?

Thanks

sebastien.schertenleib · February 5, 2026, 11:23am

Hi,

Here is a partial list of defective Orin NX modules:

**Serial Numbers – Orin NX**

 1. SN **1423225044968** | 3C6D66F30DF6 | 699-13767-0000-303 | 161-0546-10X

 2. SN **1423225045191** | 3C6D66F30CB6 | 699-13767-0000-303 | 161-0546-10X

 3. SN **1423225024519** | 3C6D66F326D1 | 699-13767-0000-303 | 161-0546-10X

 4. SN **1423225024307** | 3C6D66F3258B | 699-13767-0000-303 | 161-0546-10X

 5. SN **1422425080696** | 3C6D66B27068 | 699-13767-0000-303 | 161-0546-10X

 6. SN **1423225023868** | 3C6D66F3257D | 699-13767-0000-303 | 161-0546-10X

 7. SN **1423225023950** | 3C6D66F3256F | 699-13767-0000-303 | 161-0546-10X

 8. SN **1423225024651** | 3C6D66F326C7 | 699-13767-0000-303 | 161-0546-10X

 9. SN **1423225044482** | 3C6D66F30C6A | 699-13767-0000-303 | 161-0546-10X

10. SN **1423225024509** | 3C6D66F326A7 | 699-13767-0000-303 | 161-0546-10X

11. SN **1423225023467** | 3C6D66F32534 | 699-13767-0000-303 | 161-0546-10X

12. SN **1422525001084** | 3C6D66B275D9 | 699-13767-0000-303 | 161-0546-10X

13. SN **1423125062779** | 3C6D66F32551 | 699-13767-0000-303 | 161-0546-10X

14. SN **1422425064278** | 3C6D66B270BA | 699-13767-0000-303 | 161-0546-10X

15. SN **1423225045161** | 3C6D66F30CA4 | 699-13767-0000-303 | 161-0546-10X

16. SN **1423225024049** | 3C6D66F32596 | 699-13767-0000-303 | 161-0546-10X

17. SN **1423125062784** | 3C6D66F3255A | 699-13767-0000-303 | 161-0546-10X

18. SN **1423225024075** | 3C6D66F3257E | 699-13767-0000-303 | 161-0546-10X

19. SN **1423225044853** | 3C6D66F30C88 | 699-13767-0000-303 | 161-0546-10X

20. SN **1423225023951** | 3C6D66F32553 | 699-13767-0000-303 | 161-0546-10X

21. SN **1422425062564** | 3C6D66B2706B | 699-13767-0000-303 | 161-0546-10X

22. SN **1422525000951** | 3C6D66B27116 | 699-13767-0000-303 | 161-0546-10X

23. SN **1422425080695** | 3C6D66B26ECE | 699-13767-0000-303 | 161-0546-10X

24. SN **1423225024094** | 3C6D66F3257C | 699-13767-0000-303 | 161-0546-10X

25. SN **1422425064681** | 3C6D66B27759 | 699-13767-0000-303 | 161-0546-10X

26. SN **1421725089318** | 3C6D6661438F | 699-13767-0000-301 | 161-0546-10X

27. SN **1422525002558** | 3C6D66B28FE6 | 699-13767-0000-303 | 161-0546-10X

28. SN **1423525028588** | 4CBB4718A877 | 699-13767-0000-303 | 161-0546-10X

29. SN **1423525027776** | 4CBB4718A432 | 699-13767-0000-303 | 161-0546-10X

30. SN **1423525028587** | 4CBB4718A75F | 699-13767-0000-303 | 161-0546-10X

31. SN **1422525002418** | 3C6D66B28E72 | 699-13767-0000-303 | 161-0546-10X

32. SN **1422525002417** | 3C6D66B28E81 | 699-13767-0000-303 | 161-0546-10X

33. SN **1421725089604** | 3C6D666145AD | 699-13767-0000-301 | 161-0546-10X

34. SN **1422425065960** | 3C6D66B29618 | 699-13767-0000-303 | 161-0546-10X

35. SN **1422425080102** | 3C6D66B28FDD | 699-13767-0000-303 | 161-0546-10X

36. SN **1422525002495** | 3C6D66B28FD8 | 699-13767-0000-303 | 161-0546-10X

37. SN **1423525030807** | 4CBB47189E06 | 699-13767-0000-303 | 161-0546-10X

38. SN **1422525003950** | 3C6D66B27545 | 699-13767-0000-303 | 161-0546-10X

39. SN **1423525029458** | 4CBB4718A49E | 699-13767-0000-303 | 161-0546-10X

40. SN **1422525004176** | 3C6D66B2752A | 699-13767-0000-303 | 161-0546-10X

41. SN **1423525029101** | 4CBB4718A4D5 | 699-13767-0000-303 | 161-0546-10X

42. SN **1423525028836** | 4CBB4718A4D8 | 699-13767-0000-303 | 161-0546-10X

43. SN **1422525004173** | 3C6D66B27543 | 699-13767-0000-303 | 161-0546-10X

44. SN **1423525027795** | 4CBB4718A4AD | 699-13767-0000-303 | 161-0546-10X

45. SN **1422525002892** | 3C6D66B28DC7 | 699-13767-0000-303 | 161-0546-10X

46. SN **1423525027802** | 4CBB4718A42B | 699-13767-0000-303 | 161-0546-10X

47. SN **1423525029241** | 4CBB4718A4F4 | 699-13767-0000-303 | 161-0546-10X

48. SN **1423525029942** | 4CBB47189C49 | 699-13767-0000-303 | 161-0546-10X
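
For anyone who wants to look for a pattern in a list like the one above, here is a minimal Python sketch that tallies the rows by field. It assumes the row layout shown above (serial number, a 12-hex-digit identifier, the 699-level part number, and a fourth code). The names `parse_rows` and `summarize` are hypothetical, and grouping by the first six hex digits of the identifier or the first five digits of the serial number is only a guess at what might correlate with a production batch, not something confirmed by Nvidia.

```python
import re
from collections import Counter

# Three rows copied from the list above; paste the full list to analyse all of it.
RAW = """
 1. SN **1423225044968** | 3C6D66F30DF6 | 699-13767-0000-303 | 161-0546-10X
26. SN **1421725089318** | 3C6D6661438F | 699-13767-0000-301 | 161-0546-10X
28. SN **1423525028588** | 4CBB4718A877 | 699-13767-0000-303 | 161-0546-10X
"""

# One row: "SN **<digits>** | <12 hex digits> | <part number> | <other code>"
ROW = re.compile(
    r"SN\s+\**(?P<sn>\d+)\**\s*\|\s*(?P<ident>[0-9A-F]{12})\s*\|"
    r"\s*(?P<part>[\d-]+)\s*\|\s*(?P<other>\S+)"
)


def parse_rows(text):
    """Return one dict per row found in the pasted text."""
    return [m.groupdict() for m in ROW.finditer(text)]


def summarize(rows):
    """Count rows per part number and per (assumed) batch-like prefix."""
    return {
        "part_number": Counter(r["part"] for r in rows),
        # Assumption: the leading 6 hex digits behave like a vendor/batch prefix.
        "ident_prefix": Counter(r["ident"][:6] for r in rows),
        # Assumption: the leading 5 serial digits may track a production period.
        "sn_prefix": Counter(r["sn"][:5] for r in rows),
    }


rows = parse_rows(RAW)
for field, counts in summarize(rows).items():
    print(field, dict(counts))
```

A tally like this only shows how the reported faulty modules are distributed. Without the same counts for known-good modules it cannot show that any revision or prefix is over-represented.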

DaneLLL · February 6, 2026, 12:52am

Hi,
We have a customer reporting PCIe C4 failing to detect the NVMe SSD during boot, which triggers system hangs. It occurs randomly on certain modules and is not tied to specific serial numbers. We would not expect the issue to be present on so many modules. So if you put the modules on a developer kit and flash r35.6.2, they cannot boot up successfully? Is there a failure rate, or does it fail to boot every time?

sebastien.schertenleib · February 6, 2026, 7:18am

Hi,

We have also seen it connected to the PCIe/NVMe driver setup, but also that sometimes the memory bus triggers ECC errors before the MMU can handle them, and then a hardware interrupt is triggered, leading to an unhandled exception in EL3 within ATF.

We are trying to intercept those situations upfront: we write the boot slot into the scratch register so that the system reboots into the same slot, then force a reboot. Even if we are (hopefully) able to detect these cases early, the root cause remains a mystery. Obviously, we do not have access to all the internals, so you may have a better chance of tackling it.

Over the next few days/weeks we are going to stress test it to see whether this WAR improves things.

It boots most of the time, but depending on the module a failure can take anywhere from 60 to 6000 reboots to appear, with an average of 100-150. We have tested this using our own custom board and BSP, but also using Xavier and nano devkits with the stock sample BSP as provided by Nvidia, so there is probably some timing issue near the limit, or other factors, that triggers this issue. On our side the module is mostly a black box.
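
To give a feel for what an average of 100-150 reboots per failure implies for stress testing, here is a rough Python sketch. It assumes every boot fails independently with the same probability (a simple geometric model), which may well not hold if the failure depends on temperature or timing drift. The function names are hypothetical.

```python
import math


def per_boot_failure_probability(mean_reboots_to_failure):
    """Geometric model: the mean number of boots up to the first failure is 1/p."""
    return 1.0 / mean_reboots_to_failure


def clean_reboots_needed(mean_reboots_to_failure, confidence=0.95):
    """Smallest n such that a module failing at this rate would have shown
    at least one failure within n boots with the given confidence.

    P(no failure in n boots) = (1 - p)**n, so solve (1 - p)**n <= 1 - confidence.
    """
    p = per_boot_failure_probability(mean_reboots_to_failure)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


for mean in (100, 150):
    p = per_boot_failure_probability(mean)
    n = clean_reboots_needed(mean)
    print(f"mean={mean}: p per boot = {p:.4f}, clean reboots for 95% = {n}")
```

Under this assumption a module needs roughly three times the mean number of failure-free reboots before a clean run says much, and modules at the 6000-reboot end of the range would need far longer runs.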

smileandcry2023 · February 10, 2026, 6:37am

log.txt (9.4 KB)

Hi,

We tested another machine. The OS failed to boot, and it crashed twice in UEFI. Please help us check. Thank you.

DaneLLL · February 11, 2026, 6:17am

Hi @smileandcry2023
The assertion looks different from the PCIe detection failure:

ASSERT [PrePi]  edk2-docker/nvidia-uefi/edk2-nvidia/Silicon/NVIDIA/PrePi/PrePi.c (507)

Do you observe it on a custom board or a developer kit? Do you use r35.6.2 or r36.5? Does it occur on every boot, or is there a failure rate?

smileandcry2023 · February 12, 2026, 1:30am

No, just one of our custom boards reported it.

sebastien.schertenleib · February 17, 2026, 11:41am

Hi,

After contacting our Nvidia representative, we are dispatching a full repro, that is, a nano devkit with an SSD and a faulty Orin NX, using the vanilla sample BSP 35.6.2 as provided by Nvidia.

The full repro also automates the reboot at the right time and logs events to highlight when the issue occurs. We also wrote a document explaining how to interpret the results, as well as how to reflash the full repro in case there is a need to validate on other modules.

Hopefully it will make it possible to find the root causes and countermeasures.

smileandcry2023 · February 25, 2026, 5:46am

Hi

I have tested the patch file “overlay_mb1bct_35.x.tbz2”. Its description is:

“This overlay fixes a boot issue caused by the QSPI read timing not having sufficient margin to cover process, voltage, and temperature variations.”

The failure rate decreased a lot.

The content of the patch is:

/ {
    device {
        qspiflash@0 {
            trimmer2-val = <0x04>;
        };
    };
};

What does this mean?

Can we change the value?

DaneLLL · February 25, 2026, 10:04am

Hi @smileandcry2023
The overlay fixes the issue:
Jetson Orin Nano boot failure with temperature dependency
Core board fails to boot

You may try 0x2 to see if stability improves further.

smileandcry2023 · February 27, 2026, 11:20am

I changed the value to 0x02, but the issue “Orin Nano UEFI boot screen shows L4TLauncher: Attempting Direct Boot and cannot be turned off” happened more frequently.

DaneLLL · March 4, 2026, 5:09am

Hi @smileandcry2023
Please share how to replicate the issue on a developer kit:
Orin Nano UEFI boot screen shows L4TLauncher: Attempting Direct Boot and cannot be turned off - #43 by DaneLLL

And let’s continue the discussion in that topic thread.

sebastien.schertenleib · March 4, 2026, 6:59am

Hi Dane,

What about the full repro we sent to Nvidia Taiwan, which includes the nano devkit, a problematic Orin NX module, and detailed instructions to reproduce it?

What about the investigation into PCIe C4 failing to detect the NVMe SSD during boot, triggering system hangs, as reported by another customer?

Can any results be shared?

Thanks
