Upgraded Orin NX from 35.3.1 to 35.5 OTA, now won’t boot without monitor…

…this is with Waveshare’s Orin Nano Dev Kit board.

Oddly, it won’t even boot with a DP EDID emulator.

The same device booted with the same DP emulator under 35.3.1.

If I plug a regular monitor in, it boots fine. I can then swap in the EDID emulator and it continues to work correctly until the next reboot, when it gets stuck in a boot loop again.

Perhaps something has changed at boot (boot screen resolution maybe?) that isn’t compatible with the DP emulator?

Any clues anyone, without going out to get a serial cable and going down that road? Is there a log I can check after a successful boot with the monitor that would help?
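
On the question of logs available after a successful boot: a minimal sketch of two places worth a look, assuming journald keeps logs across reboots and that pstore is actually set up on this image (neither is confirmed in this thread). The helper name `list_pstore_records` is made up for illustration. A crash early in boot may leave nothing in either place, in which case the serial console is still the only reliable source.

```shell
#!/bin/sh
# Sketch: look for traces of earlier failed boots after a successful boot.
# Assumption: journald has persistent storage; if it does not, journalctl
# only knows about the current boot.

# List pstore crash records, if any. The directory is a parameter so the
# function can be pointed at any path; /sys/fs/pstore is the usual mount.
list_pstore_records() {
    dir="${1:-/sys/fs/pstore}"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        ls -1 "$dir"
    else
        echo "no pstore records in $dir"
    fi
}

# Kernel messages from the previous boot (run by hand, may need sudo):
#   journalctl --list-boots
#   journalctl -k -b -1 --no-pager | tail -n 100
```

Calling `list_pstore_records` with no argument checks `/sys/fs/pstore`; an empty result only means nothing was captured, not that nothing went wrong.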

By the way, when I say won’t boot, I mean it gets stuck in an endless cycle of boot attempts and restarts. At no time does it respond to a ping on the network, until the proper monitor is plugged in and it finally boots correctly. It behaves the same way (endless restarts) with the DP emulator plugged in.

Hi oviano,

It seems you are using the custom carrier board from Waveshare rather than the devkit from us.

Please get the custom R35.5.0 BSP package from your vendor and use it to perform an image-based OTA update from R35.3.1 to R35.5.0, or simply flash the board with their custom BSP package.

For the boot failure, we would need to check the serial console log for details.

There is no custom BSP; it just flashes with the NVIDIA files.

The OTA apt based update worked except for the monitor issue, which is odd. Since then I reflashed and the issue went away, but I have another identical one which I need to update remotely so I’m unsure how to proceed.

You are using a custom carrier board, so you have to use the custom BSP package rather than the official NVIDIA BSP package.

For OTA, please perform an image-based OTA update instead of a Debian-based OTA update.

Ok thank you! Problem is there is no custom BSP package, as mentioned.

So basically the Debian-based OTA method is only applicable to official NVIDIA dev kits, is that the case?

I’ll try and figure out how to do an image-based OTA - does that allow for retaining settings or is it more like a re-flash?

It’s so close - OTA Debian-based for this board actually works perfectly, except for the monitor issue.

Please ask for the custom BSP package from your vendor.
They should release the BSP package with the customized changes for their board.

Correct, there might be differences between a custom carrier board and the devkit board from NVIDIA.

A custom carrier board might have a similar design to the devkit, but there might also be differences, for example in display, which is why you could get the monitor issue after the update.

Right, so I think I need to make sure I use an official Nvidia dev board - trouble is, they’re hard to find in the UK and always seem to come with a Nano SOM when I just want to buy the board by itself or with an Orin NX.

That’s why I ended up buying the Waveshare.

I’m not sure if they have an R35.5.0 release for their custom carrier board.
Alternatively, you can check their website for a custom BSP package release for R35.4.1 and use it to verify image-based OTA from R35.3.1.

For our official devkit, Orin NX and Orin Nano could work with the Orin Nano devkit board(p3768).

Many thanks. Yes, I know about the Orin Nano devkit board (p3768), but as I say it only seems to be available with the Nano SOM (which makes it a lot more expensive than the Waveshare one) and also always on backorder here in the UK.

I’ve also reached out to Waveshare to find out if they have any solution, as it seems very close to working. Maybe there is something they can do. I’ll post back if I get any news.


Here’s my serial console log.

Seems like it’s an issue that occurs with other combinations of carrier/module…reflashing solves it, but might there be a way of patching in some way an existing installation to stop the crash? It’s just I have an identical device at my brother’s house in another country which I’d like to update to 35.5 remotely (i.e. without re-flashing or figuring out image-based OTA).

Turns out that the system does in fact boot and remain stable if I remove the EDID emulator and only plug it in after the device has booted. Then it all works as normal.

serial_console_log_edid.txt (92.6 KB)

These were my original flash commands for 35.3.1:

tar xf Jetson_Linux_R35.3.1_aarch64.tbz2
sudo tar xpf Tegra_Linux_Sample-Root-Filesystem_R35.3.1_aarch64.tbz2 -C Linux_for_Tegra/rootfs/
cd Linux_for_Tegra/
sudo ./apply_binaries.sh
sudo ./tools/l4t_flash_prerequisites.sh
sudo tools/l4t_create_default_user.sh -u <username> -p <password> -a -n <hostname> --accept-license
sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" --showlogs --network usb0 jetson-orin-nano-devkit internal

…then to update via apt I did…

echo -e "Fix\n1\n\n" | sudo parted ---pretend-input-tty /dev/nvme0n1 print >/dev/null 2>&1 command
sudo nano /etc/apt/sources.list.d/nvidia-l4t-apt-source.list   # change 35.3 -> 35.5
sudo apt update
sudo apt dist-upgrade
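
For a device that has to be updated remotely, the manual edit of the apt source file could be scripted. This is only a sketch: `bump_l4t_release` is a hypothetical helper name, and it assumes the release string appears in the file exactly as the text being replaced (35.3 here). It keeps a `.bak` copy so the change can be reverted.

```shell
#!/bin/sh
# bump_l4t_release FILE OLD NEW
# Replace every occurrence of the OLD release string with NEW in FILE,
# leaving the original content in FILE.bak.
bump_l4t_release() {
    file="$1"; old="$2"; new="$3"
    # Escape dots so that "35.3" is matched literally, not as a regex.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    sed -i.bak "s/${old_re}/${new}/g" "$file"
}

# Assumed usage on the device, as root:
#   bump_l4t_release /etc/apt/sources.list.d/nvidia-l4t-apt-source.list 35.3 35.5
#   apt update && apt dist-upgrade
```

Note that this only automates the edit; it does nothing about the boot loop described in this thread.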

Do you mean that you flash our official R35.5.0 BSP package on the custom carrier board and it works as expected?

Please always use image-based OTA to upgrade the custom carrier board.

Yes it works perfectly when I re-flash using the official NVIDIA 35.5.0 package.

It also works perfectly after the OTA apt-based upgrade except for when a monitor is attached. It will boot without a monitor attached and then works when you plug one in after booting. It just gets stuck in a boot loop when trying to boot with the monitor/EDID emulator already plugged in and repeats the contents of the above serial console log over and over again.

It does not have a custom BSP, Waveshare’s instructions for flashing it just use the official NVIDIA files.

If I had an “official” dev kit, I’d test to see if the same issue occurs with the upgrade from 35.3.1 → 35.5 but I presume you guys would have already verified that.

Yes, we would suggest checking whether the same issue can be reproduced on the official NVIDIA devkit.

Well that’s unlikely to happen as I don’t have one and I’m not minded to spend £500 to find out.

For your use case, could you just use the flash command to update your board to R35.5.0, if it all works as expected that way?

Thanks - I can and have for the devices I have physical access to, but two are placed in unattended locations for long periods of time so I was hoping I could get the apt update approach to work for those in the future. I’m aware of the image based method but it seems very complicated when I just need to update these two devices from time to time remotely.

I don’t suppose you could verify with the official dev kit whether the update works from 35.3.1 to 35.5?

It seems odd that a full flash of the official 35.5 works and only the upgrade method has this one small issue.

Could you share the full serial console log from a failed boot on R35.5.0?

That’s what is attached in my post above (Mar 24) ?

It shows the full log and the kernel panic at the end.

If there is something I’ve not done with the log, let me know.

I think the reason why it boots with a real monitor attached but not with the EDID emulator is likely because the real monitor doesn’t handshake until later on or something. So it is booting as if there is no monitor attached (which succeeds) and then the handshake occurs after it has booted so it works. Probably the monitor goes in standby when there is no signal and takes longer to wake up. With the EDID emulator I suspect the device sees the monitor connected right from the start, and that is why it fails.

[   17.746363] CPU:0, Error: dce-fabric@0xde00000, irq=28
[   17.751671] **************************************
[   17.756598] CPU:0, Error:dce-fabric, Errmon:2
[   17.761084]    Multiple type of errors reported
[   17.765745]    Error Code            : FIREWALL_ERR
[   17.769865]    Error Code            : TIMEOUT_ERR
[   17.773895]    Overflow              : Multiple FIREWALL_ERR
[   17.778649] 
[   17.780176]    Error Code            : TIMEOUT_ERR
[   17.784211]    MASTER_ID             : DCE
[   17.787442]    Address               : 0x1380c01c
[   17.791120]    Cache                 : 0x1 -- Bufferable 
[   17.795423]    Protection            : 0x3 -- Privileged, Non-Secure, Data Access
[   17.802217]    Access_Type           : Read
[   17.805708]    Access_ID             : 0x0
[   17.805710]    Fabric                : dce-fabric
[   17.812514]    Slave_Id              : 0x37
[   17.815739]    Burst_length          : 0x0
[   17.819236]    Burst_type            : 0x1
[   17.822556]    Beat_size             : 0x2
[   17.825776]    VQC                   : 0x0
[   17.828557]    GRPSEC                : 0x3f
[   17.831608]    FALCONSEC             : 0x0
[   17.834839] Unable to handle kernel paging request at virtual address 0000000000081000
[   17.842982] Mem abort info:
[   17.845851]   ESR = 0x96000004
[   17.848986]   EC = 0x25: DABT (current EL), IL = 32 bits
[   17.854447]   SET = 0, FnV = 0
[   17.857582]   EA = 0, S1PTW = 0
[   17.860803] Data abort info:
[   17.863756]   ISV = 0, ISS = 0x00000004
[   17.867689]   CM = 0, WnR = 0
[   17.870741] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010c561000
[   17.877370] [0000000000081000] pgd=0000000000000000, p4d=0000000000000000
[   17.884349] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   17.890087] Modules linked in: nvidia_modeset(O) lzo_rle lzo_compress zram ramoops reed_solomon loop bnep snd_soc_tegra186_asrc snd_soc_tegra210_ope snd_soc_tegra186_arad snd_soc_tegra186_dspk snd_soc]
[   17.977177] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      5.10.192-tegra #1
[   17.985382] Hardware name: NVIDIA Orin NX Developer Kit (DT)
[   17.991186] pstate: 40400089 (nZcv daIf +PAN -UAO -TCO BTYPE=--)
[   17.997365] pc : __pi_strlen+0x10/0x84
[   18.001209] lr : strstr+0x30/0x90
[   18.004605] sp : ffff800010003cc0
[   18.008006] x29: ffff800010003cc0 x28: ffff5b6b00f0c980 
[   18.013461] x27: ffffd8194a7c2ab0 x26: 0000000000081000 
[   18.018910] x25: 0000000000000370 x24: ffff80005b6ddf70 
[   18.024357] x23: ffffd8194abddbd8 x22: 0000000000000001 
[   18.029810] x21: ffffd8194abddd08 x20: 0000000000000007 
[   18.035263] x19: 0000000000081000 x18: 0000000000000010 
[   18.040709] x17: 0000000000000000 x16: ffffd819494b50f0 
[   18.046152] x15: ffffd8194b1c2bf0 x14: ffffffffffffffff 
[   18.051605] x13: ffff800090003917 x12: 0000000000000038 
[   18.057063] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f 
[   18.062508] x9 : ff414f4031485740 x8 : ffffffffffffffff 
[   18.067958] x7 : 0000000000000000 x6 : 0000000000000080 
[   18.073414] x5 : 0000000000000000 x4 : ffffffffffffffff 
[   18.078859] x3 : 0042504132495841 x2 : ffffffffffffffff 
[   18.084306] x1 : 0000000000081000 x0 : 0000000000081000 
[   18.089760] Call trace:
[   18.092265]  __pi_strlen+0x10/0x84
[   18.095762]  print_err_notifier+0x610/0x734
[   18.100059]  tegra234_cbb_isr+0xd0/0x170
[   18.104084]  __handle_irq_event_percpu+0x68/0x2b0
[   18.108908]  handle_irq_event_percpu+0x40/0xa0
[   18.113469]  handle_irq_event+0x50/0xa0
[   18.117399]  handle_fasteoi_irq+0xc0/0x170
[   18.121605]  generic_handle_irq+0x40/0x60
[   18.125721]  __handle_domain_irq+0x70/0xd0
[   18.129928]  gic_handle_irq+0x68/0x134
[   18.133771]  el1_irq+0xd0/0x180
[   18.136988]  cpuidle_enter_state+0xb8/0x410
[   18.141271]  cpuidle_enter+0x40/0x60
[   18.144943]  call_cpuidle+0x44/0x80
[   18.148520]  do_idle+0x208/0x270
[   18.151832]  cpu_startup_entry+0x30/0x60
[   18.155849]  rest_init+0xdc/0xe8
[   18.159164]  arch_call_rest_init+0x18/0x20
[   18.163366]  start_kernel+0x500/0x538
[   18.167119] Code: b200c3eb 927cec01 f2400c07 54000261 (a8c10c22) 
[   18.173386] ---[ end trace 68b64abc3ce25fc7 ]---
[   18.184309] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[   18.191365] SMP: stopping secondary CPUs
[   18.195390] Kernel Offset: 0x581939490000 from 0xffff800010000000
[   18.201632] PHYS_OFFSET: 0xffffa49600000000
[   18.205922] CPU features: 0x08040006,4a80aa38
[   18.210394] Memory Limit: none
[   18.219641] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
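
Reading the trace above, the oops happens inside the error reporting path itself (`tegra234_cbb_isr` → `print_err_notifier` → `strstr`/`strlen` on address 0x81000) while it is reporting the dce-fabric error. For a long capture that repeats this on every boot attempt, a rough sketch like the following can pull out the key lines and count the panics. The function names are made up for illustration, and the patterns are just the strings seen in this log, not an exhaustive list.

```shell
#!/bin/sh
# summarize_panic LOGFILE
# Print (with line numbers) the lines that matter most for triage.
summarize_panic() {
    grep -nE 'Error Code|MASTER_ID|Unable to handle|pc : |lr : |Kernel panic - not syncing' "$1"
}

# count_panics LOGFILE
# Count panic events; the "---[ end Kernel panic" trailer is not matched,
# so each panic is counted once.
count_panics() {
    grep -c '\] Kernel panic - not syncing' "$1"
}

# Example, using the attachment name from this thread:
#   summarize_panic serial_console_log_edid.txt
#   count_panics serial_console_log_edid.txt
```

A count greater than one in a single capture would match the boot loop described earlier.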

I can reproduce the same kernel panic as yours after performing a Debian-based OTA update on the devkit.
Please let me check internally; I will update you once I have any result.

Awesome, thank you Kevin, I appreciate you making the effort to verify that.