Xavier NX reboot loop

pepijn.vanheiningen · March 15, 2024, 12:21pm

Hi,
We are having trouble with Xavier NX devices stuck in a reboot loop. So far all devices have recovered after multiple reboots, but that can take up to 30 minutes. From UART we discovered that the reboot is caused by a kernel crash. The logs of this crash are included below:

[    3.854495] systemd-journald[2162]: File /var/log/journal/5a8660aba01c4ab3ac72ad16008d18ed/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
[    3.864407] using random self ethernet address
[    3.864549] using random host ethernet address
[    3.959066] wm8960: no symbol version for module_layout
[    3.959230] wm8960: loading out-of-tree module taints kernel.
[    4.254661] mwifiex_pcie: try set_consistent_dma_mask(32)
[    4.255022] mwifiex_pcie: PCI memory map Virt0: ffffff8012500000 PCI memory map Virt2: ffffff8013e00000
[    4.543625] random: crng init done
[    4.543759] random: 7 urandom warning(s) missed due to ratelimiting
[    8.507257] podgov: can't create debugfs directory
[    8.507425] Kernel panic - not syncing: nvhost_scale_emc_debug_init
[    8.507565] CPU: 5 PID: 4298 Comm: gst-plugin-scan Tainted: G           O    4.9.140-tegra #1
[    8.507732] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[    8.507859] Call trace:
[    8.507948] [<ffffff800808bdb8>] dump_backtrace+0x0/0x198
[    8.508062] [<ffffff800808c37c>] show_stack+0x24/0x30
[    8.508171] [<ffffff800845c7a0>] dump_stack+0x98/0xc0
[    8.508280] [<ffffff80081c1438>] panic+0x11c/0x298
[    8.508389] [<ffffff8008cbdc40>] nvhost_scale_emc_debug_init.isra.12+0x128/0x1a0
[    8.508537] [<ffffff8008cbdfec>] nvhost_pod_event_handler+0x334/0x400
[    8.508663] [<ffffff8008cbaf14>] devfreq_add_device+0x284/0x408
[    8.508781] [<ffffff8008cbb0fc>] devm_devfreq_add_device+0x64/0xc0
[    8.509324] [<ffffff8000fdbec8>] gk20a_scale_init+0xf0/0x190 [nvgpu]
[    8.509792] [<ffffff8000fd50e8>] gk20a_pm_finalize_poweron+0x370/0x400 [nvgpu]
[    8.510296] [<ffffff8000fd5330>] gk20a_busy+0x1b8/0x4f0 [nvgpu]
[    8.510974] [<ffffff800878c91c>] pm_generic_runtime_resume+0x3c/0x58
[    8.517707] [<ffffff800878ec64>] __rpm_callback+0x74/0xa0
[    8.523046] [<ffffff800878ecc4>] rpm_callback+0x34/0x98
[    8.528136] [<ffffff8008790160>] rpm_resume+0x470/0x710
[    8.533204] [<ffffff800879044c>] __pm_runtime_resume+0x4c/0x70
[    8.539058] [<ffffff8000fd524c>] gk20a_busy+0xd4/0x4f0 [nvgpu]
[    8.545095] [<ffffff8000fb6f74>] gk20a_ctrl_dev_open+0x8c/0x168 [nvgpu]
[    8.551508] [<ffffff8008262314>] chrdev_open+0x94/0x198
[    8.557006] [<ffffff8008258de0>] do_dentry_open+0x1b8/0x318
[    8.562692] [<ffffff800825a388>] vfs_open+0x58/0x88
[    8.567376] IPv6: ADDRCONF(NETDEV_UP): mlan0: link is not ready
[    8.567758] IPv6: ADDRCONF(NETDEV_UP): mlan0: link is not ready
[    8.571989] IPv6: ADDRCONF(NETDEV_UP): mlan1: link is not ready
[    8.572183] IPv6: ADDRCONF(NETDEV_UP): mlan1: link is not ready
[    8.591219] [<ffffff800826d644>] do_last+0x454/0xe60
[    8.596205] [<ffffff800826e0e0>] path_openat+0x90/0x378
[    8.601454] [<ffffff800826f650>] do_filp_open+0x70/0xe8
[    8.606528] [<ffffff800825a84c>] do_sys_open+0x174/0x258
[    8.611866] [<ffffff800825a9b4>] SyS_openat+0x3c/0x50
[    8.617116] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
[    8.622201] SMP: stopping secondary CPUs
[    8.626391] Kernel Offset: disabled
[    8.629621] Memory Limit: none
[    8.633030] trusty-log panic notifier - trusty version Built: 12:18:19 Oct 16 2020
[    8.648388] Rebooting in 5 seconds..

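Our initial analysis points to a kernel panic in nvhost_scale_emc_debug_init, reached from gk20a_scale_init while the GPU driver (nvgpu) is setting up EMC clock scaling, right after podgov fails to create its debugfs directory. We are running L4T 32.4.4. We would like some help finding the root cause and a solution to this problem.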
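
When comparing UART captures from several modules, it can help to reduce each capture to the panic reason and the symbols in the call trace. The following is only a minimal sketch, assuming the log format shown above. `summarize_panic` is a hypothetical helper name, the regular expressions are inferred from this one log, and the sample text is a shortened excerpt of it.

```python
import re

# Matches a call-trace frame such as:
# [    8.509324] [<ffffff8000fdbec8>] gk20a_scale_init+0xf0/0x190 [nvgpu]
# Group 1 is the symbol, group 2 the optional module name in brackets.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)
PANIC_RE = re.compile(r"Kernel panic - not syncing: (.*)")


def summarize_panic(log_text):
    """Return (panic_reason, frames); frames is a list of (symbol, module)."""
    reason = None
    frames = []
    for line in log_text.splitlines():
        match = PANIC_RE.search(line)
        if match:
            reason = match.group(1).strip()
            continue
        match = FRAME_RE.search(line)
        # Only collect frames that follow the panic line; unrelated lines
        # interleaved in the trace (e.g. IPv6 messages) do not match.
        if match and reason is not None:
            frames.append((match.group(1), match.group(2) or "kernel"))
    return reason, frames


SAMPLE = """\
[    8.507257] podgov: can't create debugfs directory
[    8.507425] Kernel panic - not syncing: nvhost_scale_emc_debug_init
[    8.508389] [<ffffff8008cbdc40>] nvhost_scale_emc_debug_init.isra.12+0x128/0x1a0
[    8.509324] [<ffffff8000fdbec8>] gk20a_scale_init+0xf0/0x190 [nvgpu]
[    8.567376] IPv6: ADDRCONF(NETDEV_UP): mlan0: link is not ready
"""

if __name__ == "__main__":
    reason, frames = summarize_panic(SAMPLE)
    print("panic:", reason)
    for symbol, module in frames:
        print(f"  {symbol} ({module})")
```

Running the same function over the capture from each affected module would show whether every unit panics at the same point or whether the traces differ.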

KevinFFF · March 18, 2024, 1:36am

Hi pepijn.vanheiningen,

Are you using the devkit or a custom board for Xavier NX?

Does your board hit the kernel panic before any modification?

Have you also tried with the latest R32.7.4 release?

CPeppenster · March 20, 2024, 9:58am

We are using a custom board. So far we’ve been able to see that the issue stays with the Tegra module. If we move the Tegra module from an affected system to a healthy system, the problem moves to the healthy system. In addition, if we move the Tegra module from a healthy system to the affected system, the problem disappears.

Today we will test what happens if we use an R35.3.1-based image. Using R32.7.4 would be quite some work for us, so hopefully the R35.3.1 results will give you enough information.

We currently have a fallout rate of around 40% with this problem, so it is a critical priority for us.

KevinFFF · March 21, 2024, 2:12am

Hi CPeppenster,

Are you working with pepijn.vanheiningen?
Please also help to confirm if you can reproduce the issue on the devkit.
(i.e. move the Xavier NX module to the devkit board to check)

For R35, we would suggest verifying with the latest R35.5.0.

system · April 10, 2024, 5:58am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
