Kernel-level issue on Jetson Orin NX J4012

Hello everyone,

I am facing a severe hardware-level issue with my Jetson Orin NX. The module is mounted on the reComputer J4012 carrier board, running JetPack 5.1.2 [L4T 35.4.1] and Ubuntu 20.04.

An exception started appearing about three weeks ago, and I haven’t been able to resolve it:

[vie sep 20 15:36:19 2024] ------------[ cut here ]------------
[vie sep 20 15:36:19 2024] WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:937 tegra186_gpio_irq+0x1ac/0x1f0
[vie sep 20 15:36:19 2024] Modules linked in: nvidia_modeset(O) fuse lzo_rle lzo_compress zram ramoops reed_solomon snd_soc_tegra186_asrc snd_soc_tegra210_iqc snd_soc_tegra210_ope snd_soc_tegra186_dspk snd_soc_tegra210_mvc snd_soc_tegra186_arad snd_soc_tegra210_afc snd_soc_tegra210_dmic snd_soc_tegra210_mixer snd_soc_tegra210_adx snd_soc_tegra210_amx snd_soc_tegra210_i2s iwlmvm snd_soc_tegra210_admaif mac80211 snd_soc_tegra_pcm snd_soc_tegra210_sfc snd_soc_tegra210_adsp aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce sha256_arm64 sha1_ce snd_soc_spdif_tx snd_soc_tegra_machine_driver snd_soc_tegra_utils snd_soc_simple_card_utils snd_soc_tegra210_ahub nvadsp tegra210_adma binfmt_misc snd_hda_codec_hdmi snd_hda_tegra tegra_bpmp_thermal snd_hda_codec userspace_alert snd_hda_core iwlwifi spi_tegra114 nv_imx477 cfg80211 r8168 nvidia(O) loop ina3221 pwm_fan nvgpu nvmap ip_tables x_tables [last unloaded: mtd]
[vie sep 20 15:36:19 2024] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  O      5.10.120-tegra #1
[vie sep 20 15:36:19 2024] Hardware name: Unknown NVIDIA Orin NX Developer Kit/NVIDIA Orin NX Developer Kit, BIOS 4.1-33958178 08/01/2023
[vie sep 20 15:36:19 2024] pstate: 20400089 (nzCv daIf +PAN -UAO -TCO BTYPE=--)
[vie sep 20 15:36:19 2024] pc : tegra186_gpio_irq+0x1ac/0x1f0
[vie sep 20 15:36:19 2024] lr : tegra186_gpio_irq+0x11c/0x1f0
[vie sep 20 15:36:19 2024] sp : ffff800010003ef0
[vie sep 20 15:36:19 2024] x29: ffff800010003ef0 x28: ffff2635c0e94460 
[vie sep 20 15:36:19 2024] x27: ffffadaddfd1cfe8 x26: 0000000000000018 
[vie sep 20 15:36:19 2024] x25: ffff2635c5ea1880 x24: ffff2635c5f0a000 
[vie sep 20 15:36:19 2024] x23: 000000000000000c x22: 000000000000004c 
[vie sep 20 15:36:19 2024] x21: 00000000000000b9 x20: 0000000000000000 
[vie sep 20 15:36:19 2024] x19: ffffadaddf1cf290 x18: 0000000000000000 
[vie sep 20 15:36:19 2024] x17: 0000000000000000 x16: ffffadadde198810 
[vie sep 20 15:36:19 2024] x15: 0000000000000000 x14: 0000000000000000 
[vie sep 20 15:36:19 2024] x13: 0000000000000003 x12: 0000000000000500 
[vie sep 20 15:36:19 2024] x11: 0000000000000040 x10: ffffadaddfc87b60 
[vie sep 20 15:36:19 2024] x9 : ffffadaddfc87b58 x8 : ffff2635c04b9268 
[vie sep 20 15:36:19 2024] x7 : 0000000000000000 x6 : 0000000000000001 
[vie sep 20 15:36:19 2024] x5 : 0000000000000000 x4 : 0000000000000000 
[vie sep 20 15:36:19 2024] x3 : 0000000000000000 x2 : ffffadadde090d70 
[vie sep 20 15:36:19 2024] x1 : 000000000000004c x0 : 0000000000000000 
[vie sep 20 15:36:19 2024] Call trace:
[vie sep 20 15:36:19 2024]  tegra186_gpio_irq+0x1ac/0x1f0
[vie sep 20 15:36:19 2024]  generic_handle_irq+0x40/0x60
[vie sep 20 15:36:19 2024]  __handle_domain_irq+0x70/0xd0
[vie sep 20 15:36:19 2024]  gic_handle_irq+0x68/0x134
[vie sep 20 15:36:19 2024]  el1_irq+0xd0/0x180
[vie sep 20 15:36:19 2024]  cpuidle_enter_state+0xb8/0x410
[vie sep 20 15:36:19 2024]  cpuidle_enter+0x40/0x60
[vie sep 20 15:36:19 2024]  call_cpuidle+0x44/0x80
[vie sep 20 15:36:19 2024]  do_idle+0x208/0x270
[vie sep 20 15:36:19 2024]  cpu_startup_entry+0x2c/0x70
[vie sep 20 15:36:19 2024]  rest_init+0xdc/0xe8
[vie sep 20 15:36:19 2024]  arch_call_rest_init+0x18/0x20
[vie sep 20 15:36:19 2024]  start_kernel+0x500/0x538
[vie sep 20 15:36:19 2024] ---[ end trace 8c31d42c728e02cf ]---

Based on the log, I understand that the issue is related to the GPIO ports, but despite weeks of troubleshooting, I haven’t been able to find the root cause. This is creating a major problem in my system because every time I run the following command:

sudo systemctl restart nvargus-daemon.service

to restart the camera drivers and launch my application (which acquires real-time images from a process), the entire Jetson reboots unexpectedly.

To clarify, the exception mentioned above repeats indefinitely in the kernel logs, but it doesn’t seem to cause any issues in the system until I restart the nvargus-daemon service to launch my application. I’m confident that this is related to the error.

I would greatly appreciate any assistance or guidance to help me resolve this issue. If you need any additional information to clarify my situation, please feel free to ask.

Thanks in advance!

This issue is most likely coming from your HDMI port. Does it cause a crash on your side?

Yes, we also lost the display on the screen connected to the HDMI port. It doesn’t show anything, not even during boot.

This issue appeared suddenly, without us changing anything. We didn’t initially connect it to the problem, as it’s restarting the nvargus service that causes Linux to reboot. Could these issues be related? Could you guide me on how to resolve this?

This is a "will not fix" issue on rel-35, but it is fixed on rel-36.

Please upgrade to rel-36.

In our case, simply unplugging the screen connected to the HDMI port resolved the issue. We plan to replace the screen and review the circuit in the future, but for now, unplugging the screen has fixed the problem.

Thanks!

Plugging and unplugging the monitor will print this gpio-tegra186.c warning in the log, but that alone should not cause a crash.

Other corner cases (reboot/suspend) might lead to a crash.

I ran a test, and when I execute sudo systemctl restart nvargus-daemon.service after unplugging the monitor, it doesn’t cause a reboot. However, when I reconnect the monitor, the error instantly reappears in the dmesg log, and running the restart command again triggers the reboot.

It seems that having the monitor connected changes something that causes the reboot, but I can’t explain why.

Do you have any suggestions on where I should look or what I can do to debug this further?

You can try the workaround (WAR) mentioned in this post: disable the interrupt function of the hotplug GPIO pin.
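
To check whether the warning really follows the monitor being plugged in, and whether it stops once a workaround is applied, it can help to count the warnings in the kernel log before and after each test. Below is a minimal sketch, not an official tool: the script name count_warnings.py and the function count_warnings are made up for illustration, and the regular expression assumes the "WARNING: CPU: ... PID: ... at file:line symbol+offset" line format shown in the log at the top of this thread.

```python
import re
import sys
from collections import Counter

# Matches kernel warning headers such as:
#   WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:937 tegra186_gpio_irq+0x1ac/0x1f0
# The format is assumed from the log posted in this thread.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<symbol>[\w.]+)\+"
)


def count_warnings(log_text):
    """Count kernel WARNING headers, grouped by (source file, line, symbol)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            key = (match["file"], int(match["line"]), match["symbol"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Reads kernel log text from standard input and prints one row per warning site.
    for (path, lineno, symbol), n in count_warnings(sys.stdin.read()).most_common():
        print(f"{n:6d}  {path}:{lineno}  {symbol}")
```

One way to use it is to pipe the kernel log into the script, for example sudo dmesg | python3 count_warnings.py, once with the monitor unplugged and once after reconnecting it. If the count for gpio-tegra186.c:937 only grows while the monitor is connected, that supports the HDMI hotplug explanation.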