Orin Nano 8GB devices with Jetpack 6.0 DP shut down randomly

We have several Orin Nano 8GB devices running Jetpack 6.0 DP in the field. Output of /etc/nv_tegra_release:

R36 (release), REVISION: 2.0, GCID: 35084178, BOARD: generic, EABI: aarch64, DATE: Tue Dec 19 05:55:03 UTC 2023
KERNEL_VARIANT: oot

Output of /etc/os-release:

PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian

Recently we noticed that some of them go offline at random, and the only way to bring them back is a manual power cycle, which costs time and resources. Technicians who visit the site to power cycle the devices report that the indicator light is off, which suggests a power problem to us.

We looked for errors in the syslog and kernel log. Most of the devices show entries like the following in syslog before they shut down:

Example from device 1:

Jul 14 07:45:07 host1 kernel: [262829.646459]  phy_tegra194_p2u pcie_tegra194
Jul 14 07:45:07 host1 kernel: [262829.646467] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G           OE     5.15.122-tegra #3
Jul 14 07:45:07 host1 kernel: [262829.646475] Hardware name: Unknown Orin Nano 8GB for DSBOARD-NX2/Orin Nano 8GB for DSBOARD-NX2, BIOS 36.2.0-gcid-34956989 11/30/2023
Jul 14 07:45:07 host1 kernel: [262829.646478] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Jul 14 07:45:07 host1 kernel: [262829.646485] pc : dev_watchdog+0x3bc/0x3d0
Jul 14 07:45:07 host1 kernel: [262829.646492] lr : dev_watchdog+0x3bc/0x3d0
Jul 14 07:45:07 host1 kernel: [262829.646497] sp : ffff800008013d30
Jul 14 07:45:07 host1 kernel: [262829.646499] x29: ffff800008013d30 x28: 0000000000000001 x27: 0000000000000004
Jul 14 07:45:07 host1 kernel: [262829.646506] x26: ffff000082bcf880 x25: ffff0000a68d0480 x24: 0000000000000140
Jul 14 07:45:07 host1 kernel: [262829.646512] x23: ffff0000a68d03dc x22: 00000000ffffffff x21: ffffde5f89df6000
Jul 14 07:45:07 host1 kernel: [262829.646517] x20: ffff0000a68d0000 x19: 0000000000000000 x18: ffffffffffffffff
Jul 14 07:45:07 host1 kernel: [262829.646522] x17: ffff21a267a99000 x16: ffff800008010000 x15: ffffde5f8a20a525
Jul 14 07:45:07 host1 kernel: [262829.646527] x14: ffffffffffffffff x13: 74756f2064656d69 x12: 7420302065756575
Jul 14 07:45:07 host1 kernel: [262829.646532] x11: 712074696d736e61 x10: 7274203a29383631 x9 : 74203a2938363138
Jul 14 07:45:07 host1 kernel: [262829.646538] x8 : 7228203068746520 x7 : 0000000000000003 x6 : 0000000000000000
Jul 14 07:45:07 host1 kernel: [262829.646542] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000100
Jul 14 07:45:07 host1 kernel: [262829.646547] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000801f3e00
Jul 14 07:45:07 host1 kernel: [262829.646553] Call trace:
Jul 14 07:45:07 host1 kernel: [262829.646555]  dev_watchdog+0x3bc/0x3d0
Jul 14 07:45:07 host1 kernel: [262829.646561]  call_timer_fn+0x44/0x1c0
Jul 14 07:45:07 host1 kernel: [262829.646572]  __run_timers.part.0+0x228/0x2e0
Jul 14 07:45:07 host1 kernel: [262829.646577]  run_timer_softirq+0x48/0x80
Jul 14 07:45:07 host1 kernel: [262829.646583]  __do_softirq+0x130/0x3c8
Jul 14 07:45:07 host1 kernel: [262829.646588]  __irq_exit_rcu+0xe0/0x100
Jul 14 07:45:07 host1 kernel: [262829.646597]  irq_exit+0x1c/0x30
Jul 14 07:45:07 host1 kernel: [262829.646604]  handle_domain_irq+0x78/0xb0
Jul 14 07:45:07 host1 kernel: [262829.646613]  gic_handle_irq+0x68/0x150
Jul 14 07:45:07 host1 kernel: [262829.646624]  call_on_irq_stack+0x20/0x50
Jul 14 07:45:07 host1 kernel: [262829.646629]  do_interrupt_handler+0x70/0x80
Jul 14 07:45:07 host1 kernel: [262829.646634]  el1_interrupt+0x30/0x80
Jul 14 07:45:07 host1 kernel: [262829.646644]  el1h_64_irq_handler+0x18/0x30
Jul 14 07:45:07 host1 kernel: [262829.646651]  el1h_64_irq+0x7c/0x80
Jul 14 07:45:07 host1 kernel: [262829.646654]  cpuidle_enter_state+0xbc/0x3f0
Jul 14 07:45:07 host1 kernel: [262829.646661]  cpuidle_enter+0x44/0x60
Jul 14 07:45:07 host1 kernel: [262829.646665]  do_idle+0x220/0x2b0
Jul 14 07:45:07 host1 kernel: [262829.646671]  cpu_startup_entry+0x34/0x70
Jul 14 07:45:07 host1 kernel: [262829.646677]  secondary_start_kernel+0x14c/0x180
Jul 14 07:45:07 host1 kernel: [262829.646684]  __secondary_switched+0x90/0x94
Jul 14 07:45:07 host1 kernel: [262829.646691] ---[ end trace c7ed6b145a3c4e00 ]---
Jul 14 07:45:07 host1 kernel: [262830.668784] r8168: eth0: link up
Jul 14 07:45:08 host1 kernel: [262831.694067] r8168: eth0: link down
Jul 14 07:45:11 host1 kernel: [262834.808792] r8168: eth0: link up
Jul 15 01:46:05 host1 kernel: [327687.554952] loop1: detected capacity change from 0 to 8
Jul 15 03:33:02 host1 kernel: [334103.784333] r8168: eth0: link up
Jul 15 06:33:05 host1 kernel: [344907.254078] loop1: detected capacity change from 0 to 8
Jul 15 06:33:47 host1 kernel: [344949.181213] loop1: detected capacity change from 0 to 8
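
A side note on reading the trace above: registers x8 to x13 contain printable ASCII, which looks like leftover bytes of the warning text that was being formatted. The Python sketch below decodes each 64-bit value as little-endian bytes. The helper name `decode_reg` is made up for illustration, and this is only a reading aid, not a diagnosis.

```python
def decode_reg(hex_value: str) -> str:
    """Decode a 64-bit register value (hex string) as little-endian ASCII."""
    # Reverse the byte order: the register is printed most significant byte first.
    raw = bytes.fromhex(hex_value.zfill(16))[::-1]
    # Replace non-printable bytes with '.' so partial strings stay readable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Register values copied from the device 1 trace above.
regs = {
    "x8": "7228203068746520",
    "x9": "74203a2938363138",
    "x10": "7274203a29383631",
    "x11": "712074696d736e61",
    "x12": "7420302065756575",
    "x13": "74756f2064656d69",
}

for name, value in regs.items():
    print(name, repr(decode_reg(value)))
```

The fragments read like pieces of "eth0 (r8168): transmit queue 0 timed out", which would fit a transmit-timeout warning raised from dev_watchdog and the r8168 link flaps that follow. The head of the warning is not part of the excerpt, so this remains an inference.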

Example from device 2:

Jul 17, 2024 20:11:23.063
Jul 17 12:11:23 host2 kernel: [84647.770968] ------------[ cut here ]------------
Jul 17 12:11:23 host2 kernel: [84647.770981] refcount_t: addition on 0; use-after-free.
Jul 17 12:11:23 host2 kernel: [84647.771003] WARNING: CPU: 5 PID: 3540 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x150
Jul 17 12:11:23 host2 kernel: [84647.771032] Modules linked in: nf_log_syslog nft_nat nft_log nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nvidia_modeset(OE) xt_addrtype nft_compat br_netfilter snd_soc_tegra210_admaif_oot(O) snd_soc_tegra186_asrc_oot(O) snd_soc_tegra_pcm snd_soc_tegra210_mixer_oot(O) snd_soc_tegra186_arad_oot(O) snd_soc_tegra210_afc_oot(O) snd_soc_tegra210_mvc_oot(O) snd_soc_tegra186_dspk_oot(O) snd_soc_tegra210_ope_oot(O) snd_soc_tegra210_adx_oot(O) snd_soc_tegra210_sfc_oot(O) snd_soc_tegra210_dmic_oot(O) snd_soc_tegra210_amx_oot(O) snd_soc_tegra210_i2s_oot(O) tegra210_adma snd_soc_tegra210_ahub_oot(O) spidev r8168(O) nvvrs_pseq_rtc(O) mttcan(O) nvpps(O) can_dev tegra_cactmon_mc_all(O) tegra234_aon(O) tegra194_gte(OE) tegra_aconnect at24 pwm_tegra_tachometer(O) lzo_rle lzo_compress snd_hda_codec_hdmi mc_hwpm(O) zram tegra_pcie_dma_test(O) zsmalloc tegra_pcie_edma(O) snd_hda_tegra snd_hda_codec pwm_tegra spi_tegra114 snd_hda_core host1x_fence(O) nvhost_isp5(O)
Jul 17 12:11:23 host2 kernel: [84647.771213]  nvhost_nvcsi_t194(O) nvhost_vi5(O) snd_soc_tegra_machine_driver_oot(O) snd_soc_tegra_utils_oot(O) nvidia(OE) crct10dif_ce snd_soc_simple_card_utils tegra234_oc_event(O) tegra_camera(O) nvpmodel_clk_cap(O) v4l2_dv_timings tegra_dce(O) nvhost_nvcsi(O) bridge thermal_trip_event(O) tegra_bpmp_thermal tegra_camera_platform(O) rfkill nvidia_vrs_pseq(O) tsecriscv(O) capture_ivc(O) stp tegra_camera_rtcpu(O) llc ivc_bus(O) usb_f_ncm hsp_mailbox_client(O) usb_f_mass_storage ivc_ext(O) governor_userspace v4l2_fwnode v4l2_async videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common tegra_drm_next(O) videodev tegra_wmark(O) mc nvhost_capture(O) nvhwpm(O) host1x_nvhost(O) cec drm_kms_helper usb_f_acm u_serial usb_f_rndis u_ether libcomposite ina3221 pwm_fan nvgpu(O) governor_pod_scaling(O) nf_tables mc_utils(O) libcrc32c host1x_next(O) nfnetlink nvmap(O) nvsciipc(O) drm fuse ip_tables x_tables tegra_xudc ucsi_ccg typec_ucsi typec nvme nvme_core phy_tegra194_p2u
Jul 17 12:11:23 host2 kernel: [84647.771367]  pcie_tegra194
Jul 17 12:11:23 host2 kernel: [84647.771375] CPU: 5 PID: 3540 Comm: netrinos Tainted: G           OE     5.15.122-tegra #3
Jul 17 12:11:23 host2 kernel: [84647.771383] Hardware name: Unknown Orin Nano 8GB for DSBOARD-NX2/Orin Nano 8GB for DSBOARD-NX2, BIOS 36.2.0-gcid-34956989 11/30/2023
Jul 17 12:11:23 host2 kernel: [84647.771387] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Jul 17 12:11:23 host2 kernel: [84647.771394] pc : refcount_warn_saturate+0xa0/0x150
Jul 17 12:11:23 host2 kernel: [84647.771404] lr : refcount_warn_saturate+0xa0/0x150
Jul 17 12:11:23 host2 kernel: [84647.771412] sp : ffff800018a6b8d0
Jul 17 12:11:23 host2 kernel: [84647.771415] x29: ffff800018a6b8d0 x28: ffff00008447ebb8 x27: ffff0000844bb400
Jul 17 12:11:23 host2 kernel: [84647.771426] x26: ffff0000de79c2b0 x25: ffff800018a6b9b0 x24: ffff0001d4107a80
Jul 17 12:11:23 host2 kernel: [84647.771436] x23: 0000000000001000 x22: ffff0000de4f3400 x21: ffff0000de79c4d0
Jul 17 12:11:23 host2 kernel: [84647.771445] x20: 0000000000000000 x19: 00000000ffffffa6 x18: 0000000000000000
Jul 17 12:11:23 host2 kernel: [84647.771454] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000001
Jul 17 12:11:23 host2 kernel: [84647.771464] x14: 0000000000000001 x13: 0a2e656572662d72 x12: 657466612d657375
Jul 17 12:11:23 host2 kernel: [84647.771473] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : 612d657375203b30
Jul 17 12:11:23 host2 kernel: [84647.771482] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 746e756f63666572
Jul 17 12:11:23 host2 kernel: [84647.771491] x5 : ffff0001f12dc9f0 x4 : 00000000fffff2db x3 : ffffb3335dbe8938
Jul 17 12:11:23 host2 kernel: [84647.771500] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000117768000
Jul 17 12:11:23 host2 kernel: [84647.771509] Call trace:
Jul 17 12:11:23 host2 kernel: [84647.771513]  refcount_warn_saturate+0xa0/0x150
Jul 17 12:11:23 host2 kernel: [84647.771520]  wg_get_device_dump+0x830/0x8f0
Jul 17 12:11:23 host2 kernel: [84647.771531]  genl_lock_dumpit+0x4c/0x70
Jul 17 12:11:23 host2 kernel: [84647.771541]  netlink_dump+0x110/0x3b0
Jul 17 12:11:23 host2 kernel: [84647.771548]  netlink_recvmsg+0x1e4/0x3d0
Jul 17 12:11:23 host2 kernel: [84647.771554]  ____sys_recvmsg+0x2fc/0x440
Jul 17 12:11:23 host2 kernel: [84647.771560]  ___sys_recvmsg+0x98/0x130
Jul 17 12:11:23 host2 kernel: [84647.771565]  __sys_recvmsg+0x78/0xd0
Jul 17 12:11:23 host2 kernel: [84647.771571]  __arm64_sys_recvmsg+0x34/0x50
Jul 17 12:11:23 host2 kernel: [84647.771576]  invoke_syscall+0x5c/0x130
Jul 17 12:11:23 host2 kernel: [84647.771586]  el0_svc_common.constprop.0+0x64/0x110
Jul 17 12:11:23 host2 kernel: [84647.771594]  do_el0_svc+0x74/0xa0
Jul 17 12:11:23 host2 kernel: [84647.771602]  el0_svc+0x28/0x80
Jul 17 12:11:23 host2 kernel: [84647.771612]  el0t_64_sync_handler+0xa4/0x130
Jul 17 12:11:23 host2 kernel: [84647.771620]  el0t_64_sync+0x1a4/0x1a8
Jul 17 12:11:23 host2 kernel: [84647.771627] ---[ end trace 2dda305bcdd5dc7b ]---
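
To compare traces like these across devices, a small script can pull out the key fields of each warning. The following is a minimal Python sketch, assuming syslog-style lines in the format shown above. The function `summarize` and its field names are hypothetical, not part of any existing tool.

```python
import re

# Patterns matched against the kernel lines as they appear in syslog.
PC_RE = re.compile(r"\bpc : (\S+?)\+0x")
COMM_RE = re.compile(r"\bComm: (\S+)")
FRAME_RE = re.compile(r"\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


def summarize(lines):
    """Return the function at pc, the process name and the call-trace frames."""
    summary = {"pc": None, "comm": None, "frames": []}
    in_trace = False
    for line in lines:
        if (m := PC_RE.search(line)):
            summary["pc"] = m.group(1)
        if (m := COMM_RE.search(line)):
            summary["comm"] = m.group(1)
        if "Call trace:" in line:
            in_trace = True
            continue
        if "---[ end trace" in line:
            in_trace = False
        if in_trace and (m := FRAME_RE.search(line)):
            summary["frames"].append(m.group(1))
    return summary


# A few lines copied from the device 2 log above.
sample = [
    "Jul 17 12:11:23 host2 kernel: [84647.771375] CPU: 5 PID: 3540 Comm: netrinos Tainted: G           OE     5.15.122-tegra #3",
    "Jul 17 12:11:23 host2 kernel: [84647.771394] pc : refcount_warn_saturate+0xa0/0x150",
    "Jul 17 12:11:23 host2 kernel: [84647.771509] Call trace:",
    "Jul 17 12:11:23 host2 kernel: [84647.771513]  refcount_warn_saturate+0xa0/0x150",
    "Jul 17 12:11:23 host2 kernel: [84647.771520]  wg_get_device_dump+0x830/0x8f0",
    "Jul 17 12:11:23 host2 kernel: [84647.771627] ---[ end trace 2dda305bcdd5dc7b ]---",
]

print(summarize(sample))
```

For device 2 this points at wg_get_device_dump, reached through a netlink recvmsg call from the process netrinos. Whether either warning is related to the power loss is not something the logs shown here can confirm.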

Has anyone had a similar experience?

Any input on analysing logs is much appreciated.

Please upgrade to the Jetpack 6 GA version. The DP version is, as its name says, a developer preview and should not be used any more now that the GA version is available.
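
To see which units in a fleet still report the release shown in the question, the release line can be parsed on each device. This is a rough Python sketch. The function name `parse_l4t_release` is invented here, and the assumption that the GA release reports a later REVISION than 2.0 should be checked against NVIDIA's release notes.

```python
import re


def parse_l4t_release(text: str):
    """Parse an /etc/nv_tegra_release line into (major, revision)."""
    m = re.search(r"R(\d+) \(release\), REVISION: ([\d.]+)", text)
    if not m:
        raise ValueError("unrecognised nv_tegra_release format")
    return int(m.group(1)), m.group(2)


# Release line copied from the question; on a device, read the file instead.
line = ("R36 (release), REVISION: 2.0, GCID: 35084178, BOARD: generic, "
        "EABI: aarch64, DATE: Tue Dec 19 05:55:03 UTC 2023")

major, revision = parse_l4t_release(line)
print(major, revision)
```

Devices that still report R36 with REVISION 2.0 are the ones on the release described in the question.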
