JetPack 5.1.5: kernel oops occurs many times for various reasons

The system I am using is JetPack 5.1.5. We run it on our own custom-built PCBA, and the display is connected over HDMI.

$ cat /etc/nv_tegra_release
# R35 (release), REVISION: 6.1, GCID: 39721438, BOARD: t186ref, EABI: aarch64, DATE: Tue Mar  4 10:13:09 UTC 2025

$ cat /etc/nv_boot_control.conf
TNSPEC 3767-301-0000-H.1-1-1-jetson-orin-nano-devkit-super-
COMPATIBLE_SPEC 3767-000-0000--1--jetson-orin-nano-devkit-super-
TEGRA_LEGACY_UPDATE false
TEGRA_BOOT_STORAGE nvme0n1
TEGRA_EMMC_ONLY false
TEGRA_CHIPID 0x23
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0
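
For reference, the release string above can be turned into a Jetson Linux (L4T) version number with a few lines of Python. This is only a sketch: the helper name `l4t_version` is made up for illustration, and the regular expression assumes the header format shown in `/etc/nv_tegra_release` above. For this board it yields 35.6.1, which is the L4T release that ships with JetPack 5.1.5.

```python
import re


def l4t_version(release_line):
    """Build an L4T version such as '35.6.1' from the first line of
    /etc/nv_tegra_release (hypothetical helper, format assumed from the post)."""
    m = re.search(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)", release_line)
    if m is None:
        raise ValueError("unrecognized nv_tegra_release format")
    # Major release (R35) followed by the revision (6.1).
    return f"{m.group(1)}.{m.group(2)}"


if __name__ == "__main__":
    line = ("# R35 (release), REVISION: 6.1, GCID: 39721438, BOARD: t186ref, "
            "EABI: aarch64, DATE: Tue Mar  4 10:13:09 UTC 2025")
    print(l4t_version(line))
```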

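Before deployment we ran a static soak test on the system for some time and saw no problems. Once the system was connected to our working environment and our application was running, the kernel oopsed frequently, anywhere from tens of minutes to a few hours after startup. I have collected the most recent dump records and have found at least 6 different call stacks so far. Because there are so many different crash signatures, we cannot narrow down the root cause any further, and we would appreciate some suggestions on how to investigate.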

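Our previous release used JetPack 5.1.2, which ran stably and never crashed. The hardware and the software we run differ very little between the two JetPack versions; the crashes appeared only after updating to JetPack 5.1.5.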

[ 7167.106382] Call trace:
[ 7167.108888]  __free_pages+0x28/0x100
[ 7167.112546]  __vunmap+0x190/0x2c0
[ 7167.115937]  __vfree+0x38/0x90
[ 7167.119069]  vfree+0x3c/0x50
[ 7167.122058]  nvkms_free+0x40/0x50 [nvidia_modeset]
[ 7167.127008]  nvKmsIoctl+0xf0/0x1a8 [nvidia_modeset]
[ 7167.132029]  nvkms_ioctl_common+0x180/0x1b0 [nvidia_modeset]
[ 7167.137875]  nvidia_frontend_unlocked_ioctl+0x5c/0x78 [nvidia]
[ 7167.143854]  __arm64_sys_ioctl+0xac/0xf0
[ 7167.147881]  el0_svc_common.constprop.0+0x80/0x1d0
[ 7167.152799]  do_el0_svc+0x38/0xc0
[ 7167.156194]  el0_svc+0x1c/0x30
[ 7167.159319]  el0_sync_handler+0xa8/0xb0
[ 7167.163246]  el0_sync+0x16c/0x180
[ 1001.044056] Call trace:
[ 1001.046562]  xhci_td_cleanup+0x90/0x100
[ 1001.050488]  finish_td+0xc0/0x180
[ 1001.053871]  xhci_irq+0x698/0x1b40
[ 1001.057353]  tegra_xhci_irq+0x34/0x70
[ 1001.061103]  usb_hcd_irq+0x40/0x70
[ 1001.064581]  __handle_irq_event_percpu+0x68/0x2b0
[ 1001.069395]  handle_irq_event_percpu+0x40/0xa0
[ 1001.073945]  handle_irq_event+0x50/0xa0
[ 1001.077870]  handle_fasteoi_irq+0xc0/0x170
[ 1001.082071]  generic_handle_irq+0x40/0x60
[ 1001.086184]  __handle_domain_irq+0x70/0xd0
[ 1001.090384]  gic_handle_irq+0x68/0x134
[ 1001.094228]  el1_irq+0xd0/0x180
[ 1001.097451]  _raw_spin_unlock_irqrestore+0x38/0x70
[ 1001.102366]  arm_smmu_tlb_sync_context+0x7c/0x90
[ 1001.107096]  arm_smmu_iotlb_sync+0x54/0x120
[ 1001.111385]  __iommu_dma_unmap+0xf0/0x110
[ 1001.115495]  iommu_dma_unmap_page+0x50/0x90
[ 1001.119785]  dma_unmap_page_attrs+0x64/0x200
[ 1001.124163]  usb_hcd_unmap_urb_for_dma+0x70/0x110
[ 1001.128986]  unmap_urb_for_dma+0x48/0x60
[ 1001.132997]  __usb_hcd_giveback_urb+0x44/0x150
[ 1001.137549]  usb_giveback_urb_bh+0xb4/0x1a0
[ 1001.141833]  tasklet_action_common.isra.0+0xa8/0x1c0
[ 1001.146925]  tasklet_action+0x30/0x40
[ 1001.150668]  __do_softirq+0x140/0x3e8
[ 1001.154414]  irq_exit+0xc0/0xe0
[ 1001.157626]  __handle_domain_irq+0x74/0xd0
[ 1001.161814]  gic_handle_irq+0x68/0x134
[ 1001.165649]  el0_irq_naked+0x4c/0x54
[ 8529.510608] Call trace:
[ 8529.513112]  usb_hcd_check_unlink_urb+0x38/0x80
[ 8529.517753]  xhci_urb_dequeue+0x68/0x500
[ 8529.521767]  unlink1+0x4c/0x150
[ 8529.524970]  usb_hcd_flush_endpoint+0x110/0x130
[ 8529.529607]  usb_suspend_both+0x11c/0x260
[ 8529.533718]  usb_runtime_suspend+0x38/0x90
[ 8529.537908]  __rpm_callback+0xe0/0x150
[ 8529.541752]  rpm_callback+0x38/0xa0
[ 8529.545322]  rpm_suspend+0xe4/0x650
[ 8529.548892]  pm_runtime_work+0xc0/0xd0
[ 8529.552731]  process_one_work+0x1c4/0x4c0
[ 8529.556839]  worker_thread+0x54/0x450
[ 8529.560581]  kthread+0x148/0x170
[ 8529.563895]  ret_from_fork+0x10/0x18
[ 2498.284873] Call trace:
[ 2498.287378]  percpu_ref_get_many+0x3c/0xb0
[ 2498.291567]  refill_obj_stock+0x64/0xf0
[ 2498.295496]  obj_cgroup_uncharge+0x2c/0x40
[ 2498.299687]  memcg_slab_free_hook+0xf4/0x2a0
[ 2498.304060]  kmem_cache_free+0x108/0x430
[ 2498.308072]  file_free_rcu+0x50/0xa0
[ 2498.311727]  rcu_core+0x288/0xa10
[ 2498.315120]  rcu_core_si+0x18/0x20
[ 2498.318604]  __do_softirq+0x140/0x3e8
[ 2498.322349]  run_ksoftirqd+0x50/0x60
[ 2498.326009]  smpboot_thread_fn+0x1c4/0x280
[ 2498.330206]  kthread+0x148/0x170
[ 2498.333509]  ret_from_fork+0x10/0x18
[ 1060.593706] Call trace:
[ 1060.596212]  __mutex_lock.isra.0+0x130/0x600
[ 1060.600581]  __mutex_lock_slowpath+0x28/0x40
[ 1060.604951]  mutex_lock+0x74/0x80
[ 1060.608346]  pipe_read+0x304/0x480
[ 1060.611822]  new_sync_read+0x188/0x1a0
[ 1060.615669]  vfs_read+0x130/0x1c0
[ 1060.619065]  ksys_read+0xf0/0x110
[ 1060.622456]  __arm64_sys_read+0x28/0x40
[ 1060.626390]  el0_svc_common.constprop.0+0x80/0x1d0
[ 1060.631299]  do_el0_svc+0x38/0xc0
[ 1060.634694]  el0_svc+0x1c/0x30
[ 1060.637819]  el0_sync_handler+0xa8/0xb0
[ 1060.641746]  el0_sync+0x16c/0x180
[ 2072.996001] Call trace:
[ 2072.998511]  nvmap_handle_remove+0xec/0x100 [nvmap]
[ 2073.003512]  _nvmap_handle_free+0x9c/0x4a0 [nvmap]
[ 2073.008424]  nvmap_handle_put+0x10c/0x1f0 [nvmap]
[ 2073.013236]  nvmap_dmabuf_release+0x118/0x150 [nvmap]
[ 2073.018490]  nvgpu_dma_buf_release+0x60/0x80 [nvgpu]
[ 2073.023579]  dma_buf_release+0x9c/0x130
[ 2073.027508]  __dentry_kill+0x130/0x1c0
[ 2073.031350]  dput+0x1d0/0x330
[ 2073.034390]  __fput+0xc0/0x260
[ 2073.037518]  ____fput+0x24/0x30
[ 2073.040728]  task_work_run+0x88/0xe0
[ 2073.044396]  do_notify_resume+0x24c/0x990
[ 2073.048501]  work_pending+0xc/0x738
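
When going through many dumps like the ones above, it can help to group the traces by their top frame and by the kernel modules that appear in them. The following is a minimal Python sketch, not an official tool: the function names `split_traces` and `summarize` are made up for illustration, and the regular expressions assume the plain `dmesg` frame format shown in this post.

```python
import re
from collections import Counter

# One stack frame, e.g. "[ 7167.122058]  nvkms_free+0x40/0x50 [nvidia_modeset]"
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?\s*$"
)
# Start of a trace, e.g. "[ 7167.106382] Call trace:"
HEADER_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+Call trace:\s*$")


def split_traces(log_text):
    """Split dmesg text into traces: [{'ts': float, 'frames': [(symbol, module)]}]."""
    traces = []
    current = None
    for line in log_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = {"ts": float(header.group("ts")), "frames": []}
            traces.append(current)
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            current["frames"].append((frame.group("sym"), frame.group("mod")))
        else:
            current = None  # any other line ends the current trace
    return traces


def summarize(traces):
    """Count traces per top frame, and per module (once per trace)."""
    top_frames = Counter(t["frames"][0][0] for t in traces if t["frames"])
    modules = Counter()
    for t in traces:
        for mod in {m for _, m in t["frames"] if m}:
            modules[mod] += 1
    return top_frames, modules


if __name__ == "__main__":
    # A shortened excerpt of the traces in this post.
    sample = """[ 7167.106382] Call trace:
[ 7167.108888]  __free_pages+0x28/0x100
[ 7167.122058]  nvkms_free+0x40/0x50 [nvidia_modeset]
[ 1001.044056] Call trace:
[ 1001.046562]  xhci_td_cleanup+0x90/0x100
"""
    tops, mods = summarize(split_traces(sample))
    print("top frames:", dict(tops))
    print("modules:", dict(mods))
```

To run it on a full log, read the saved `dmesg` output into a string and pass it to `split_traces`.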

Detailed logs:

202509_kernel_oops.zip (306.4 KB)

*** Please note that this reply is generated by LLM automatically ***

The question appears to be a jumbled collection of keywords related to the NVIDIA Jetson Orin NX series. I’ll attempt to provide a comprehensive answer by breaking down the keywords and providing relevant information.

Jetson Orin NX: The Jetson Orin NX is a series of modules from NVIDIA, designed for edge AI and robotics applications. They are part of the Jetson Orin series, which also includes the Jetson Orin Nano and Jetson AGX Orin.

Jetpack: Jetpack is a software development kit (SDK) provided by NVIDIA for the Jetson platform. It includes a set of tools, libraries, and APIs for developing AI and computer vision applications.

PCBA: PCBA stands for Printed Circuit Board Assembly, which refers to the physical board that the Jetson module is mounted on.

HDMI: HDMI (High-Definition Multimedia Interface) is a digital video interface standard used for transmitting video and audio signals.

kernel oops dump: A kernel oops dump is a type of error message that occurs when the Linux kernel encounters a critical error. It provides information about the error, including the kernel version, CPU architecture, and other relevant details.

JetPack 5.1.5: JetPack 5.1.5 is a specific version of the JetPack SDK. It is based on Jetson Linux (L4T) 35.6.1, which uses Linux kernel 5.10.

log 202509: This appears to be a log file or a reference to a log entry, possibly related to a kernel oops dump or another error message.

kernel oops zip 306.4 KB: This suggests that the kernel oops logs have been compressed into a ZIP file, which is approximately 306.4 KB in size.

In summary, the question appears to be related to a kernel error or crash on a Jetson Orin NX module running JetPack 5.1.5. The error messages have been logged and compressed into a ZIP file. If you’re experiencing issues with your Jetson module, I recommend checking the NVIDIA forums or documentation for troubleshooting guides and support resources.

Comparison with other Jetson modules:

  • The Jetson Orin NX series is similar to the Jetson Orin Nano series, but with more powerful processors and more memory options.
  • The Jetson AGX Orin series is a more powerful and feature-rich version of the Jetson Orin NX series, with more advanced AI and computer vision capabilities.
  • The Jetson Xavier NX series is a previous generation of Jetson modules, which may not have the same level of performance or features as the Jetson Orin NX series.

Please note that this answer is based on the provided context and may not be comprehensive or entirely accurate. If you have further questions or clarification, please feel free to ask!


Sorry for the late response.
Is this still an issue that needs support? Are there any results you can share?

Yes, this is still an issue. In our application, many different call stacks lead to kernel oopses, so we have temporarily rolled back to JetPack 5.1.2.