We are running JetPack 5.1.5 on a custom PCBA, with the display connected over HDMI.
$ cat /etc/nv_tegra_release
# R35 (release), REVISION: 6.1, GCID: 39721438, BOARD: t186ref, EABI: aarch64, DATE: Tue Mar 4 10:13:09 UTC 2025
$ cat /etc/nv_boot_control.conf
TNSPEC 3767-301-0000-H.1-1-1-jetson-orin-nano-devkit-super-
COMPATIBLE_SPEC 3767-000-0000--1--jetson-orin-nano-devkit-super-
TEGRA_LEGACY_UPDATE false
TEGRA_BOOT_STORAGE nvme0n1
TEGRA_EMMC_ONLY false
TEGRA_CHIPID 0x23
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0
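Before deployment we ran a static soak test on the system for some time and saw no problems. Once the unit is connected to our working environment and our application is running, the kernel oopses frequently, anywhere from tens of minutes to a few hours after startup. I collected the most recent dump records and have found at least 6 distinct call stacks so far. Because the crashes look so varied, we cannot narrow down the root cause, and we would appreciate suggestions on how to troubleshoot this.
Our previous release used JetPack 5.1.2, which ran stably with no crashes. The hardware and the software we run differ very little between the two JetPack versions; the crashes appeared only after updating to JetPack 5.1.5.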
[ 7167.106382] Call trace:
[ 7167.108888] __free_pages+0x28/0x100
[ 7167.112546] __vunmap+0x190/0x2c0
[ 7167.115937] __vfree+0x38/0x90
[ 7167.119069] vfree+0x3c/0x50
[ 7167.122058] nvkms_free+0x40/0x50 [nvidia_modeset]
[ 7167.127008] nvKmsIoctl+0xf0/0x1a8 [nvidia_modeset]
[ 7167.132029] nvkms_ioctl_common+0x180/0x1b0 [nvidia_modeset]
[ 7167.137875] nvidia_frontend_unlocked_ioctl+0x5c/0x78 [nvidia]
[ 7167.143854] __arm64_sys_ioctl+0xac/0xf0
[ 7167.147881] el0_svc_common.constprop.0+0x80/0x1d0
[ 7167.152799] do_el0_svc+0x38/0xc0
[ 7167.156194] el0_svc+0x1c/0x30
[ 7167.159319] el0_sync_handler+0xa8/0xb0
[ 7167.163246] el0_sync+0x16c/0x180
[ 1001.044056] Call trace:
[ 1001.046562] xhci_td_cleanup+0x90/0x100
[ 1001.050488] finish_td+0xc0/0x180
[ 1001.053871] xhci_irq+0x698/0x1b40
[ 1001.057353] tegra_xhci_irq+0x34/0x70
[ 1001.061103] usb_hcd_irq+0x40/0x70
[ 1001.064581] __handle_irq_event_percpu+0x68/0x2b0
[ 1001.069395] handle_irq_event_percpu+0x40/0xa0
[ 1001.073945] handle_irq_event+0x50/0xa0
[ 1001.077870] handle_fasteoi_irq+0xc0/0x170
[ 1001.082071] generic_handle_irq+0x40/0x60
[ 1001.086184] __handle_domain_irq+0x70/0xd0
[ 1001.090384] gic_handle_irq+0x68/0x134
[ 1001.094228] el1_irq+0xd0/0x180
[ 1001.097451] _raw_spin_unlock_irqrestore+0x38/0x70
[ 1001.102366] arm_smmu_tlb_sync_context+0x7c/0x90
[ 1001.107096] arm_smmu_iotlb_sync+0x54/0x120
[ 1001.111385] __iommu_dma_unmap+0xf0/0x110
[ 1001.115495] iommu_dma_unmap_page+0x50/0x90
[ 1001.119785] dma_unmap_page_attrs+0x64/0x200
[ 1001.124163] usb_hcd_unmap_urb_for_dma+0x70/0x110
[ 1001.128986] unmap_urb_for_dma+0x48/0x60
[ 1001.132997] __usb_hcd_giveback_urb+0x44/0x150
[ 1001.137549] usb_giveback_urb_bh+0xb4/0x1a0
[ 1001.141833] tasklet_action_common.isra.0+0xa8/0x1c0
[ 1001.146925] tasklet_action+0x30/0x40
[ 1001.150668] __do_softirq+0x140/0x3e8
[ 1001.154414] irq_exit+0xc0/0xe0
[ 1001.157626] __handle_domain_irq+0x74/0xd0
[ 1001.161814] gic_handle_irq+0x68/0x134
[ 1001.165649] el0_irq_naked+0x4c/0x54
[ 8529.510608] Call trace:
[ 8529.513112] usb_hcd_check_unlink_urb+0x38/0x80
[ 8529.517753] xhci_urb_dequeue+0x68/0x500
[ 8529.521767] unlink1+0x4c/0x150
[ 8529.524970] usb_hcd_flush_endpoint+0x110/0x130
[ 8529.529607] usb_suspend_both+0x11c/0x260
[ 8529.533718] usb_runtime_suspend+0x38/0x90
[ 8529.537908] __rpm_callback+0xe0/0x150
[ 8529.541752] rpm_callback+0x38/0xa0
[ 8529.545322] rpm_suspend+0xe4/0x650
[ 8529.548892] pm_runtime_work+0xc0/0xd0
[ 8529.552731] process_one_work+0x1c4/0x4c0
[ 8529.556839] worker_thread+0x54/0x450
[ 8529.560581] kthread+0x148/0x170
[ 8529.563895] ret_from_fork+0x10/0x18
[ 2498.284873] Call trace:
[ 2498.287378] percpu_ref_get_many+0x3c/0xb0
[ 2498.291567] refill_obj_stock+0x64/0xf0
[ 2498.295496] obj_cgroup_uncharge+0x2c/0x40
[ 2498.299687] memcg_slab_free_hook+0xf4/0x2a0
[ 2498.304060] kmem_cache_free+0x108/0x430
[ 2498.308072] file_free_rcu+0x50/0xa0
[ 2498.311727] rcu_core+0x288/0xa10
[ 2498.315120] rcu_core_si+0x18/0x20
[ 2498.318604] __do_softirq+0x140/0x3e8
[ 2498.322349] run_ksoftirqd+0x50/0x60
[ 2498.326009] smpboot_thread_fn+0x1c4/0x280
[ 2498.330206] kthread+0x148/0x170
[ 2498.333509] ret_from_fork+0x10/0x18
[ 1060.593706] Call trace:
[ 1060.596212] __mutex_lock.isra.0+0x130/0x600
[ 1060.600581] __mutex_lock_slowpath+0x28/0x40
[ 1060.604951] mutex_lock+0x74/0x80
[ 1060.608346] pipe_read+0x304/0x480
[ 1060.611822] new_sync_read+0x188/0x1a0
[ 1060.615669] vfs_read+0x130/0x1c0
[ 1060.619065] ksys_read+0xf0/0x110
[ 1060.622456] __arm64_sys_read+0x28/0x40
[ 1060.626390] el0_svc_common.constprop.0+0x80/0x1d0
[ 1060.631299] do_el0_svc+0x38/0xc0
[ 1060.634694] el0_svc+0x1c/0x30
[ 1060.637819] el0_sync_handler+0xa8/0xb0
[ 1060.641746] el0_sync+0x16c/0x180
[ 2072.996001] Call trace:
[ 2072.998511] nvmap_handle_remove+0xec/0x100 [nvmap]
[ 2073.003512] _nvmap_handle_free+0x9c/0x4a0 [nvmap]
[ 2073.008424] nvmap_handle_put+0x10c/0x1f0 [nvmap]
[ 2073.013236] nvmap_dmabuf_release+0x118/0x150 [nvmap]
[ 2073.018490] nvgpu_dma_buf_release+0x60/0x80 [nvgpu]
[ 2073.023579] dma_buf_release+0x9c/0x130
[ 2073.027508] __dentry_kill+0x130/0x1c0
[ 2073.031350] dput+0x1d0/0x330
[ 2073.034390] __fput+0xc0/0x260
[ 2073.037518] ____fput+0x24/0x30
[ 2073.040728] task_work_run+0x88/0xe0
[ 2073.044396] do_notify_resume+0x24c/0x990
[ 2073.048501] work_pending+0xc/0x738
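To get an overview of the six traces above, a small script can split a dmesg capture into individual call traces and count which top frames and which loadable modules show up. This is only a rough sketch: the function names `split_traces` and `summarize` are made up for illustration, and labeling frames without a `[module]` suffix as `vmlinux` is my own convention, not something the kernel prints.

```python
import re
from collections import Counter

# Matches one stack frame line, for example:
#   [ 7167.122058] nvkms_free+0x40/0x50 [nvidia_modeset]
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def split_traces(log_text):
    """Return a list of traces; each trace is a list of (symbol, module)."""
    traces = []
    current = None
    for line in log_text.splitlines():
        if "Call trace:" in line:
            # A new trace starts here.
            current = []
            traces.append(current)
            continue
        match = FRAME_RE.search(line)
        if match and current is not None:
            # Frames with no [module] suffix are built into the kernel image;
            # "vmlinux" is just the label chosen for them in this sketch.
            current.append((match.group("sym"), match.group("mod") or "vmlinux"))
    return traces


def summarize(traces):
    """Count top frames, and in how many traces each loadable module appears."""
    top_frames = Counter(trace[0][0] for trace in traces if trace)
    modules = Counter()
    for trace in traces:
        for mod in {m for _, m in trace if m != "vmlinux"}:
            modules[mod] += 1
    return top_frames, modules
```

Calling `summarize(split_traces(text))` on the text of a saved dmesg log returns two counters. If a given module or subsystem appears in most of the traces, that may be a place to start looking; if the traces are spread evenly across unrelated code, that points away from a single driver bug.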
Detailed logs:
202509_kernel_oops.zip (306.4 KB)