Occasional Kernel Panic During nvpmodel Service Startup on Xavier JetPack 4.6.4

I am seeing a rare issue on my Xavier device running JetPack 4.6.4. Occasionally, during startup, the nvpmodel service triggers a kernel panic (reported as a kernel Oops in the log); restarting the device usually resolves it. Here are the details:

Feb 28 04:10:11 work-desktop systemd[1]: Starting nvpmodel service...
Feb 28 04:10:11 work-desktop kernel: [ 8.217029] nvgpu: 17000000.gv11b tpc_pg_mask_store:843 [INFO] no value change, same mask already set
Feb 28 04:10:12 work-desktop kernel: [ 8.226776] Unable to handle kernel paging request at virtual address dead0000000000fe
Feb 28 04:10:12 work-desktop kernel: [ 8.227007] Mem abort info:
Feb 28 04:10:12 work-desktop kernel: [ 8.227107] ESR = 0x96000004
Feb 28 04:10:12 work-desktop kernel: [ 8.227197] Exception class = DABT (current EL), IL = 32 bits
Feb 28 04:10:12 work-desktop kernel: [ 8.227334] SET = 0, FnV = 0
Feb 28 04:10:12 work-desktop kernel: [ 8.227416] EA = 0, S1PTW = 0
Feb 28 04:10:12 work-desktop kernel: [ 8.227499] Data abort info:
Feb 28 04:10:12 work-desktop kernel: [ 8.227573] ISV = 0, ISS = 0x00000004
Feb 28 04:10:12 work-desktop kernel: [ 8.227664] CM = 0, WnR = 0
Feb 28 04:10:12 work-desktop kernel: [ 8.227748] [dead0000000000fe] address between user and kernel address ranges
Feb 28 04:10:12 work-desktop dbus-daemon[5053]: [system] Successfully activated service 'org.freedesktop.hostname1'
Feb 28 04:10:12 work-desktop kernel: [ 8.227917] Internal error: Oops: 96000004 [#1] PREEMPT SMP
Feb 28 04:10:12 work-desktop kernel: [ 8.228054] Modules linked in: binfmt_misc bluedroid_pm userspace_alert nvgpu ip_tables x_tables
Feb 28 04:10:12 work-desktop kernel: [ 8.228323] CPU: 1 PID: 5864 Comm: nvpmodel Not tainted 4.9.253-tegra #33
Feb 28 04:10:12 work-desktop kernel: [ 8.228474] Hardware name: Jetson-AGXi (DT)
Feb 28 04:10:12 work-desktop kernel: [ 8.228577] task: ffffffc6b1cfb800 task.stack: ffffffc6a1ecc000
Feb 28 04:10:12 work-desktop kernel: [ 8.228718] PC is at __dump_page+0x38/0x1d0
Feb 28 04:10:12 work-desktop kernel: [ 8.228860] LR is at dump_page+0x28/0x38
Feb 28 04:10:12 work-desktop kernel: [ 8.229141] pc : [] lr : [] pstate: 00400045
Feb 28 04:10:12 work-desktop kernel: [ 8.229684] sp : ffffffc6a1ecf7c0
Feb 28 04:10:12 work-desktop kernel: [ 8.229950] x29: ffffffc6a1ecf7c0 x28: 0000000000000008
Feb 28 04:10:12 work-desktop kernel: [ 8.234951] x27: 0000000000000100 x26: 000000000000003f
Feb 28 04:10:12 work-desktop kernel: [ 8.240797] x25: 0000000000000001 x24: 0000000000000100
Feb 28 04:10:12 work-desktop kernel: [ 8.246394] x23: ffffffc6a7dd7000 x22: 0000000000000000
Feb 28 04:10:12 work-desktop kernel: [ 8.251752] x21: ffffff80093db110 x20: 0000000000000000
Feb 28 04:10:12 work-desktop kernel: [ 8.257419] x19: ffffffbf1ac55980 x18: 0000000000000001
Feb 28 04:10:12 work-desktop kernel: [ 8.262687] x17: 0000000000001350 x16: 000000000000c608
Feb 28 04:10:12 work-desktop kernel: [ 8.268706] x15: ffffffffffffffff x14: 000000000000002d
Feb 28 04:10:12 work-desktop kernel: [ 8.274480] x13: 0000000000000001 x12: 0000000000000100
Feb 28 04:10:12 work-desktop kernel: [ 8.279927] x11: 0088000000000000 x10: 0140000000000000
Feb 28 04:10:12 work-desktop kernel: [ 8.285943] x9 : 0000000000000000 x8 : ffffffc6a7dd7800
Feb 28 04:10:12 work-desktop kernel: [ 8.291719] x7 : 0000000670a00000 x6 : 0000000000000018
Feb 28 04:10:12 work-desktop kernel: [ 8.297232] x5 : ffffff800a1a5880 x4 : ffffff800a1a5820
Feb 28 04:10:12 work-desktop kernel: [ 8.302327] x3 : dead0000000000ff x2 : 0000000000000001
Feb 28 04:10:12 work-desktop kernel: [ 8.307907] x1 : dead0000000000ff x0 : dead0000000000fe
Feb 28 04:10:12 work-desktop kernel: [ 8.313241]
Feb 28 04:10:12 work-desktop kernel: [ 8.314648] Process nvpmodel (pid: 5864, stack limit = 0xffffffc6a1ecc000)
Feb 28 04:10:12 work-desktop kernel: [ 8.321039] Call trace:
Feb 28 04:10:12 work-desktop kernel: [ 8.323668] [] __dump_page+0x38/0x1d0
Feb 28 04:10:12 work-desktop kernel: [ 8.328214] [] dump_page+0x28/0x38
Feb 28 04:10:12 work-desktop kernel: [ 8.333026] [] split_page+0xc8/0xf0
Feb 28 04:10:12 work-desktop kernel: [ 8.337841] [] __alloc_buffer_pages+0x1cc/0x2b0
Feb 28 04:10:12 work-desktop kernel: [ 8.343701] [] __dma_alloc.isra.8+0x1e0/0x388
Feb 28 04:10:12 work-desktop kernel: [ 8.349039] [] arm_coherent_dma_alloc+0xa8/0xc0
Feb 28 04:10:12 work-desktop kernel: [ 8.355446] [] nvgpu_dma_alloc_flags_sys+0x110/0x380 [nvgpu]
Feb 28 04:10:12 work-desktop kernel: [ 8.362354] [] gk20a_init_fifo_setup_sw_common+0x3b0/0x6d8 [nvgpu]
Feb 28 04:10:12 work-desktop kernel: [ 8.369615] [] gk20a_init_fifo_setup_sw+0x54/0x2a8 [nvgpu]
Feb 28 04:10:12 work-desktop kernel: [ 8.375750] [] gk20a_init_fifo_support+0x28/0x48 [nvgpu]
Feb 28 04:10:12 work-desktop kernel: [ 8.382754] [] gk20a_finalize_poweron+0x48c/0x8f0 [nvgpu]
Feb 28 04:10:12 work-desktop kernel: [ 8.389731] [] gk20a_pm_finalize_poweron+0xe4/0x418 [nvgpu]
Feb 28 04:10:12 work-desktop kernel: [ 8.397095] [] gk20a_pm_runtime_resume+0x58/0x70 [nvgpu]
Feb 28 04:10:12 work-desktop kernel: [ 8.403473] [] pm_generic_runtime_resume+0x3c/0x58
Feb 28 04:10:12 work-desktop kernel: [ 8.409856] [] __rpm_callback+0x74/0xa0
Feb 28 04:10:12 work-desktop kernel: [ 8.415540] [] rpm_callback+0x34/0x98
Feb 28 04:10:12 work-desktop kernel: [ 8.420616] [] rpm_resume+0x470/0x710
Feb 28 04:10:12 work-desktop kernel: [ 8.425860] [] pm_runtime_forbid+0x64/0x78
Feb 28 04:10:12 work-desktop kernel: [ 8.431549] [] control_store+0xf4/0x118
Feb 28 04:10:12 work-desktop kernel: [ 8.436974] [] dev_attr_store+0x44/0x60
Feb 28 04:10:12 work-desktop kernel: [ 8.442490] [] sysfs_kf_write+0x58/0x80
Feb 28 04:10:12 work-desktop kernel: [ 8.447741] [] kernfs_fop_write+0xfc/0x1e0
Feb 28 04:10:12 work-desktop kernel: [ 8.453342] [] __vfs_write+0x48/0x118
Feb 28 04:10:12 work-desktop kernel: [ 8.458761] [] vfs_write+0xac/0x1b0
Feb 28 04:10:12 work-desktop kernel: [ 8.463579] [] SyS_write+0x5c/0xc8
Feb 28 04:10:12 work-desktop kernel: [ 8.468826] [] el0_svc_naked+0x34/0x38
Feb 28 04:10:12 work-desktop kernel: [ 8.474164] ---[ end trace 68a4b6346344aded ]---
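For anyone reading the "Mem abort info" lines by hand, here is a minimal sketch that decodes the ESR value from the log (0x96000004) using the ARMv8 ESR_ELx field layout. It only names the few exception classes and fault status codes relevant to this trace; the function and table names are my own, not from any NVIDIA or kernel tool.

```python
# Minimal ARMv8 ESR_ELx decoder (illustrative sketch, not a kernel tool).
# Field layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
# For data aborts, ISS bit 6 is WnR and ISS bits [5:0] are the fault status (DFSC).

# Only the codes relevant to this oops are named; anything else is reported raw.
EXCEPTION_CLASSES = {
    0x24: "Data abort from a lower exception level",
    0x25: "Data abort at the current exception level",
}

FAULT_STATUS = {
    0x04: "Translation fault, level 0",
    0x05: "Translation fault, level 1",
    0x06: "Translation fault, level 2",
    0x07: "Translation fault, level 3",
}


def decode_esr(esr: int) -> dict:
    """Split a 32-bit ESR value into the fields printed in the oops."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    dfsc = iss & 0x3F
    wnr = (iss >> 6) & 0x1
    return {
        "ec": ec,
        "ec_name": EXCEPTION_CLASSES.get(ec, f"unnamed class 0x{ec:02x}"),
        "il": il,
        "iss": iss,
        "dfsc": dfsc,
        "dfsc_name": FAULT_STATUS.get(dfsc, f"unnamed status 0x{dfsc:02x}"),
        "wnr": wnr,  # 0 = the faulting access was a read, 1 = a write
    }


if __name__ == "__main__":
    for field, value in decode_esr(0x96000004).items():
        print(f"{field}: {value}")
```

Decoded this way, the value agrees with what the kernel already printed (DABT at the current EL, WnR = 0): the fault was a read through an address with no valid translation, which fits the dead0000000000fe address in the trace.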

What could be causing this issue? Could it be related to the nvpmodel configuration or hardware? Are there any preventive measures or further diagnostic steps I can take?

Can this issue be reproduced on an NVIDIA devkit?
What is the reproduction rate?

The issue has occurred only once, on-site. In the lab we have run more than 3,000 automated power cycles and have not been able to reproduce it.

Are there any mitigation measures, such as a way to restart the device automatically if this occurs?
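One possible mitigation, sketched under the assumption that a reboot is acceptable whenever the kernel oopses: the standard Linux sysctls kernel.panic_on_oops and kernel.panic can turn an oops into a panic and reboot a fixed number of seconds after a panic. The helper names and the file name 90-reboot-on-oops.conf below are hypothetical, and I have not verified this on JetPack 4.6.4.

```python
# Sketch: generate a sysctl.d fragment that reboots the board after a kernel oops.
# kernel.panic_on_oops and kernel.panic are standard Linux sysctls; the helper
# names and the target file name are illustrative assumptions.
from pathlib import Path


def render_sysctl_fragment(reboot_delay_s: int = 10) -> str:
    """Return sysctl settings that turn an oops into a panic followed by a reboot."""
    if reboot_delay_s <= 0:
        # kernel.panic = 0 means "do not reboot", so this sketch insists on a delay.
        raise ValueError("reboot_delay_s must be a positive number of seconds")
    return (
        "# Treat any kernel oops as a panic\n"
        "kernel.panic_on_oops = 1\n"
        "# Reboot this many seconds after a panic\n"
        f"kernel.panic = {reboot_delay_s}\n"
    )


def write_fragment(directory: str, reboot_delay_s: int = 10) -> Path:
    """Write the fragment into `directory` (for example /etc/sysctl.d, as root)."""
    target = Path(directory) / "90-reboot-on-oops.conf"  # hypothetical file name
    target.write_text(render_sysctl_fragment(reboot_delay_s))
    return target


if __name__ == "__main__":
    # Print the fragment instead of writing it, so it can be reviewed first.
    print(render_sysctl_fragment(10), end="")
```

After the fragment is placed under /etc/sysctl.d, running `sysctl --system` (or rebooting) should apply it. The trade-off is that every oops then forces a reboot, including ones the system might otherwise have survived.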

syslog.log (10.5 MB)
The full syslog is attached above. Could you help analyze it in detail?

@kayccc
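Since the attached syslog is large, a small filter can help locate every oops in it before reading in detail. This is a minimal sketch: the first two patterns are taken from the trace in this thread, "Kernel panic - not syncing" is an extra standard kernel message I added as an assumption, and the function name is hypothetical.

```python
# Sketch: list the lines of a syslog that mark a kernel oops or panic.
import re
import sys
from typing import Iterable, List, Tuple

# First two patterns appear in the trace in this thread; the third is an
# additional standard kernel panic message added as an assumption.
OOPS_PATTERNS = re.compile(
    r"Unable to handle kernel paging request"
    r"|Internal error: Oops"
    r"|Kernel panic - not syncing"
)


def find_oops_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line matching an oops pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if OOPS_PATTERNS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_oops.py syslog.log
    with open(sys.argv[1], errors="replace") as handle:
        for number, text in find_oops_lines(handle):
            print(f"{number}: {text}")
```

Comparing the timestamps of the hits against the "Starting nvpmodel service" lines would show whether every occurrence lines up with nvpmodel startup or whether some happen elsewhere.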