We are using GMSL cameras on the Thor EA release, and a kernel panic caused by a VI timeout occurs during an open/close test.
We have also reproduced this many times on Orin:
We use a GMSL camera. The camera hardware may have a poor connection; VI reports a timeout error, and the triggered recovery fails. However, that should not cause a kernel panic.
Jul 31 03:18:31 mi-desktop kernel: [22961.465435] tegra194-vi5 13e40000.host1x:vi1@14c00000: capture control message timed out
Jul 31 03:18:31 mi-desktop kernel: [22961.465668] tegra194-vi5 13e40000.host1x:vi1@14c00000: vi_capture_release: release channel IVC failed
Jul 31 03:18:31 mi-desktop kernel: [2296…
[98680.434211] ox05b1s 2-001b: ox05b1s_stop_streaming
[98680.434218] max96724 2-0027: max96724_stop_streaming
[98680.521470] tegra186-cam-rtcpu 81893d0000.rtcpu: Alert: Camera RTCPU gone bad! restoring it immediately!!
[98682.540934] 81893d0000.rtcpu:hsp-vm1: request 0x41000000: response timeout
[98682.540961] 81893d0000.rtcpu:hsp-vm1: BYE failed: 0xffffff92
[98682.727367] 81893d0000.rtcpu:hsp-vm1: camrtc_hsp_vm_read_boot_log: RTCPU boot complete
[98682.957081] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[98682.957111] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[98682.960929] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[98682.960943] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[98682.960954] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[98682.961377] tegra194-vi5 8181200000.host1x:vi0@8188400000: vi_capture_release: control failed, errno 1
[98682.961380] tegra194-vi5 8181200000.host1x:vi1@8188c00000: vi_capture_release: control failed, errno 1
[98682.961411] video4linux video3: vi capture release failed
[98682.961414] tegra-camrtc-capture-vi tegra-capture-vi: fatal: error recovery failed
[98682.970387] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[98682.979133] video4linux video1: vi capture release failed
[98683.034643] tegra-camrtc-capture-vi tegra-capture-vi: fatal: error recovery failed
[98683.034646] tegra194-vi5 8181200000.host1x:vi1@8188c00000: vi_capture_release: control failed, errno 1
[98683.042367] ox05b1s 2-001c: ox05b1s_stop_streaming
[98683.051757] max96724 2-0027: max96724_stop_streaming
[98683.051782] video4linux video2: vi capture release failed
[98683.057004] tegra-camrtc-capture-vi tegra-capture-vi: fatal: error recovery failed
[98684.077078] tegra194-vi5 8181200000.host1x:vi0@8188400000: capture control message timed out
[98684.077104] tegra194-vi5 8181200000.host1x:vi0@8188400000: vi_capture_control_send_message: failed to send IVC control message
[98684.083058] tegra-nvcsi 8181200000.host1x:nvcsi@8188000000: csi5_stream_close: Error in closing stream_id=0, csi_port=0
[98684.093987] tegra194-vi5 8181200000.host1x:vi0@8188400000: vi_capture_release: setup channel first
[98684.102605] video4linux video1: vi capture release failed
[98684.114031] ------------[ cut here ]------------
[98684.114044] refcount_t: addition on 0; use-after-free.
[98684.114057] WARNING: CPU: 9 PID: 1281755 at lib/refcount.c:25 refcount_warn_saturate+0x120/0x144
[98684.114071] Modules linked in: can_raw can nvidia_drm(OE) nvidia_modeset(OE) nvidia_uvm(OE) qrtr bridge stp llc usb_f_ncm usb_f_mass_storage nvidia(OE) usb_f_acm u_serial governor_pod_scaling(O) usb_)
[98684.114153] tegra23x_psc(O) nvpmodel_clk_cap(O) tegra234_oc_event(O) snd_hda_codec_hdmi tegra_aocluster(O) tegra_aconnect nvethernet(O) max96724_berxel(O) snd_hda_tegra snd_hda_codec coresight_stm ss
[98684.114219] videobuf2_v4l2 videodev videobuf2_common mc camchar(O) rtcpu_debug(O) tegra_camera_rtcpu(O) ivc_bus(O) hsp_mailbox_client(O) nvme_fabrics fuse drm nfnetlink ip_tables x_tables ipv6 pwm_f)
[98684.114249] CPU: 9 PID: 1281755 Comm: orion_server Tainted: G W OE 6.8.12-tegra #9
[98684.114254] Hardware name: NVIDIA NVIDIA Jetson Thor Developer Kit/Jetson, BIOS 38.0.0-gcid-41204431 07/01/2025
[98684.114256] pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[98684.114260] pc : refcount_warn_saturate+0x120/0x144
[98684.114264] lr : refcount_warn_saturate+0x120/0x144
[98684.114267] sp : ffff8000ab58b9d0
[98684.114268] x29: ffff8000ab58b9d0 x28: ffff00008501ddc0 x27: 0000000000000009
[98684.114273] x26: 0000000000000000 x25: 0000ffffbe0e6c50 x24: ffff00008501e3c0
[98684.114277] x23: ffff0000971d4108 x22: ffff0000971d4080 x21: ffff0000971d4590
uart_dmesg.log (9.6 MB)
Another log shows that even though the VI channel reset succeeded, a kernel panic still occurred.
2025-08-28T03:12:12.342695+00:00 tegra-ubuntu kernel: tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel
2025-08-28T03:12:12.385391+00:00 tegra-ubuntu kernel: [RCE] VI ch[32] frame configuration: 2560x800
2025-08-28T03:12:12.385414+00:00 tegra-ubuntu kernel: [RCE] left skip pixels=0 top skip lines=0
2025-08-28T03:12:12.385415+00:00 tegra-ubuntu kernel: [RCE] right crop pixels=2560 bottom crop lines=800
2025-08-28T03:12:12.385417+00:00 tegra-ubuntu kernel: [RCE] pixel format=VI_PIXFMT_FORMAT_T_R8 fmt=5
2025-08-28T03:12:13.805540+00:00 tegra-ubuntu kernel: ------------[ cut here ]------------
2025-08-28T03:12:13.805563+00:00 tegra-ubuntu kernel: refcount_t: addition on 0; use-after-free.
thor_log_er.zip (773.7 KB)
hello deeptalkcamera,
please move to JP-7.0/r38.2 for development.
We are pleased to announce the production release of JetPack 7.0. JetPack 7.0 is a major upgrade in the JetPack series, supporting the NVIDIA Jetson AGX Thor Developer Kit and the Thor based T5000 module. With JetPack 7, Jetson software aligns with the Server Base System Architecture (SBSA), positioning Jetson Thor alongside industry-standard ARM server design. JetPack 7.0 packages Jetson Linux 38.2 with Linux Kernel 6.8 and Ubuntu 24.04 LTS based root file system.
What’s new in JetPack 7.0
NO…
panic_GA_0828.tar.gz (17.3 MB)
We also reproduced it on the Thor GA release, version 38.2.0-gcid-41844464 08/22/2025:
[57360.788822] tegra194-vi5 8181200000.host1x:vi1@8188c00000: capture control message timed out
[57360.788850] tegra194-vi5 8181200000.host1x:vi1@8188c00000: vi_capture_control_send_message: failed to send IVC control message
[57360.794800] tegra-nvcsi 8181200000.host1x:nvcsi@8188000000: csi5_stream_close: Error in closing stream_id=2, csi_port=2
[57360.805740] tegra194-vi5 8181200000.host1x:vi1@8188c00000: vi_capture_release: setup channel first
[57360.814353] video4linux video2: vi capture release failed
[57360.822162] ------------[ cut here ]------------
[57360.824466] refcount_t: addition on 0; use-after-free.
[57360.824481] WARNING: CPU: 5 PID: 729581 at lib/refcount.c:25 refcount_warn_saturate+0x120/0x144
[57360.824495] Modules linked in: rfcomm nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nvidia_drm(O) nvidia_modeset(O) nvidia_uvm(O) qrtr b)
[57360.824587] snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card snd_soc_simple_card_utils can rfkill max96717(O) arm_spe_pmu p008g_depth(O) p008g_rgb(O) cam_cdi_tsc(O) pps_tsync(O) tegra234_oc_)
[57360.824656] mc_utils(O) capture_ivc(O) v4l2_fwnode v4l2_async videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc camchar(O) rtcpu_debug(O) tegra_camera_rtcpu(O) ivc_bu)
[57360.824693] CPU: 5 PID: 729581 Comm: orion_server Tainted: G W O 6.8.12-tegra #1
[57360.824698] Hardware name: NVIDIA NVIDIA Jetson AGX Thor Developer Kit/Jetson, BIOS 38.2.0-gcid-41844464 08/22/2025
[57360.824701] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[57360.824705] pc : refcount_warn_saturate+0x120/0x144
[57360.824710] lr : refcount_warn_saturate+0x120/0x144
[57360.824713] sp : ffff8000c73539d0
[57360.824715] x29: ffff8000c73539d0 x28: ffff0000cb3d8000 x27: 0000000000000009
[57360.824719] x26: 0000000000000000 x25: 0000ffff81fa6c50 x24: ffff0000cb3d8600
hello deeptalkcamera,
may I double-check your reproduction steps?
it seems you’re triggering error recovery repeatedly to reproduce the error?
This test opens and closes the camera in a loop at 15-second intervals.
Shouldn't error recovery be triggered automatically after a VI timeout?
hello deeptalkcamera,
we cannot reproduce the issue.
we’ve tested locally with JP-7.0 GA + IMX274 for ~600 loops using V4L2 IOCTLs.
if there’s a camera issue, error recovery is triggered to reset/restart the VI channel.
there’s a fix; please apply this patch, which fixes a memory leak in vi5_channel_error_recover:
0001-vi5-fix-vi5_channel_error_recover-memory-leak.patch (1.8 KB)
We simultaneously enabled four GMSL cameras, turning them on and off at 15-second intervals.
It took us over 10 hours of testing to replicate the issue.
Does the patch “0001-vi5-fix-vi5_channel_error_recover-memory-leak.patch” help fix the kernel panic issue?
hello deeptalkcamera,
as you can see, it fixes the error recovery memory leak.
kernel.log (12.1 MB)
uart_dmesg.log (8.4 MB)
The problem reappeared after applying 0001-vi5-fix-vi5_channel_error_recover-memory-leak.patch. Please help check the logs.
hello deeptalkcamera,
according to the error log (refcount_t: addition on 0; use-after-free),
it looks like the multi-cam race condition we’ve seen on JP-6.
please try again with this patch applied: 0001-rtcpu-capture-ivc-fix-multi-cam-race-condition.patch (2.2 KB)
see also Topic 337341 for reference.
hi, JerryChang
it seems there are two issues that can cause the refcount_t problem.
Both print “fatal: error recovery failed”, and both make the tegra_channel_kthread_capture_dequeue() thread exit.
vi5_channel_stop_streaming() then calls kthread_stop() to stop tegra_channel_kthread_capture_dequeue(), but that thread has already exited, which triggers the refcount_t issue.
The first issue is a DMA memory leak, which makes the tegra_channel_kthread_capture_dequeue() thread exit abnormally.
The second is “vi_capture_release: control failed, errno 1”. errno 1 means CAPTURE_ERROR_INVALID_PARAMETER, and it also makes the tegra_channel_kthread_capture_dequeue() thread exit abnormally.
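A deliberately simplified user-space model of why an early thread exit surfaces as `refcount_t: addition on 0`, assuming the warning comes from the `get_task_struct()` that `kthread_stop()` performs before waiting on the thread (the call traces in this thread show `refcount_warn_saturate` reached from `kthread_stop`). The kernel-doc for `kthread_stop()` notes that if the thread function may exit on its own, the caller must ensure the `task_struct` can't go away. `struct model_task` and the `model_*` functions are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* User-space model only: "refs" stands in for the task_struct usage count
 * that kthread_stop() bumps with get_task_struct(). Not kernel code. */
struct model_task {
	int refs;
};

/* Models refcount_inc(): incrementing a count that already reached zero is
 * what the kernel reports as "refcount_t: addition on 0; use-after-free". */
static bool model_get(struct model_task *t)
{
	if (t->refs == 0) {
		fprintf(stderr, "refcount_t: addition on 0; use-after-free.\n");
		return false;
	}
	t->refs++;
	return true;
}

static void model_put(struct model_task *t)
{
	assert(t->refs > 0);
	t->refs--;
}

/* Models the dequeue thread returning after "fatal: error recovery failed":
 * once it is reaped, the reference it held goes away. */
static void model_thread_exits_on_its_own(struct model_task *t)
{
	model_put(t);
}

/* Models kthread_stop(): take a reference first (the real call then waits
 * for the thread to exit), then drop it. */
static bool model_kthread_stop(struct model_task *t)
{
	if (!model_get(t))
		return false; /* in the kernel, this is the use-after-free */
	model_put(t);
	return true;
}
```

The third case in the checks below mirrors the documented rule: if whoever calls `kthread_stop()` holds its own reference, the count cannot reach zero first.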
Maybe we can add some protection to prevent the kernel panic.
In tegra_channel_kthread_capture_dequeue():
err = tegra_channel_error_recover(chan, false);
if (err) {
	dev_err(chan->vi->dev,
		"fatal: error recovery failed\n");
	chan->kthread_capture_dequeue = NULL; /* added */
	break;
}
With “chan->kthread_capture_dequeue = NULL;” added, vi5_channel_stop_streaming() should no longer call kthread_stop() on a tegra_channel_kthread_capture_dequeue() thread that no longer exists. Could this prevent the refcount_t issue?
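Clearing the pointer from inside the thread is not synchronized with the stop path: `vi5_channel_stop_streaming()` can still read the old pointer just before the thread clears it and returns. An alternative that respects the `kthread_stop()` contract is for the dequeue thread, after a fatal error, to stop capturing but keep sleeping until `kthread_should_stop()` is true, so it only exits once asked to. This is a user-space model with pthreads; `struct worker`, `worker_start()` and `worker_stop()` are hypothetical, and it assumes the stop path always stops every thread it started.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* User-space model of a capture-dequeue worker; all names are hypothetical. */
struct worker {
	pthread_t tid;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool should_stop;    /* models kthread_should_stop() */
	bool inject_failure; /* simulate error recovery failing */
	bool fatal;          /* "fatal: error recovery failed" was hit */
	int result;          /* value handed back to the stopper */
};

static void *dequeue_thread(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&w->lock);
	if (w->inject_failure) {
		w->fatal = true;
		w->result = -EIO;
	}
	/* Whether or not recovery failed, do not return until asked to stop.
	 * Kernel analogue: sleep in a loop until kthread_should_stop() is true,
	 * so kthread_stop() never targets a thread that has already exited. */
	while (!w->should_stop)
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static int worker_start(struct worker *w, bool inject_failure)
{
	*w = (struct worker){ .inject_failure = inject_failure };
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	return pthread_create(&w->tid, NULL, dequeue_thread, w);
}

/* Models kthread_stop(): request a stop, wake the worker, wait for it. */
static int worker_stop(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->should_stop = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);

	int rc = pthread_join(w->tid, NULL);
	assert(rc == 0);
	(void)rc;
	pthread_cond_destroy(&w->cond);
	pthread_mutex_destroy(&w->lock);
	return w->result;
}
```

Because the failure status travels back through the stop call, the stop path still learns that recovery failed without racing on a shared pointer.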
hi JerryChang,
I applied the following: 0001-vi5-fix-vi5_channel_error_recover-memory-leak.patch +
0001-rtcpu-capture-ivc-fix-multi-cam-race-condition.patch +
the added line “chan->kthread_capture_dequeue = NULL;”
The panic issue reappeared, but the error stack trace was different:
[36933.472375] pc: tegra_channel_ec_close+0x1c/0x3c [tegra_camera]
[36933.472390] lr: vi5_power_off+0x44/0xa4 [tegra_camera]
[36933.472398] sp: ffff8000c6ac3b00
[36933.472399] x29: ffff8000c6ac3b00 x28: ffff0000c5ba7cc0 x27: 0000000000000000
[36933.472401] x26: ffff000092292108 x25: ffff0001e4310d00 x24: 0000000000000000
[36933.472403] x23: ffffb043f96cf208 x22: 000000000000000 x21: ffff000088eb00a0
[36933.472404] x20: ffff000086d7c010 x19: ffff000092292080 x18: 0000000000010000
[36933.472406] x17: 0000000000000001 x16: ffffb04440632834 x15: ffffffffffffffff
[36933.472408] x14: 000003ffffffffff x13: 0000000000000000 x12: 000000000000ffff
[36933.472411] x11: 00000000000003bf x10: 0000000000000000 x9: 0000000000000001
[36933.472412] x8: ffff8000c6ac3a30 x7: 0000000000000000 x6: ffff001f57d1aa78
[36933.472414] x5 : 0000000000000001 x4 : 0000000000000000 x3 : ffff000092292720
[36933.472416] x2 : ffff000088eb03f8 x1 : ffffffffffffff8 x0 : ffffffff00000430
[36933.472418] Call trace:
[36933.472419] tegra_channel_ec_close+0x1c/0x3c [tegra_camera]
[36933.472427] tegra_channel_stop_streaming+0x38/0x50 [tegra_camera]
[36933.472435] __vb2_queue_cancel+0x2c/0x2b8 [videobuf2_common]
[36933.472441] vb2_core_streamoff+0x24/0xc0 [videobuf2_common]
[36933.472444] vb2_ioctl_streamoff+0x4c/0x90 [videobuf2_v4l2]
[36933.472449] v4l_streamoff+0x24/0x30 [videodev]
[36933.472462] __video_do_ioctl+0x330/0x3fc [videodev]
[36933.472470] video_usercopy+0x2d0/0x7fc [videodev]
[36933.472479] video_ioctl2+0x18/0x44 [videodev]
[36933.472502] invoke_syscall+0x48/0x114
[36933.472508] el0_svc_common.constprop.0+0x40/0xe0
[36933.472511] do_el0_svc+0x1c/0x28
[36933.472514] el0_svc+0x30/0xa8
[36933.472519] el0t_64_sync_handler+0x120/0x12c
[36933.472521] el0t_64_sync+0x194/0x198
[36933.472525] Code: eb00005f 54000140 d503201f 9110e020 (a9007c1f)
[36933.472527] ---[ end trace 0000000000000000 ]---
[36933.475538] Kernel panic - not syncing: Oops: Fatal exception
Please help analyze the log again. Thank you.
0903_dma_null_race.zip (4.0 MB)
hello deeptalkcamera,
we may need to debug into vi5_power_off() to find the root cause.
For instance, is the kernel panic triggered inside vi5_unit_get_device_handle()?
please add some debug prints, variable checks, etc., and reproduce the issue again.
static void vi5_power_off(struct tegra_channel *chan)
{
...
vi5_unit_get_device_handle(vi->ndev, chan->port[0], &dev);
hi, JerryChang
I’m on the same team as deeptalkcamera; we are trying to fix the same problem.
As the log below shows, the VI driver tries to run error recovery on video2, but when vi_capture_release() is called it hits “errno 1” and fails.
At the same time, the app calls tegra_channel_stop_streaming(), which also calls vi_capture_release(). Because the driver has already called vi_capture_release(), this second call returns the “setup channel first” error.
kthread_stop() then prints a call trace, and finally the kernel hits a memory abort. The problem is that vi_capture_release() is called twice, once by the VI driver and once by the app; this may be what causes the kernel panic.
It really seems to be a race condition between the VI driver and the app. Your patch just adds a semaphore; what kind of race condition does it fix? I’m not sure it can fix the race between the VI driver and the app.
2025-08-29T03:55:29.364779+08:00 tegra-ubuntu kernel: tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
2025-08-29T03:55:29.364808+08:00 tegra-ubuntu kernel: tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
2025-08-29T03:55:29.364816+08:00 tegra-ubuntu kernel: tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
2025-08-29T03:55:29.364818+08:00 tegra-ubuntu kernel: tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
2025-08-29T03:55:29.374195+08:00 tegra-ubuntu kernel: tegra194-vi5 8181200000.host1x:vi1@8188c00000: vi_capture_release: control failed, errno 1
2025-08-29T03:55:29.374227+08:00 tegra-ubuntu kernel: tegra194-vi5 8181200000.host1x:vi1@8188c00000: vi_capture_release: control failed, errno 1
2025-08-29T03:55:29.383623+08:00 tegra-ubuntu kernel: video4linux video3: vi capture release failed
2025-08-29T03:55:29.398125+08:00 tegra-ubuntu kernel: video4linux video2: vi capture release failed
2025-08-29T03:55:29.398132+08:00 tegra-ubuntu kernel: tegra-camrtc-capture-vi tegra-capture-vi: fatal: error recovery failed
2025-08-29T03:55:29.418724+08:00 tegra-ubuntu kernel: tegra-camrtc-capture-vi tegra-capture-vi: fatal: error recovery failed
2025-08-29T03:55:29.419640+08:00 tegra-ubuntu kernel: ox05b1s 3-001b: ox05b1s_stop_streaming
2025-08-29T03:55:29.419644+08:00 tegra-ubuntu kernel: max96724 3-0027: max96724_stop_streaming
2025-08-29T03:55:30.331819+08:00 tegra-ubuntu kernel: Failed to send CAN message, -105
2025-08-29T03:55:30.445808+08:00 tegra-ubuntu kernel: tegra194-vi5 8181200000.host1x:vi1@8188c00000: capture control message timed out
2025-08-29T03:55:30.445838+08:00 tegra-ubuntu kernel: tegra194-vi5 8181200000.host1x:vi1@8188c00000: vi_capture_control_send_message: failed to send IVC control message
2025-08-29T03:55:30.445841+08:00 tegra-ubuntu kernel: tegra-nvcsi 8181200000.host1x:nvcsi@8188000000: csi5_stream_close: Error in closing stream_id=2, csi_port=2
2025-08-29T03:55:30.465358+08:00 tegra-ubuntu kernel: tegra194-vi5 8181200000.host1x:vi1@8188c00000: vi_capture_release: setup channel first
2025-08-29T03:55:30.465390+08:00 tegra-ubuntu kernel: video4linux video2: vi capture release failed
2025-08-29T03:55:30.488627+08:00 tegra-ubuntu kernel: ------------[ cut here ]------------
2025-08-29T03:55:30.488647+08:00 tegra-ubuntu kernel: refcount_t: addition on 0; use-after-free.
2025-08-29T03:55:30.488650+08:00 tegra-ubuntu kernel: WARNING: CPU: 5 PID: 729581 at lib/refcount.c:25 refcount_warn_saturate+0x120/0x144
2025-08-29T03:55:30.488652+08:00 tegra-ubuntu kernel: Modules linked in: rfcomm nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nvidia_drm(O) nvidia_modeset(O) nvidia_uvm(O) qrtr bridge stp llc usb_f_ncm usb_f_mass_storage nvidia(O) usb_f_acm u_serial usb_f_rndis u_ether governor_pod_scaling(O) libcomposite algif_hash algif_skcipher af_alg bnep r8153_ecm cdc_ether usbnet r8152 snd_soc_tegra210_admaif snd_soc_tegra210_mixer snd_soc_tegra186_asrc snd_soc_tegra186_arad(O) snd_soc_tegra_pcm snd_soc_tegra210_sfc snd_soc_tegra210_ope snd_soc_tegra210_mvc snd_soc_tegra210_amx snd_soc_tegra210_adx snd_soc_tegra210_i2s snd_soc_tegra210_ahub tegra210_adma nvadsp(O) spidev nvvrs_pseq_rtc(O) rtk_btusb(O) btusb btrtl btintel btmtk btbcm bluetooth iwlmvm ecdh_generic ecc mac80211 onboard_usb_hub libarc4 crct10dif_ce sm3_ce coresight_trbe nvmap(O) sm3 iwlwifi sha3_ce tegra_capture_coe(O) sha512_ce cfg80211 sha512_arm64 coresight nvsciipc(O) ivc_cdev(O) ina238 can_raw ina3221 nv_ox05b1s(O) max96724(O)
2025-08-29T03:55:30.488662+08:00 tegra-ubuntu kernel: snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card snd_soc_simple_card_utils can rfkill max96717(O) arm_spe_pmu p008g_depth(O) p008g_rgb(O) cam_cdi_tsc(O) pps_tsync(O) tegra234_oc_event(O) nvpmodel_clk_cap(O) nv_hawk_owl(O) max96712(O) tegra23x_psc(O) tegra_cactmon_mc_all(O) tegra_aocluster(O) thermal_trip_event(O) tegra_aconnect lm90 snd_hda_codec_hdmi snd_soc_rt5640 nvethernet(O) snd_soc_rl6231 nvidia_vrs_pseq(O) at24 max96724_berxel(O) host1x_fence(O) snd_hda_tegra snd_hda_codec snd_hda_core mttcan(O) pwm_tegra_tachometer(O) nvpps(O) tegra264_mc_hwpm(O) nvidia_cspmu can_dev spi_tegra114 mc_t26x(O) ramoops tegra_dce(O) reed_solomon arm_cspmu_module nvhost_nvcsi(O) nvhost_pva(O) tegra_se(O) nvhost_capture(O) nvhost_vi5(O) tegra_se_kds(O) crypto_engine tpm_ftpm_tee camera_diagnostics(O) nvhost_isp5(O) tegra_capture_isp(O) tegra_camera(O) v4l2_dv_timings host1x_nvhost(O) tegra_drm(O) tegra_wmark(O) nvhwpm(O) drm_display_helper drm_dp_aux_bus cec drm_kms_helper host1x(O) tegra_camera_platform(O)
2025-08-29T03:55:30.488664+08:00 tegra-ubuntu kernel: mc_utils(O) capture_ivc(O) v4l2_fwnode v4l2_async videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc camchar(O) rtcpu_debug(O) tegra_camera_rtcpu(O) ivc_bus(O) hsp_mailbox_client(O) nvme_fabrics fuse drm nfnetlink ip_tables x_tables ipv6 pwm_fan pwm_tegra tegra_bpmp_thermal tegra_xudc uas ucsi_ccg typec_ucsi typec nvme nvme_core phy_tegra194_p2u pcie_tegra194 ufs_tegra(O) pcie_tegra264(O)
2025-08-29T03:55:30.488667+08:00 tegra-ubuntu kernel: CPU: 5 PID: 729581 Comm: orion_server Tainted: G W O 6.8.12-tegra #1
2025-08-29T03:55:30.488669+08:00 tegra-ubuntu kernel: Hardware name: NVIDIA NVIDIA Jetson AGX Thor Developer Kit/Jetson, BIOS 38.2.0-gcid-41844464 08/22/2025
2025-08-29T03:55:30.488671+08:00 tegra-ubuntu kernel: pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
2025-08-29T03:55:30.488673+08:00 tegra-ubuntu kernel: pc : refcount_warn_saturate+0x120/0x144
2025-08-29T03:55:30.488674+08:00 tegra-ubuntu kernel: lr : refcount_warn_saturate+0x120/0x144
2025-08-29T03:55:30.488676+08:00 tegra-ubuntu kernel: sp : ffff8000c73539d0
2025-08-29T03:55:30.488678+08:00 tegra-ubuntu kernel: x29: ffff8000c73539d0 x28: ffff0000cb3d8000 x27: 0000000000000009
2025-08-29T03:55:30.488679+08:00 tegra-ubuntu kernel: x26: 0000000000000000 x25: 0000ffff81fa6c50 x24: ffff0000cb3d8600
2025-08-29T03:55:30.488681+08:00 tegra-ubuntu kernel: x23: ffff0000877f9108 x22: ffff0000877f9080 x21: ffff0000877f9590
2025-08-29T03:55:30.488682+08:00 tegra-ubuntu kernel: x20: ffff0000bc0d5df0 x19: ffff0000bc0d5dc0 x18: 0000000002c5dc9f
2025-08-29T03:55:30.488684+08:00 tegra-ubuntu kernel: x17: 0000000000000000 x16: ffffce221d1710c8 x15: ffff8000c7353320
2025-08-29T03:55:30.488686+08:00 tegra-ubuntu kernel: x14: 000000000008f014 x13: 00000000ffffffea x12: ffffce221fd23e00
2025-08-29T03:55:30.488688+08:00 tegra-ubuntu kernel: x11: 0000000002c2d6e0 x10: 0000000002c2d6b0 x9 : 00000000000001e0
2025-08-29T03:55:30.488690+08:00 tegra-ubuntu kernel: x8 : ffffce221fccbd88 x7 : c00000010008d014 x6 : 00000000000006e0
2025-08-29T03:55:30.488691+08:00 tegra-ubuntu kernel: x5 : ffff001f57d77d08 x4 : 0000000000000000 x3 : ffff31fd38a03000
2025-08-29T03:55:30.488694+08:00 tegra-ubuntu kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000cb3d8000
2025-08-29T03:55:30.488695+08:00 tegra-ubuntu kernel: Call trace:
2025-08-29T03:55:30.488697+08:00 tegra-ubuntu kernel: refcount_warn_saturate+0x120/0x144
2025-08-29T03:55:30.488699+08:00 tegra-ubuntu kernel: kthread_stop+0x1b4/0x260
2025-08-29T03:55:30.488701+08:00 tegra-ubuntu kernel: vi5_channel_stop_kthreads+0x4c/0x68 [tegra_camera]
2025-08-29T03:55:30.488702+08:00 tegra-ubuntu kernel: vi5_channel_stop_streaming+0x138/0x13c [tegra_camera]
2025-08-29T03:55:30.488704+08:00 tegra-ubuntu kernel: tegra_channel_stop_streaming+0x28/0x50 [tegra_camera]
2025-08-29T03:55:30.488705+08:00 tegra-ubuntu kernel: __vb2_queue_cancel+0x2c/0x2b8 [videobuf2_common]
2025-08-29T03:55:30.488707+08:00 tegra-ubuntu kernel: vb2_core_queue_release+0x24/0x5c [videobuf2_common]
2025-08-29T03:55:30.488709+08:00 tegra-ubuntu kernel: _vb2_fop_release+0x88/0xbc [videobuf2_v4l2]
2025-08-29T03:55:30.488710+08:00 tegra-ubuntu kernel: tegra_channel_close+0x5c/0x138 [tegra_camera]
2025-08-29T03:55:30.488712+08:00 tegra-ubuntu kernel: v4l2_release+0xe4/0xec [videodev]
2025-08-29T03:55:30.488714+08:00 tegra-ubuntu kernel: __fput+0x78/0x2c4
2025-08-29T03:55:30.488715+08:00 tegra-ubuntu kernel: ____fput+0x10/0x1c
2025-08-29T03:55:30.488717+08:00 tegra-ubuntu kernel: task_work_run+0x74/0xd0
2025-08-29T03:55:30.488719+08:00 tegra-ubuntu kernel: do_exit+0x320/0x998
2025-08-29T03:55:30.488720+08:00 tegra-ubuntu kernel: do_group_exit+0x34/0x90
2025-08-29T03:55:30.488722+08:00 tegra-ubuntu kernel: copy_siginfo_to_user+0x0/0x164
2025-08-29T03:55:30.488724+08:00 tegra-ubuntu kernel: do_notify_resume+0x1d0/0x8d0
2025-08-29T03:55:30.488726+08:00 tegra-ubuntu kernel: el0_svc+0x98/0xa8
2025-08-29T03:55:30.488728+08:00 tegra-ubuntu kernel: el0t_64_sync_handler+0x120/0x12c
2025-08-29T03:55:30.488730+08:00 tegra-ubuntu kernel: el0t_64_sync+0x194/0x198
2025-08-29T03:55:30.488731+08:00 tegra-ubuntu kernel: ---[ end trace 0000000000000000 ]---
2025-08-29T03:55:30.488733+08:00 tegra-ubuntu kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
2025-08-29T03:55:30.500111+08:00 tegra-ubuntu kernel: Mem abort info:
2025-08-29T03:55:30.500117+08:00 tegra-ubuntu kernel: ESR = 0x0000000096000004
2025-08-29T03:55:30.509184+08:00 tegra-ubuntu kernel: EC = 0x25: DABT (current EL), IL = 32 bits
2025-08-29T03:55:30.509191+08:00 tegra-ubuntu kernel: SET = 0, FnV = 0
2025-08-29T03:55:30.520362+08:00 tegra-ubuntu kernel: EA = 0, S1PTW = 0
2025-08-29T03:55:30.520366+08:00 tegra-ubuntu kernel: FSC = 0x04: level 0 translation fault
2025-08-29T03:55:30.520369+08:00 tegra-ubuntu kernel: Data abort info:
2025-08-29T03:55:30.528744+08:00 tegra-ubuntu kernel: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
2025-08-29T03:55:30.528751+08:00 tegra-ubuntu kernel: CM = 0, WnR = 0, TnD = 0, TagAccess = 0
2025-08-29T03:55:30.538870+08:00 tegra-ubuntu kernel: GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
2025-08-29T03:55:30.538876+08:00 tegra-ubuntu kernel: user pgtable: 4k pages, 48-bit VAs, pgdp=00000002d478c000
2025-08-29T03:55:30.552140+08:00 tegra-ubuntu kernel: [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
2025-08-29T03:55:30.552146+08:00 tegra-ubuntu kernel: Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
client_loop: send disconnect: Broken pipe
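If both the error-recovery path and the stop-streaming path can take the same per-channel lock (an assumption; whether the driver has such a lock is not shown here), the double vi_capture_release() can be made harmless by tracking whether the channel is still set up and letting only the first caller release it. This is a user-space model with pthreads; `struct channel`, `channel_release_once()` and `race_both_paths()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* User-space model: two paths race to release the same capture channel.
 * "setup" and "releases" are hypothetical stand-ins for driver state. */
struct channel {
	pthread_mutex_t lock; /* one lock shared by every release path */
	bool setup;           /* true while the capture channel is set up */
	int releases;         /* how many times the real release ran */
};

/* Only the first caller performs the release; a later caller sees that the
 * channel is already torn down and returns quietly instead of issuing a
 * second release (the case that produced "setup channel first"). */
static void channel_release_once(struct channel *ch)
{
	pthread_mutex_lock(&ch->lock);
	if (ch->setup) {
		ch->releases++;    /* the real vi_capture_release() would go here */
		ch->setup = false; /* mark torn down even if the release failed */
	}
	pthread_mutex_unlock(&ch->lock);
}

static void *release_path(void *arg)
{
	channel_release_once(arg);
	return NULL;
}

/* Runs the error-recovery path and the stop-streaming path concurrently
 * and returns how many real releases happened. */
static int race_both_paths(void)
{
	struct channel ch = { .setup = true, .releases = 0 };
	pthread_t recovery, stop;
	int rc;

	pthread_mutex_init(&ch.lock, NULL);
	rc = pthread_create(&recovery, NULL, release_path, &ch);
	assert(rc == 0);
	rc = pthread_create(&stop, NULL, release_path, &ch);
	assert(rc == 0);
	(void)rc;
	pthread_join(recovery, NULL);
	pthread_join(stop, NULL);
	pthread_mutex_destroy(&ch.lock);
	return ch.releases;
}
```

One caution for a kernel version: the release involves an IVC round trip that can time out (the logs show 2500 ms timeouts), so the lock would have to be a sleeping lock such as a mutex, not a spinlock.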
error_recovery.zip (1.8 KB)
hi, Jerry
can you help review the attached patch? We have already applied the DMA memory-leak patch and added a flag that skips the stop-streaming operation requested by the app when the driver’s error recovery logic hits a fatal error such as “errno 1” or “vi capture setup failed”.
We tested for about 13 hours, and no refcount warning or kernel panic occurred.
However, I’m not sure whether it is OK for the driver to skip vi_power_off() when the app calls the stop-streaming API. Our reasoning is that the error recovery logic does not call vi_power_off() either.
Please help us review the patch. Thanks.
hi all,
let me double-check how you reproduce this issue. Alternatively, please share the pipeline for reference.
actually, we’ve also tested with two camera streams overnight; it worked normally, and no kernel panic was reported.
hi, Jerry
here is our reproduction method:
Test with 4 cameras (2 cameras on one channel).
Unplug one camera and plug it back in to make the VI driver time out; the error recovery function then keeps resetting the capture channel.
Open the stream; after 15 s, stop the stream and close the fd.
After about 10 hours, the error recovery function hits a fatal error such as “errno 1” or “vi capture setup failed”, and the dequeue thread exits. The kernel then panics.
The tegra_channel_kthread_capture_dequeue() thread hitting “fatal: error recovery failed” is the precondition for reproducing the kernel panic.
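A minimal user-space sketch of the open, stream for 15 s, stop, close cycle described above, using only standard V4L2 ioctls. Assumptions: MMAP I/O with 4 buffers, the node’s default format, and a placeholder device path; `stream_cycle()` and `run_open_close_test()` are hypothetical helpers, not the authors’ orion_server.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/videodev2.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define NBUF 4

/* One cycle: open, queue NBUF mmap buffers, stream on, dequeue/requeue for
 * hold_s seconds, stream off, unmap, close. Returns 0 on success, -1 on failure. */
static int stream_cycle(const char *dev, unsigned hold_s)
{
	enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	struct v4l2_requestbuffers req;
	struct v4l2_buffer buf;
	void *maps[NBUF];
	size_t lens[NBUF];
	unsigned i, mapped = 0;
	int ret = -1;

	assert(dev != NULL);
	int fd = open(dev, O_RDWR);
	if (fd < 0)
		return -1;

	memset(&req, 0, sizeof(req));
	req.count = NBUF;
	req.type = type;
	req.memory = V4L2_MEMORY_MMAP;
	if (ioctl(fd, VIDIOC_REQBUFS, &req) < 0 || req.count > NBUF)
		goto out;

	for (i = 0; i < req.count; i++) {
		memset(&buf, 0, sizeof(buf));
		buf.type = type;
		buf.memory = V4L2_MEMORY_MMAP;
		buf.index = i;
		if (ioctl(fd, VIDIOC_QUERYBUF, &buf) < 0)
			goto out;
		maps[i] = mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, buf.m.offset);
		if (maps[i] == MAP_FAILED)
			goto out;
		lens[i] = buf.length;
		mapped++;
		if (ioctl(fd, VIDIOC_QBUF, &buf) < 0)
			goto out;
	}

	if (ioctl(fd, VIDIOC_STREAMON, &type) < 0)
		goto out;
	for (time_t end = time(NULL) + hold_s; time(NULL) < end;) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		if (poll(&pfd, 1, 1000) <= 0)
			continue; /* timeout or signal: keep waiting */
		if (pfd.revents & POLLERR)
			break;    /* driver reported an error; go stream off */
		memset(&buf, 0, sizeof(buf));
		buf.type = type;
		buf.memory = V4L2_MEMORY_MMAP;
		if (ioctl(fd, VIDIOC_DQBUF, &buf) == 0)
			ioctl(fd, VIDIOC_QBUF, &buf); /* hand the buffer straight back */
	}
	ret = ioctl(fd, VIDIOC_STREAMOFF, &type) < 0 ? -1 : 0;
out:
	for (i = 0; i < mapped; i++)
		munmap(maps[i], lens[i]);
	close(fd);
	return ret;
}

/* Repeats the cycle like the 15 s open/close test; stops at the first failure. */
static int run_open_close_test(const char *dev, unsigned cycles, unsigned hold_s)
{
	for (unsigned n = 0; n < cycles; n++) {
		if (stream_cycle(dev, hold_s) < 0) {
			fprintf(stderr, "%s: cycle %u failed\n", dev, n);
			return -1;
		}
	}
	return 0;
}
```

For the 4-camera setup, one instance per `/dev/videoN` node can run concurrently, e.g. `run_open_close_test("/dev/video0", 2400, 15)` in each process; 2400 cycles of 15 s is roughly the 10 hours reported above.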
hello liutee,
this is the difference: we don’t have this kind of test case.
liuting11:
plug-out one camera, and then plug-in, to let vi driver timeout, err recover fuction will reset the capture channel all the time.
may I know what the actual use case is?
There should be no errors if you unplug a camera before stream-on.
Besides, error recovery should not keep being called once the camera is reconnected; if it is, tegra_channel may already have crashed.
hi, Jerry
the actual use case is just a stream-on/stream-off test. We unplug the camera only to make driver timeouts happen sooner and so trigger error recovery.
If we unplug a camera before stream-on, there is no error, but if we then stream on, a kernel panic occurs in tegra_channel_kthread_capture_enqueue().
If we stream on and then unplug a camera, a kernel panic occurs very easily in tegra_channel_kthread_capture_enqueue().
We don’t really need to unplug cameras, but unplugging a camera should not cause a kernel panic. Maybe we need to add some protection logic for camera unplug?
By the way, once timeouts occur, error recovery is called repeatedly. If the channel reset succeeds, it does not crash; but if the reset fails (errno 1, or vi capture setup failed), it crashes.