Tegra VI driver causes kernel panic

hi,
our system has 4 cameras. before the test, we unplug all 4 cameras and then use v4l2-ctl to start 4 streams; after that, a kernel panic happens.

below are my steps to reproduce.

  1. unplug the 4 cameras.
  2. v4l2-ctl --stream-mmap -d /dev/front_left_camera &
    v4l2-ctl --stream-mmap -d /dev/front_right_camera &
    v4l2-ctl --stream-mmap -d /dev/side_left_camera &
    v4l2-ctl --stream-mmap -d /dev/side_right_camera &
  3. a kernel panic then happens:
    [ 59.438016] Unable to handle kernel paging request at virtual address ffffaabdbbc773b0
    [ 59.438035] Mem abort info:
    [ 59.438042] ESR = 0x0000000096000046
    [ 59.438056] Unable to handle kernel paging request at virtual address 0018fc007cfb10c8
    [ 59.438380] EC = 0x25: DABT (current EL), IL = 32 bits
    [ 59.446423] Mem abort info:
    [ 59.451651] SET = 0, FnV = 0
    [ 59.451652] EA = 0, S1PTW = 0
    [ 59.451653] FSC = 0x06: level 2 translation fault
    [ 59.451654] Data abort info:
    [ 59.451654] ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
    [ 59.451659] [ffffaabdbbc773b0] pgd=1000001fd7fff003pgdp=0000000fbae60000
    [ 59.454447] ESR = 0x0000000096000004
    [ 59.457587] , p4d=1000001fd7fff003, pud=1000001fd7ffe003, pmd=0000000000000000
    [ 59.457591] Internal error: Oops: 0000000096000046 [#1] PREEMPT SMP
    [ 59.457594] Modules linked in: rfcomm xt_conntrack xt_addrtype nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nvidia_drm(O) nvidia_modeset(O) nvidia_uvm(O) qrtr bridge stp llc usb_f_ncm usb_f_mass_storage nvidia(O) usb_f_acm u_serial governor_pod_scaling(O) usb_f_rndis u_ether libcomposite algif_hash algif_skcipher af_alg bnep r8153_ecm cdc_ether usbnet r8152 snd_soc_tegra210_mixer snd_soc_tegra186_arad(O) snd_soc_tegra210_sfc snd_soc_tegra210_admaif snd_soc_tegra186_asrc snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_adx snd_soc_tegra_pcm snd_soc_tegra210_i2s snd_soc_tegra210_amx snd_soc_tegra210_ahub tegra210_adma nvadsp(O) spidev nvvrs_pseq_rtc(O) rtk_btusb(O) tegra_capture_coe(O) onboard_usb_hub btusb btrtl btintel btmtk iwlmvm btbcm bluetooth mac80211 ecdh_generic nv_ox05b1s(O) p008g_depth(O) p008g_rgb(O) libarc4 ecc crct10dif_ce sm3_ce sm3 nvmap(O) sha3_ce sha512_ce sha512_arm64 iwlwifi coresight_trbe nvsciipc(O) coresight ivc_cdev(O)
    [ 59.457643] ina3221 ina238 snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card cfg80211 can_raw can snd_soc_simple_card_utils arm_spe_pmu nvpmodel_clk_cap(O) tegra234_oc_event(O) pps_tsync(O) cam_cdi_tsc(O) rfkill nv_hawk_owl(O) thermal_trip_event(O) max96712(O) tegra_cactmon_mc_all(O) tegra23x_psc(O) tegra_aocluster(O) nvethernet(O) tegra_aconnect snd_hda_codec_hdmi max96724(O) snd_soc_rt5640 max96717(O) snd_soc_rl6231 at24 mttcan(O) snd_hda_tegra lm90 snd_hda_codec host1x_fence(O) snd_hda_core nvpps(O) nvidia_vrs_pseq(O) can_dev tegra264_mc_hwpm(O) pwm_tegra_tachometer(O) nvidia_cspmu spi_tegra114 ramoops tegra_dce(O) mc_t26x(O) reed_solomon arm_cspmu_module nvhost_nvcsi(O) tegra_se(O) nvhost_pva(O) tegra_se_kds(O) nvhost_vi5(O) nvhost_capture(O) crypto_engine tpm_ftpm_tee camera_diagnostics(O) nvhost_isp5(O) tegra_capture_isp(O) tegra_camera(OE) v4l2_dv_timings host1x_nvhost(O) tegra_drm(O) tegra_wmark(O) nvhwpm(O) drm_display_helper drm_dp_aux_bus cec drm_kms_helper host1x(O) tegra_camera_platform(O)
    [ 59.457679] mc_utils(O) capture_ivc(O) v4l2_fwnode v4l2_async videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc camchar(O) rtcpu_debug(O) tegra_camera_rtcpu(O) ivc_bus(O) hsp_mailbox_client(O) nvme_fabrics fuse drm nfnetlink ip_tables x_tables ipv6 pwm_fan pwm_tegra tegra_bpmp_thermal tegra_xudc uas ucsi_ccg typec_ucsi typec nvme nvme_core phy_tegra194_p2u pcie_tegra194 ufs_tegra(O) pcie_tegra264(O)
    [ 59.457700] CPU: 5 PID: 4508 Comm: vi-output, ox05 Tainted: G W OE 6.8.12-tegra #1
    [ 59.457703] pstate: a34000c9 (NzCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)on, BIOS edk2-7efbae24 09/03/2025
    [ 59.457705] pc : queued_spin_lock_slowpath+0x370/0x470
    [ 59.457716] sp : ffff8000a3323cd0rqsave+0x78/0x8c
    [ 59.457717] x29: ffff8000a3323cd0 x28: ffff000088110680 x27: ffff0000f7997400
    [ 59.457719] x26: 0000000000000000 x25: 0000000000001440 x24: 0000000000000005
    [ 59.457721] x23: 0000000000000000 x22: ffffaabdbbc62008 x21: ffff001f57d86380
    [ 59.457722] x20: ffffaabdbbc77380 x19: ffff00008811081c x18: 0000000000000000
    [ 59.457724] x17: 0000000000000000 x16: ffffaabdbad72df8 x15: 0000000000000000
    [ 59.457725] x14: 0000000000000000 x13: 0000000000000037 x12: 0000000000000001
    [ 59.457727] x11: 0000000000000000 x10: f7d53bf894f652cd x9 : 1580f04d925e5333
    [ 59.457729] x8 : ffff0001228cca98 x7 : ffff001f57d85640 x6 : ffffaabdbc59cbe8
    [ 59.457731] x5 : 0000000000180000 x4 : 0000000000000000 x3 : ffffaabdbbc773b0
    [ 59.457732] x2 : 0000000000000000 x1 : ffff001f57d86380 x0 : ffff001f57d86388
    [ 59.457734] Call trace:
    [ 59.457737] _raw_spin_lock_irqsave+0x78/0x8c0x470
    [ 59.457739] tegra_channel_kthread_capture_enqueue+0xa4/0x580 [tegra_camera]
    [ 59.457754] kthread+0x110/0x114
    [ 59.457764] Code: d37c0403 91002020 8b030283 f864d8c4 (f8246861)
    [ 59.457766] ---[ end trace 0000000000000000 ]---
    [ 61.179539] SMP: failed to stop secondary CPUs 8tal exception
    [ 61.179550] Kernel Offset: 0x2abd39a40000 from 0xffff800080000000
    [ 61.184788] Memory Limit: none00000000,d003cd4b,27fffe67
    [ 61.190837] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
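
For anyone reading the log above, the ESR values can be split into their fields by hand. This is a minimal sketch in Python, assuming the ARMv8-A ESR_EL1 layout for data aborts (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with WnR in ISS bit 6 and DFSC in ISS bits [5:0]). The function name `decode_esr` is only for illustration and is not part of any kernel tool.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR_EL1 value for a data abort into its main fields."""
    iss = esr & 0x1FFFFFF          # ISS: bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,  # exception class: bits [31:26]
        "il": (esr >> 25) & 0x1,   # instruction length bit (1 = 32-bit)
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,   # 1 = fault caused by a write, 0 = read
        "dfsc": iss & 0x3F,        # data fault status code: ISS bits [5:0]
    }


if __name__ == "__main__":
    # The two ESR values that appear in the interleaved log.
    for value in (0x96000046, 0x96000004):
        fields = decode_esr(value)
        print(hex(value), {name: hex(field) for name, field in fields.items()})
```

For 0x96000046 this gives EC 0x25, ISS 0x46 and DFSC 0x06, which matches the "DABT (current EL)" and "level 2 translation fault" lines in the log. WnR is set, so the faulting access was a write.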

kernelpanic_afterplugoutcamera.txt (266.0 KB)

i think tegra_channel_error_recover() may lack protection. after removing the tegra_channel_error_recover() function, all plug-in/plug-out related kernel panics disappear.
issue 1: unplug the camera, then stream on: kernel panic happens
issue 2: stream on, then unplug the camera: kernel panic happens
is it possible to remove the tegra_channel_error_recover() logic? or can you help add some protection for when tegra_channel_error_recover() needs to execute?

hello liutee,

just an FYI,
we’re able to reproduce the kernel panic when accessing a non-existent camera on the developer kit.

good news, looking forward to a fix for this issue. thanks

hello liutee,

this is a kernel warning rather than a kernel panic.
please try applying b9d7c97.diff (1.3 KB) to avoid it,
and please confirm the result.

hello jerry
I will try the patch. thanks for your help

Hi @liuting11

Could you share the verification result with the provided patch? Thanks

hi, our current solution is to remove the recovery logic. we have tested it for several weeks and it is stable; both case 344342 and case 343365 are ok now.
tomorrow we will test your patch and restore “the vi recovery logic”. please wait a moment. thanks

hi, Jerry
what kind of kernel warning does the log print? now when we plug the camera in/out, no kernel panic happens, but no kernel warning appears either. is this normal?

hello liutee,

you may dig into the canonical kernel sources, videobuf2-core.c, for details about the kernel warning.
for instance,
please check the kernel API, vb2_start_streaming().
the warning is reported by WARN_ON(!list_empty(&q->done_list)); which means that if done_list is not empty, then start_streaming() didn’t call vb2_buffer_done(vb, VB2_BUF_STATE_QUEUED) but used VB2_BUF_STATE_ERROR or VB2_BUF_STATE_DONE instead.
hence, the kernel warning is seen because vb2_buffer_done() is called with VB2_BUF_STATE_ERROR after the timeout. the patch (in post #8) sets the buffer state accordingly.
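
the condition described above can be illustrated with a small toy model. this is only a sketch in Python, not the videobuf2 code: `ToyQueue`, `buffer_done` and `start_streaming_fails` are hypothetical names, and the model only mirrors the behaviour described in this post (buffers returned as QUEUED do not land on done_list, buffers returned as ERROR or DONE do).

```python
from enum import Enum


class BufState(Enum):
    QUEUED = "queued"
    ERROR = "error"
    DONE = "done"


class ToyQueue:
    """Toy stand-in for a vb2 queue; tracks only what the warning looks at."""

    def __init__(self, num_buffers: int) -> None:
        self.owned_by_driver = list(range(num_buffers))
        self.queued_list = []
        self.done_list = []

    def buffer_done(self, buf: int, state: BufState) -> None:
        """Driver hands a buffer back to the framework in the given state."""
        self.owned_by_driver.remove(buf)
        if state is BufState.QUEUED:
            self.queued_list.append(buf)  # buffer is simply queued again
        else:
            self.done_list.append(buf)    # ERROR/DONE buffers go to done_list


def start_streaming_fails(q: ToyQueue, return_state: BufState) -> bool:
    """Simulate a failed start_streaming(); return True if the warning would fire."""
    for buf in list(q.owned_by_driver):
        q.buffer_done(buf, return_state)
    # Models WARN_ON(!list_empty(&q->done_list)) after the failure.
    return len(q.done_list) != 0


if __name__ == "__main__":
    for state in BufState:
        print(state.name, start_streaming_fails(ToyQueue(4), state))
```

in this model, returning the buffers as QUEUED leaves done_list empty, so no warning would fire; returning them as ERROR or DONE fills done_list and triggers the check.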

hi Jerry
I checked the patch. when a timeout happens, it will not run the recovery logic, right?
from the test results, the patch works: there is no kernel panic when the camera is unplugged.

hello liutee,

that’s correct, the capture is considered to have timed out when accessing a non-existent camera.
so, the error recovery logic will not be executed.

ok, the patch worked, you can close this topic. thanks