Vi5 error: Unable to handle kernel NULL pointer dereference at virtual address 00000000

Hi NV,

With no camera connected to the GMSL interface, run the following test script:

for id in `seq 0 $((num-1))`;do
    gst-launch-1.0 -v nvv4l2camerasrc device=/dev/video$id ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1536' !  tee name=t \
            t. ! queue ! nvvidconv ! video/x-raw, width=1920, height=1536, format=BGRx ! videoconvert ! fpsdisplaysink video-sink=xvimagesink 
done

Symptom:
The desktop UI freezes: the cursor still moves, but nothing responds to input.

dmesg:
dmesg.log (171.1 KB)

release version info:

R32 (release), REVISION: 5.0, GCID: 25531747, BOARD: t186ref, EABI: aarch64, DATE: Fri Jan 15 23:21:05 UTC 2021

hello wangxiaojunfuture,

to recap, the failure is shown below.
it looks like there are no frames available.

[1034242.606836] mv-max9296 2-0004: start streaming...
[1034245.356964] tegra194-vi5 15c10000.vi: no reply from camera processor
[1034245.357219] tegra194-vi5 15c10000.vi: uncorr_err: request timed out after 2500 ms
[1034245.357534] tegra194-vi5 15c10000.vi: err_rec: attempting to reset the capture channel

BTW,
this looks like an old release version; is it possible to move to the latest release version for testing?

Hi Jerry,

The purpose of the test is to observe how the video pipeline behaves when no camera is connected.
Even with no camera connected, the desktop should not freeze and the kernel should not hit a NULL pointer dereference.

In addition, our products will not be upgraded to a newer JP version.
Is there a patch for this problem?

hello future.wang,

the camera software stack doesn't support this scenario.
so, you'll see "err_rec: attempting to reset the capture channel" when it retries the capture request.
even though you don't have a camera connected, is it possible to enable the TPG (test pattern generator) on the SerDes chip to send a test pattern to the CSI brick?

anyway,
please refer to Topic 187824 to update the kernel sources and fix the NULL pointer dereference, thanks

Hi Jerry,

Using the same test method, the following version still has this problem:

R35 (release), REVISION: 3.1, GCID: 32827747, BOARD: t186ref, EABI: aarch64, DATE: Sun Mar 19 15:19:21 UTC 2023

kernel log:
Xavier_JP5.1.1_KernelPanic_autoReboot.txt (396.0 KB)

hello future.wang,

your device is still not sending anything to the CSI brick, right?
is it possible to enable the TPG for validation?

Hi Jerry,

The purpose of the test is to observe how the video pipeline behaves when no camera is connected.
On JP5.1.1, the desktop UI not only freezes, but the kernel also panics.

hello future.wang,

FYI, we tested on JP-5.1.2 by pausing the sensor stream.
we stalled the camera sensor stream for 1 hour and could not reproduce the kernel panic on developer kits.

hence,
please try moving to the latest release version if that’s possible.

Hi Jerry,

I applied the patch from the following topic, which fixed the NULL pointer dereference and the system reboot.

However, during long-running tests we found that memory slowly leaks. Is there a patch for that?

hello future.wang,

did you confirm the memory leak is caused by this patch?
did it reproduce an OOM? how long does it take?
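
One way to answer both questions is to log a few memory counters while the test runs. The following is a minimal sketch, not a tool from the BSP: the function names and the CSV layout are assumptions made for this example, and it only reads the standard MemAvailable, Slab and SUnreclaim fields of /proc/meminfo.

```shell
#!/bin/bash
# Sketch: sample /proc/meminfo periodically so a slow leak shows up as a trend.
# Function names and output layout are hypothetical, chosen for this example.

# Print "epoch,MemAvailable_kB,Slab_kB,SUnreclaim_kB" from a meminfo-style file.
sample_meminfo() {
    local src="${1:-/proc/meminfo}"
    awk -v now="$(date +%s)" '
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "Slab:"         { slab = $2 }
        $1 == "SUnreclaim:"   { unrec = $2 }
        END { printf "%s,%s,%s,%s\n", now, avail, slab, unrec }
    ' "$src"
}

# Take COUNT samples, INTERVAL seconds apart, appending each one to OUT.
log_meminfo() {
    local interval="$1" count="$2" out="$3"
    local i
    for ((i = 0; i < count; i++)); do
        sample_meminfo >> "$out"
        if (( i < count - 1 )); then sleep "$interval"; fi
    done
}
```

For example, `log_meminfo 60 1440 meminfo.csv` would record one sample per minute for 24 hours. A steadily growing SUnreclaim column with the test running, compared against a run without the patch, would suggest (but not prove) a kernel-side leak.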

Hi Jerry,

Kernel panic log:

[48864.057925] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[48864.058356] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[48864.059420] (NULL device *): vi_capture_control_message: NULL VI channel received
[48864.059736] t194-nvcsi 13e10000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=0, csi_port=0
[48864.060164] (NULL device *): vi_capture_control_message: NULL VI channel received
[48864.060486] t194-nvcsi 13e10000.host1x:nvcsi@15a00000: csi5_stream_open: VI channel not found for stream- 0 vc- 0
[48864.061333] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel
[48864.313950] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[48864.314370] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[48864.315374] (NULL device *): vi_capture_control_message: NULL VI channel received
[48864.315747] t194-nvcsi 13e10000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=4, csi_port=4
[48864.316239] (NULL device *): vi_capture_control_message: NULL VI channel received
[48864.316553] t194-nvcsi 13e10000.host1x:nvcsi@15a00000: csi5_stream_open: VI channel not found for stream- 4 vc- 1
[48864.317462] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel
[48864.569922] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[48864.570313] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[48864.571670] (NULL device *): vi_capture_control_message: NULL VI channel received
[48864.572003] t194-nvcsi 13e10000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=2, csi_port=2
[48864.572414] (NULL device *): vi_capture_control_message: NULL VI channel received
[48864.572744] t194-nvcsi 13e10000.host1x:nvcsi@15a00000: csi5_stream_open: VI channel not found for stream- 2 vc- 1
[48864.597677] tegra194-vi5 15c10000.vi: vi_capture_setup: memoryinfo ringbuffer alloc failed
[48864.598194] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel
[48864.600099] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[48864.600536] Mem abort info:
[48864.600716]   ESR = 0x96000044
[48864.600865]   EC = 0x25: DABT (current EL), IL = 32 bits
[48864.601130]   SET = 0, FnV = 0
[48864.601266]   EA = 0, S1PTW = 0
[48864.601400] Data abort info:
[48864.601540]   ISV = 0, ISS = 0x00000044
[48864.601713]   CM = 0, WnR = 1
[48864.601960] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000116db4000
[48864.602295] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[48864.602632] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[48864.605756] Modules linked in: xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) nfnetlink(E) xt_addrtype(E) iptable_filter(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) br_netfilter(E) fuse(E) lzo_rle(E) lzo_compress)
[48864.605946]  usbserial(E) snd_soc_rt5659(E) snd_soc_rl6231(E) snd_soc_tegra210_adsp(E) snd_soc_tegra_utils(E) snd_hda_tegra(E) max77620_thermal(E) snd_hda_codec(E) snd_hda_core(E) snd_soc_simple_card_utils(E) nct1008(E) tegra_bpmp_thermal(E) nvadsp(E) snd_soc_tegra210_ah]
[48864.730464] CPU: 0 PID: 112312 Comm: vi-output, mv-m Tainted: G           OE     5.10.104-tegra #1
[48864.739626] Hardware name: Unknown Jetson-AGX/Jetson-AGX, BIOS 3.1-32827747 03/19/2023
[48864.747764] pstate: 40c00009 (nZcv daif +PAN +UAO -TCO BTYPE=--)
[48864.753811] pc : tegra_channel_kthread_capture_enqueue+0x1e4/0x4b0
[48864.760026] lr : tegra_channel_kthread_capture_enqueue+0x1e0/0x4b0
[48864.766308] sp : ffff800031123d30
[48864.769720] x29: ffff800031123d30 x28: 0000000000000000 
[48864.775232] x27: 0000000000000001 x26: ffff70bf15c84080 
[48864.780744] x25: 0000000000000600 x24: 0000000000000f00 
[48864.786257] x23: ffff800031123e28 x22: ffff70bf15c849fc 
[48864.791769] x21: ffff70bf15c84ac0 x20: 0000000000000000 
[48864.797281] x19: ffff70bc37490400 x18: 0000007fff800000 
[48864.802796] x17: 0000000000000000 x16: ffffc2c4f5142d90 
[48864.808134] x15: 000000000000001e x14: 0000000000000000 
[48864.813646] x13: 0000000000000000 x12: 0000000000000000 
[48864.818984] x11: 0000000000000000 x10: 0000000000000000 
[48864.824493] x9 : 0000000000000000 x8 : 0000000000000000 
[48864.829919] x7 : 0000000000000000 x6 : ffff8000b2ebb180 
[48864.835345] x5 : 000000004e0f0040 x4 : 0000000000000000 
[48864.840770] x3 : 0000000000000000 x2 : ffffffffffffffc0 
[48864.846107] x1 : ffffc2c4f6492cc0 x0 : ffff8000b2ebb000 
[48864.851446] Call trace:
[48864.853908]  tegra_channel_kthread_capture_enqueue+0x1e4/0x4b0
[48864.859763]  kthread+0x148/0x170
[48864.862999]  ret_from_fork+0x10/0x18
[48864.866694] Code: a9082fe9 f9004bec 97e55a15 a9482fe9 (a9007f9f) 
[48864.872807] ---[ end trace 5d7f98506147ba3f ]---
[48864.889687] Kernel panic - not syncing: Oops: Fatal exception
[48864.889949] SMP: stopping secondary CPUs
[48864.890128] Kernel Offset: 0x42c4e5120000 from 0xffff800010000000
[48864.892837] PHYS_OFFSET: 0xffff8f4540000000
[48864.896862] CPU features: 0x8240002,03802a30
[48864.901143] Memory Limit: none
[48864.911758] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
[0000.054] W> RATCHET: MB1 binary ratchet value 4 is larger than ratchet level 1 from HW fuses.
[0000.062] I> MB1 (prd-version: 2.6.0.0-t194-41334769-cab45716)
[0000.067] I> Boot-mode: Coldboot

hello future.wang,

there's an allocation failure here.

may I know which JetPack release version you're using now?
is it possible to move to the latest release, i.e. JP-5.1.2/l4t-r35.4.1, for confirmation?

Hi Jerry,

JP5.1.1.
Currently we have no plans to upgrade our products to JP 5.1.2.

Hi Jerry,

Recurring log:

[41970.051373] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: successfully reset the capture channel
[41970.047831] tegra-camrtc-capture-vi tegra-capture-vi: uncorr_err: request timed out after 2500 ms
[41970.048244] tegra-camrtc-capture-vi tegra-capture-vi: err_rec: attempting to reset the capture channel
[41970.049274] (NULL device *): vi_capture_control_message: NULL VI channel received
[41970.049594] t194-nvcsi 13e10000.host1x:nvcsi@15a00000: csi5_stream_close: Error in closing stream_id=5, csi_port=6[41970.064916] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[41970.073521] Mem abort in970.050033] (NULL device *): vi_capture_control_message: NULL VI channel received
vc- 00.050374] t194-n970i 14244] ESR 1x:nvcx9600004400: cs970.08i5_str9] _opC = 0x25: DAannBT (cu font EL) for stre bits
[41970.051021] tegra194-vi5 15c10[41000.09: v685apt S_setup, Femoryi= 0 ringbuffer alloc failed
[41970.051373] teg970.103839] EA =re-vi S1PTW = capture-v970.10err_rec: successfubort iset:
e capture ch970el151
000000000000000970.121298] use1] gta abort: 4k pages, 48-970bit VA4244] ESR 000117000000

970.087709] E[41C =.1299325:[00BT 000000nt EL), I pg 32 bi000000
000000.096859] S000= 00000
Data a970.14343:4] Mod970s linked in:SV conntSS k(Ex00t_M044UERADE(E) nf_co0] rack_n 0, WnR = 1 nfnet970k(E) fuse(E)8] usedrtgtable ipk pagefil48-ter(E)s, pgdtable_000000 nf11772f000) nf_c970track(E) 9] [00rag_ipv6(000000def pgd=000000000000000032c(E)000000tfilte0000) lzo)
c(E970nd_soc3648] 210_adx(Etech_dsoc_tegra210_i2gra210_adsp(ce_snd(E)c_typto_simd(E) cr) sd(Esoces_ce_cipherard_utash_ce(E) nvadsp(E) lsha256 tegra(E)adma(E) ce(77620_theleds(E) sct1soc_spdifd_s(E)tegd_soc_tegahumachine_driva_bE) mttermal(E) d_soc_a_c659(E)c_hdmiev( snd_hd]
) s[41nd
.27_simple_c: 0 PIils609 nvadsp(E) l: vi-output, mvgra210ntema(G max776 thermal(E) n10.104(E) sn #1
oc_tegra210_.28ahub(EHaregrre name: Unknowal(etssndAGXa_coden-AGX,(E) snd_hda_-32827E) snd/19a_codec(E) userspace_ale500E) snd_hdate: 40c0000c(E) spi_v daif14(AN +UAO -TCO BTYPEsc()
ina3221(E) pwm_pc : t(E)egrgpuhan nvmap(E)ad__tature_enqueue+0x) x/0x4b0s(E) [last u.30854d: lr : tegra_c970.27_kt5832] captur PID: 609eueomm1e0i-o4b0ut, mv970Tai5107] sp G 80002eca3
on-AGX/Jetson-AGX,4] x27: 0000000-32000747 x26: fff2330d00000000
.29500[41.33ate6] x25000nZcv d000 +P600+UAO -000000YPE000)0
/0x4b0970.34970.30x218546] fff: tegr0d5ccfnel_kthre000000000tur000nqueue+0x1e0/0x4b07919] [41970.31fff7] sp : fc00 x18: ca30007ff000000.318517] 970: ffff80002ex17: 0 x20008: 000000000 x16: 0000000009700000
x27970: 0000000000x150010006: 000f1930d5cc4: 0000000000000000
.339706] 568: 0000000000000000000000000000 x10000f0000000000000970
5855] x23.3717180002eca3x11 x22: fff00030d5ccf9fc 000000970.341880]
: ffff970.38de17] x7 x18: 0000007ff0000000 ffff8000b9707281187] x17: 09700009741] 000: 0 x16: 04e0f0040000 : 0000 0000000000 x21970fff1938020] ac0: 0000000000000000 x8 : 000000970.347919] 0
.359705] x15525000000: 0000000000 x14: 000 : 000fffffffffffc0
.369702] x13095x1 000: f000000000 x1cc0000000fff00007281000
970[41.40636x119] : 0000000000000000970.40000000 te000gra0000 el_kth970d_c8020] x9 queue+000e4/0x4000 x8 : 000.415561] 000hread+0x148/970.384057] x7 : 00009234] 000000000 x6rk+0x10/000b
1180 970.429708] Code: x5 82fe9 00004b04e0f004055a11 a94 : e9 (a9000f9f)
0 970.429565] —.395257] x3 : 0000ffdf1d000000 x20 ]—fffffffffffc0
[41970.400957] x1 : ffffdfa2d6362cc0 x0 : ffff8000b7281000
[41970.406369] Call trace:
[41970.409274] tegra_chann[41el_kth.44576d_cKernel_ennicue+0x1e4/0x4ncing: Oo970.415561] kthre exad+tio48/0x1970703095] 970SMP: stop4] ret_fecondary CPU0x10/0x18[41970.457998] Kernel Offset: 0x5fa2c4ff0000 from 0xffff800010000000
[41970.464027] PHYS_OFFSET: 0xffffe6d380000000
[41970.468059] CPU features: 0x8240002,03802a30
[41970.472164] Memory Limit: none
[41970.484295] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
ERROR: app/ivc-sync-in2-channel.c:84 [ivc_sync_in2_write_msg] "tegra_ivc_tx_get_contiguous_write_space() failed"
[the ERROR line above repeats 33 times in total]

hello future.wang,

it looks like your sensor has not registered with the Linux system correctly.
please refer to the Debugging Tips section of the developer guide to review your driver implementation.

Hi Jerry

We are simulating the case where the camera module is unexpectedly disconnected,
so we did not connect the camera module during the test.

We expect that a disconnected or missing camera module should not cause the system to reboot.

as mentioned in the previous comment #11, we cannot reproduce the kernel panic on developer kits with the latest release version.

Hi Jerry,

I traced the faulting address to the following location, but have not found a fix yet:

XavierII# /opt/linaro/gcc-linaro-9.3.0-2020.08-1-aarch64-glibc-stable-final/bin/aarch64-buildroot-linux-gnu-addr2line -e out/KERNEL/vmlinux -f -C ffff800010ce5194

tegra_channel_kthread_capture_enqueue

/XavierII/sources/kernel/nvidia/drivers/media/platform/tegra/camera/vi/vi5_fops.c:690
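
The same lookup can start from the "pc :" line of the oops instead of a raw address. The sketch below only extracts the symbol+offset token from a saved log; the function name is an assumption made for this example.

```shell
#!/bin/bash
# Sketch: pull the "func+0xoff/0xsize" token from the "pc :" line of a saved oops log.
# oops_pc_symbol is a hypothetical helper name, not part of any NVIDIA or kernel tool.
oops_pc_symbol() {
    sed -n 's|^.*] pc : \([A-Za-z0-9_.]*+0x[0-9a-f]*/0x[0-9a-f]*\).*$|\1|p' "$1" | head -n 1
}
```

For the panic log above this prints tegra_channel_kthread_capture_enqueue+0x1e4/0x4b0. As far as I know, a token in that form can be passed to scripts/faddr2line from the kernel source tree together with the matching vmlinux. Unlike a raw address, it does not depend on the "Kernel Offset" shown in the panic output.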

hello future.wang,

is it a race condition? for example, does it only reproduce in the multi-camera scenario?
please try with a single video node only for confirmation.

Hi Jerry,

I use the following test script with the parameter 8.
The problem always reproduces within 12-40 hours.

#!/bin/bash
# First argument: number of /dev/video nodes to open.
num=$1
PIDS=""

# Start one capture pipeline per video node in the background.
for id in $(seq 0 $((num-1))); do
    gst-launch-1.0 -v nvv4l2camerasrc device=/dev/video$id ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1536' ! tee name=t \
            t. ! queue ! nvvidconv ! video/x-raw, width=1920, height=1536, format=BGRx ! videoconvert ! fpsdisplaysink video-sink=xvimagesink \
    >video$id.log 2>&1 &
    PIDS+="$! "
done
echo $PIDS
# Block until every pipeline exits.
wait $PIDS
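
Since the failure takes 12-40 hours to appear, it may help to summarise a saved kernel log after each run. The sketch below counts the messages quoted earlier in this thread; the function name and output format are assumptions made for this example.

```shell
#!/bin/bash
# Sketch: count capture timeouts, channel resets, and NULL dereferences in a saved kernel log.
# The message strings are copied from the logs in this thread; the helper name is hypothetical.
summarize_capture_log() {
    local log="$1"
    printf 'timeouts=%s resets=%s null_deref=%s\n' \
        "$(grep -c 'uncorr_err: request timed out' "$log")" \
        "$(grep -c 'err_rec: attempting to reset the capture channel' "$log")" \
        "$(grep -c 'Unable to handle kernel NULL pointer dereference' "$log")"
}
```

A typical use would be to save `dmesg` to a file and call `summarize_capture_log` on it. Comparing the reset count between the single-node and 8-node runs could show whether the number of resets before the crash scales with the number of nodes.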