Crash of nvargus NVhost_sync

Hello,
We have an application on a Jetson Nano (Ubuntu 18.04, L4T 32.4.4 [JP 4.4.1])
that records and processes streams from three cameras: one IMX477 (4K@30FPS) and two OV9782 (HD@10FPS). The cameras are synchronized by an external trigger. We dedicated one core of the Jetson Nano to nvargus.

Most of the time the pipeline works as expected, but once in a while it crashes (approx. once per 30 min). When we removed one camera from the stream (so just 1x IMX477 and 1x OV9782), the stream was more stable, but we still saw an occasional crash (approx. once an hour).

After a crash we can see the following errors in dmesg:

[ 1075.254807] fence timeout on [ffffffc0dedad240] after 1500ms
[ 1075.254812] name=[nvhost_sync:53], current value=26435 waiting value=26442
[ 1075.254815] ---- mlocks ----

[ 1075.254825] ---- syncpts ----
[ 1075.254833] id 7 (54340000.vic_0) min 740097 max 740097 refs 1 (previous client : 54340000.vic_0)
[ 1075.254838] id 8 (544c0000.nvenc_0) min 87940 max 87940 refs 1 (previous client : 544c0000.nvenc_0)
[ 1075.254841] id 9 (54600000.isp_0) min 36250 max 36250 refs 1 (previous client : 54600000.isp_0)
...
...
               ---- host syncpt thresh ----

[ 1073.718080] syncpt_int_thresh_thresh_0(0) = 1
[ 1073.718100] syncpt_int_thresh_thresh_0(36) = 18304
[ 1073.718103] syncpt_int_thresh_thresh_0(38) = 275
[ 1073.718106] syncpt_int_thresh_thresh_0(39) = 9143
[ 1073.718115] syncpt_int_thresh_thresh_0(51) = 52858
[ 1073.718118] syncpt_int_thresh_thresh_0(53) = 26436
[ 1073.718121] syncpt_int_thresh_thresh_0(54) = 26436
[ 1073.718788] fence timeout on [ffffffc0cb695900] after 1500ms
[ 1073.718793] name=[nvhost_sync:54], current value=26435 waiting value=26440
[ 1073.718797] ---- mlocks ----

[ 1073.718808] ---- syncpts ----
[ 1073.718817] id 7 (54340000.vic_0) min 739821 max 739821 refs 1 (previous client : 54340000.vic_0)
[ 1073.718821] id 8 (544c0000.nvenc_0) min 87848 max 87848 refs 1 (previous client : 544c0000.nvenc_0)
[ 1073.718824] id 9 (54600000.isp_0) min 36250 max 36250 refs 1 (previous client : 54600000.isp_0)
[ 1073.718828] id 11 (54600000.isp_1) min 36250 max 36250 refs 1 (previous client : 54600000.isp_1)
[ 1073.718832] id 12 (54600000.isp_2) min 139466 max 139466 refs 1 (previous client : 54600000.isp_2)
...

Another time it was:

[  322.322680] name=[nvhost_sync:36], current value=274 waiting value=277
[  322.322685] ---- mlocks ----

[  322.322697] ---- syncpts ----
[  322.322708] id 7 (54340000.vic_0) min 41853 max 41853 refs 1 (previous client : 54340000.vic_0)
[  322.322713] id 8 (544c0000.nvenc_0) min 9090 max 9090 refs 1 (previous client : 544c0000.nvenc_0)
[  322.322717] id 9 (54600000.isp_0) min 119 max 119 refs 1 (previous client : 54600000.isp_0)
[  322.322721] id 11 (54600000.isp_1) min 119 max 119 refs 1 (previous client : 54600000.isp_1)
[  322.322725] id 12 (54600000.isp_2) min 490 max 490 refs 1 (previous client : 54600000.isp_2)
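
A small sketch (not part of the original report) for comparing the syncpoint dumps from two timeouts. It assumes the `id N (client) min X max Y` line format shown above; the helper name `syncpt_progress` is made up for illustration. On the first pair of dumps it shows that isp_0 stays at 36250 while vic_0 and nvenc_0 keep advancing, which may help narrow down where frames stop arriving but does not by itself identify the cause.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [ 1075.254833] id 7 (54340000.vic_0) min 740097 max 740097 refs 1 (...)
SYNCPT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+id\s+(?P<id>\d+)\s+\((?P<name>[^)]+)\)"
    r"\s+min\s+(?P<min>\d+)\s+max\s+(?P<max>\d+)"
)

def syncpt_progress(dmesg_text):
    """Return {syncpt id: (client, first min, last min, delta)} ordered by timestamp."""
    samples = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = SYNCPT_RE.search(line)
        if m:
            samples[int(m["id"])].append((float(m["ts"]), m["name"], int(m["min"])))
    progress = {}
    for sp_id, rows in samples.items():
        rows.sort()  # dmesg excerpts are not always pasted in chronological order
        (_, name, first), (_, _, last) = rows[0], rows[-1]
        progress[sp_id] = (name, first, last, last - first)
    return progress

# Lines copied from the two dumps in this post.
SYNCPT_SAMPLE = """\
[ 1075.254833] id 7 (54340000.vic_0) min 740097 max 740097 refs 1 (previous client : 54340000.vic_0)
[ 1075.254838] id 8 (544c0000.nvenc_0) min 87940 max 87940 refs 1 (previous client : 544c0000.nvenc_0)
[ 1075.254841] id 9 (54600000.isp_0) min 36250 max 36250 refs 1 (previous client : 54600000.isp_0)
[ 1073.718817] id 7 (54340000.vic_0) min 739821 max 739821 refs 1 (previous client : 54340000.vic_0)
[ 1073.718821] id 8 (544c0000.nvenc_0) min 87848 max 87848 refs 1 (previous client : 544c0000.nvenc_0)
[ 1073.718824] id 9 (54600000.isp_0) min 36250 max 36250 refs 1 (previous client : 54600000.isp_0)
"""

for sp_id, (name, first, last, delta) in sorted(syncpt_progress(SYNCPT_SAMPLE).items()):
    print(f"id {sp_id} {name}: {first} -> {last} (delta {delta})")
```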

My first question: what do the numbers on the following line mean? I have seen nvhost_sync 54, 36, 39, 53, … Is that a line of code in the nvargus lib? Is the current value a frame ID?
name=[nvhost_sync:54], current value=26435 waiting value=26440
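
To make these lines easier to compare across crashes, here is a minimal parsing sketch (an illustration, not an official tool; `parse_fence_timeouts` is a hypothetical name). One observation from the logs above: the numbers after `nvhost_sync:` (53, 54) also appear as indices in the "host syncpt thresh" section, with a threshold of 26436 against a current value of 26435. That suggests the number is a syncpoint id rather than a source line, but this reading is an assumption and is not confirmed in this thread.

```python
import re

# The timestamp is optional so that a line quoted without it still matches.
FENCE_RE = re.compile(
    r"(?:\[\s*(?P<ts>\d+\.\d+)\]\s+)?name=\[nvhost_sync:(?P<id>\d+)\],\s*"
    r"current value=(?P<cur>\d+)\s+waiting value=(?P<wait>\d+)"
)

def parse_fence_timeouts(dmesg_text):
    """Extract every nvhost_sync timeout line as a dict, in input order."""
    events = []
    for line in dmesg_text.splitlines():
        m = FENCE_RE.search(line)
        if not m:
            continue
        current, waiting = int(m["cur"]), int(m["wait"])
        events.append({
            "time": float(m["ts"]) if m["ts"] else None,
            "id": int(m["id"]),          # assumed to be a syncpoint id
            "current": current,
            "waiting": waiting,
            "behind": waiting - current,  # increments still outstanding
        })
    return events

# Lines copied from this post.
FENCE_SAMPLE = """\
[ 1075.254812] name=[nvhost_sync:53], current value=26435 waiting value=26442
[ 1073.718793] name=[nvhost_sync:54], current value=26435 waiting value=26440
[  322.322680] name=[nvhost_sync:36], current value=274 waiting value=277
"""

for event in parse_fence_timeouts(FENCE_SAMPLE):
    print(event)
```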

Another question: is there a way to prevent it? The better reliability with just two cameras makes me believe that the issue is related to CPU load, but I monitored the CPU and usage didn't go above 85% on any of the four cores. I know that this was discussed in several threads, and that you increased the size of the nvargus buffer and it helped, but for that I need a recompiled lib with the modified buffer size (I will try to look for the topic where I saw this solution).
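
For reference, a rough sketch of how per-core load could be sampled at short intervals from /proc/stat, since a monitor that averages over a second or more can hide brief saturation. This is an illustration under assumptions: the function names are hypothetical, the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the standard Linux one, and the 0.1 s interval is an arbitrary choice.

```python
import time

def parse_proc_stat(text):
    """Return {core: (busy_jiffies, total_jiffies)} for the per-core cpuN lines."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values)
        cores[fields[0]] = (total - idle, total)
    return cores

def utilization(before, after):
    """Per-core busy percentage between two parse_proc_stat() snapshots."""
    usage = {}
    for core, (busy1, total1) in after.items():
        if core not in before:
            continue
        busy0, total0 = before[core]
        elapsed = total1 - total0
        usage[core] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return usage

def sample_live(interval=0.1):
    """Take one live measurement on a Linux system (reads /proc/stat twice)."""
    with open("/proc/stat") as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_proc_stat(f.read())
    return utilization(before, after)
```

Calling `sample_live()` in a loop while the pipeline runs and logging the maximum per core would show whether any single core briefly reaches 100%.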

Regarding the device tree: yes, we tried switching discontinuous_clk.
And yes, we validated the sensor driver following "Verifying the V4L2 Sensor Driver";
the only failure was on the line test VIDIOC_G/S_PARM: FAIL, which I believe is acceptable.

hello NucleoIris,

this is a very old release version.
is it possible to move forward to the latest rel-32, such as linux-tegra-r3275, for verification?

Hello JerryChang,
thanks for the reply. Unfortunately, an upgrade to a higher version of JP is not currently possible; we tried that, but the system wasn’t stable (with all the devices we need).

In the attached file I’m providing a longer version of the crash report.
dmesg_stream_crash.txt (70.0 KB)

hello NucleoIris,

may I also know what your test pipeline is?
is this related to CPU assignment? are you able to reproduce the same with default settings?

besides, here’s a patch for nvarguscamerasrc that fixes a memory leak, please see also… Topic 160811.
