Tegra_nvcsi_stream_tpg_disable suddenly happens with high CPU/GPU load

Hi there,

I’m testing a 2048x1280 raw GMSL2 camera on our customized AGX Orin product with L4T 35.4.1 (30W power mode).
The MIPI D-PHY lane speed is configured to 1.5 Gbps, and the result is the same even if we configure the lane speed to 1.2 Gbps, so I don’t think it’s related to deskew.

In our test, we use gst-launch to stream 6 cameras while running some scripts that generate high CPU and GPU load at the same time. After a few minutes, the video windows disappear. The dmesg, GStreamer error log, and tegrastats log are attached below.

Please help us find the root cause. Thank you.

gst_err.txt (3.1 KB)
dmesg_err.txt (25.0 KB)
log_tegrastats2.txt (509.3 KB)

The gst-launch command:

gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! 'video/x-raw(memory:NVMM), width=(int)320, height=(int)120, format=(string)NV12, framerate=(fraction)30/1' ! nvvidconv ! fpsdisplaysink video-sink=xvimagesink -e
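
For the 6-camera test, the same pipeline can be generated once per sensor. This is only a minimal sketch: it assumes the cameras enumerate as sensor-id 0 to 5 (the thread only shows sensor-id=0), and `build_pipeline` is a hypothetical helper name, not part of any NVIDIA tool.

```python
import shlex


def build_pipeline(sensor_id, width=320, height=120, fps=30):
    """Return the gst-launch-1.0 argument list for one camera (hypothetical helper)."""
    caps = (
        f"video/x-raw(memory:NVMM), width=(int){width}, height=(int){height}, "
        f"format=(string)NV12, framerate=(fraction){fps}/1"
    )
    return [
        "gst-launch-1.0", "nvarguscamerasrc", f"sensor-id={sensor_id}", "!",
        caps, "!", "nvvidconv", "!",
        "fpsdisplaysink", "video-sink=xvimagesink", "-e",
    ]


# Assumption: six cameras enumerated as sensor-id 0..5.
commands = [build_pipeline(i) for i in range(6)]

for cmd in commands:
    # Print a shell-quoted command line; pass `cmd` to subprocess.Popen to run it.
    print(shlex.join(cmd))
```

Each printed line can be pasted into its own terminal, or the argument lists can be handed to `subprocess.Popen` so all six pipelines start together.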

hello ting.chang,

does this only happen when the CPU/GPU is stressed?
may I know what the real use-case is? we usually don’t support this kind of simulated stress on the processor.

No, it’s not tied to CPU/GPU stress.
The same problem happens even without any stress, but it seems to occur sooner under stress.

This issue happens in L4T 35.4.1.
If we switch back to our previous BSP, L4T 35.1, there is no problem.

So I wonder what the difference is between 35.4 and 35.1?

hello ting.chang,

we’ve recently addressed some camera bugs for JP-5.1.2/l4t-r35.4.1.
please also see the threads below and apply the bug fixes for verification.
for example,
Topic 268833, a camera firmware update for the deskew algorithm, plus stability fixes,
Topic 268519, a fix for memory corruption within libnvargus.

there are some bug fixes and stability improvements between r35.1 and r35.4.1.
you may also refer to the release notes, such as Release Notes (r35.4.1) or Release Notes (35.3.1).

Hi Jerry,

I can’t extract the Topic268519_Oct06.7z file on Windows or Ubuntu. Could you compress libnvargus.so and upload it again? Thank you.

hello ting.chang,

I’ve compressed the binary file into another attachment,
please try again with… Topic269805_Oct19.zip (434.2 KB)

Thanks Jerry,

I’ve updated our BSP with the provided RCE firmware and libnvargus.so.
So far our tests without CPU/GPU stress look good; the cameras have kept running for over 14 hours.

hello ting.chang,

besides,
why do you need to stress the CPU/GPU? may I know what the real use-case is?

Our customers in the automotive industry will deploy AI models on the product, and those models may take a lot of CPU/GPU computing power.

So we just want to make sure the cameras run smoothly under high load.

Hi @JerryChang,

How can I verify that the RCE firmware has been updated on our system?
The only related line I can see in dmesg is tegra186-cam-rtcpu bc00000.rtcpu: firmware version cpu=rce cmd=6 sha1=73154acbae92fa621c3d5e1b2a6e10f1075059aa. Does this also apply to Jetson Orin?
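
One way to check this, assuming the reported sha1 changes whenever the firmware binary changes, is to pull the sha1 out of dmesg before and after the update and compare the two values. The sketch below only parses the line format quoted above; the exact wording may differ between releases, and `rce_firmware_sha1` is a hypothetical helper name.

```python
import re

# Matches the version line as quoted in this thread; other releases may word it differently.
FW_RE = re.compile(r"version cpu=(\w+) cmd=(\d+) sha1=([0-9a-f]{40})")


def rce_firmware_sha1(dmesg_text):
    """Return the sha1 reported for the RCE firmware in dmesg text, or None if absent."""
    for line in dmesg_text.splitlines():
        match = FW_RE.search(line)
        if match and match.group(1) == "rce":
            return match.group(3)
    return None


sample = (
    "tegra186-cam-rtcpu bc00000.rtcpu: firmware version cpu=rce cmd=6 "
    "sha1=73154acbae92fa621c3d5e1b2a6e10f1075059aa"
)
print(rce_firmware_sha1(sample))
```

Saving the value from the old BSP and comparing it with the value after flashing shows whether a different firmware image is actually running.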

hello ting.chang,

yes, it looks like you’ve updated the camera firmware correctly.

Hi Jerry,

If I test 6 cameras under high GPU stress, we sometimes get the errors below:

Dmesg:

~
[410065.599078] cpufreq: cpu4,cur:1382000,set:1497600,set ndiv:117
[410167.668552] cpufreq: cpu4,cur:1616000,set:1497600,set ndiv:117
[410234.714325] cpufreq: cpu4,cur:1627000,set:1497600,set ndiv:117
[410295.749579] cpufreq: cpu0,cur:1619000,set:1497600,set ndiv:117
[410307.322463] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=410342810962688 sof_ts=410342844055200 gerror_code=2 gerror_data=a0 notify_bits=0"
[410307.322471] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=410342813896640 sof_ts=410342846914784 gerror_code=2 gerror_data=a4 notify_bits=0"
[410307.322474] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=410342816411904 sof_ts=410342849454656 gerror_code=2 gerror_data=a5 notify_bits=0"
[410307.322476] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=410342816960768 sof_ts=410342850300800 gerror_code=2 gerror_data=a4 notify_bits=0"
[410307.322478] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=410342819763840 sof_ts=410342853103808 gerror_code=2 gerror_data=a0 notify_bits=0"
[410307.747378] bwmgr API not supported
[410307.751493] bwmgr API not supported
[410308.761880] cpufreq: cpu4,cur:1376000,set:1497600,set ndiv:117
[410309.842839] bwmgr API not supported
[410309.843000] bwmgr API not supported
[410309.847087] bwmgr API not supported
[410309.848611] max9296 32-0048: stop streaming
[410309.854452] bwmgr API not supported
[410309.857188] bwmgr API not supported
[410309.862096] max9296 30-0048: stop streaming
[410309.868179] bwmgr API not supported
[410309.869362] max9296 33-0048: stop streaming
[410309.875773] bwmgr API not supported
[410309.876903] max9296 32-0048: stop streaming
[410309.884139] bwmgr API not supported
[410309.885511] max9296 30-0048: stop streaming
[410309.893182] bwmgr API not supported
~
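
For long runs it can help to condense a dmesg capture like the one above into counts per error and per deserializer. This is a rough sketch that assumes the line formats shown in this excerpt; `summarize` is a hypothetical helper, and the sample text is a shortened copy of the log above.

```python
import re
from collections import Counter

# Patterns follow the line formats seen in the excerpt above (an assumption for other releases).
RCE_ERR = re.compile(
    r"\[RCE\] ERROR: (\S+) \[(\w+)\].*?gerror_code=(\d+) gerror_data=([0-9a-f]+)"
)
STOP = re.compile(r"max9296 (\S+): stop streaming")


def summarize(dmesg_text):
    """Count RCE errors by (function, gerror_code, gerror_data) and stops by I2C address."""
    rce_errors = Counter()
    stops = Counter()
    for line in dmesg_text.splitlines():
        match = RCE_ERR.search(line)
        if match:
            rce_errors[(match.group(2), match.group(3), match.group(4))] += 1
            continue
        match = STOP.search(line)
        if match:
            stops[match.group(1)] += 1
    return rce_errors, stops


sample = """\
[410307.322463] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=410342810962688 sof_ts=410342844055200 gerror_code=2 gerror_data=a0 notify_bits=0"
[410307.322471] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=410342813896640 sof_ts=410342846914784 gerror_code=2 gerror_data=a4 notify_bits=0"
[410307.747378] bwmgr API not supported
[410309.848611] max9296 32-0048: stop streaming
[410309.862096] max9296 30-0048: stop streaming
[410309.876903] max9296 32-0048: stop streaming
"""

rce_errors, stops = summarize(sample)
print(dict(rce_errors))
print(dict(stops))
```

Running it on a full dmesg dump, for example `summarize(open("dmesg_err.txt").read())`, shows which gerror_data values dominate and which max9296 devices stopped streaming.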

GStreamer error:

gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! 'video/x-raw(memory:NVMM), width=(int)320, height=(int)120, format=(string)NV12, framerate=(fraction)30/1' ! nvvidconv ! queue ! fpsdisplaysink video-sink=xvimagesink -e
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 2048 x 1280 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 8.000000; Exposure Range min 49000, max 33333000;

GST_ARGUS: Running with following settings:
   Camera index = 0 
   Camera mode  = 0 
   Output Stream W = 2048 H = 1280 
   seconds to Run    = 0 
   Frame Rate = 29.999999 
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
CONSUMER: ERROR OCCURRED
ERROR: from element /GstPipeline:pipeline0/GstNvArgusCameraSrc:nvarguscamerasrc0: TIMEOUT
Additional debug info:
Argus Error Status
EOS on shutdown enabled -- waiting for EOS after Error
Waiting for EOS...
Got EOS from element "pipeline0".
EOS received - stopping pipeline...
Execution ended after 0:54:54.577079994
Setting pipeline to NULL ...
GST_ARGUS: Cleaning up
Freeing pipeline ...

However, sometimes the GStreamer preview windows just disappear without any error after hours of testing.
If I test with v4l2-ctl instead of GStreamer with nvargus, there are no errors, so I suspect libnvargus or nvarguscamerasrc may have a problem.

please also see Topic 269543.
we’re now reproducing this locally to dig into the failure.
