GStreamer lag increases with frame rate when recording from Raspberry Pi camera (nvarguscamerasrc and nvivafilter)

I have noticed a massive lag when processing Raspberry Pi camera frames with CUDA. This makes it almost impossible to control my drone.

This is the gstreamer pipeline I use in my program:

nvarguscamerasrc exposurecompensation=1 gainrange='8 16' ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)NV12, framerate=(fraction)30/1 ! nvivafilter cuda-process=true customer-lib-name=libnvsample_cudaprocess.so ! video/x-raw(memory:NVMM), format=(string)NV12 ! omxh264enc ! qtmux ! filesink location=video.mov

Basically it is a linear pipeline that passes the video through nvivafilter so that I can analyse it in CUDA, and finally it gets saved to disk as H264 video.

I have a test that flashes a light and writes some simple graphics to the current video frame. I am then able to look at the saved video and measure the frame delay between the graphics appearing and the flash.
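
For context, the flash trigger itself is just a GPIO write; a minimal sketch, assuming sysfs GPIO (the pin number 216 here is hypothetical):

$ echo 216 > /sys/class/gpio/export              # run as root
$ echo out > /sys/class/gpio/gpio216/direction
$ echo 1 > /sys/class/gpio/gpio216/value         # LED on; record the timestamp now
$ echo 0 > /sys/class/gpio/gpio216/value         # LED off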

This is the delay I see at different frame rates:

120ms @ 73fps (target frame rate was 120fps)
50ms @ 60fps
33ms @ 30fps

There is clearly some buffering happening somewhere. It seems perverse that I am having to run my algorithm at the lowest frame rate to get the lowest latency. :-S

Please could someone tell me how to reduce the latency at the highest frame rate?

Thanks!

Is it the same with the following?

nvarguscamerasrc ! video/x-raw(memory:NVMM), width=1280, height=720, format=NV12, framerate=30/1 ! nvivafilter cuda-process=true pre-process=false post-process=false customer-lib-name=libnvsample_cudaprocess.so ! video/x-raw(memory:NVMM), format=NV12 ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=video.mov

Hi, thanks for the reply and apologies for my delay in replying.

I see that your pipeline changes two things: it uses nvv4l2h264enc instead of omxh264enc, and it disables pre/post-processing.

I tested these changes separately and together, measured the lag across 5 separate runs, and averaged the results.

The results for the delay in milliseconds between reality and the frame becoming available in the nvivafilter CUDA code are in this table:

It seems that neither the encoder type nor disabling pre/post-processing made any difference. I’d like to add that there was some variation in the delays between runs; I would say about 25% variation around the mean.

The table makes it very clear that increasing FPS increases the lag.

I welcome any other suggestions to remedy this lag. nvivafilter does not seem to have any “buffered-frames”-type property to set.
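
(To confirm this, the element’s properties can be listed with gst-inspect:

$ gst-inspect-1.0 nvivafilter

and nothing buffer-related appears there, as far as I can tell.)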

Thanks.

I noticed that my CPU core utilisation is as follows:

16% 100% 14% 13%
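
For reference, per-core load like this can be watched with tegrastats, which prints one percentage per CPU core (the interval is in milliseconds):

$ sudo tegrastats --interval 1000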

What if the nvarguscamerasrc element takes more CPU the higher the frame rate? Could this increase the latency?

AFAIK it is common for Argus to take some CPU time (at 120 fps it may take one core to 100%; maybe more on a Nano).

Did you boost the clocks with the jetson_clocks script?
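
For example (jetson_clocks ships with L4T; --show only prints the current state without changing anything):

$ sudo jetson_clocks --show
$ sudo jetson_clocks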

Also try boosting the clocks for VI/VIC/NVENC, such as here for VIC:


File writes could be the bottleneck hurting performance.
I suggest checking the performance via logs.

qtmux uses a lot of memory building an index table by default. I would try with matroskamux and see if there is any difference.
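
That would be something along these lines (only the muxer and file extension change; an untested sketch):

nvarguscamerasrc ! video/x-raw(memory:NVMM), width=1280, height=720, format=NV12, framerate=30/1 ! nvivafilter cuda-process=true customer-lib-name=libnvsample_cudaprocess.so ! video/x-raw(memory:NVMM), format=NV12 ! nvv4l2h264enc ! h264parse ! matroskamux ! filesink location=video.mkv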


You may rule out container and filesystem with:

gst-launch-1.0 -v nvarguscamerasrc ! video/x-raw(memory:NVMM), width=1280, height=720, format=NV12, framerate=30/1 ! nvivafilter cuda-process=true pre-process=false post-process=false customer-lib-name=libnvsample_cudaprocess.so ! video/x-raw(memory:NVMM), format=NV12 ! nvv4l2h264enc ! h264parse ! fpsdisplaysink video-sink=fakesink text-overlay=0

For measurements, I’d suggest discarding the first 20 frames and then averaging the next 100 frames.
You may also try adding a queue before filesink and see if it helps.
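
For example, only the tail of the pipeline changes (a sketch):

... ! nvv4l2h264enc ! h264parse ! qtmux ! queue ! filesink location=video.mov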


If I run jetson_clocks first, the improvement in latency and FPS is significant. Thank you so much for this. Here is a new table averaged over 5 runs when I have run jetson_clocks (no pre/post-process is being specified):

I am now always getting at least 104 FPS, usually >113 FPS when I specify “120/1” in the pipeline (it used to max out at 79 FPS). The test is done as before by flashing an LED using GPIO and measuring the time it takes for the flash to appear in the CUDA process. I process 100 frames before flashing the LED.

Some comments on the latencies when encoding/saving H264 vs not encoding/saving H264:

  • At 120 FPS and encoding/saving H264 the latency varies much more than when not encoding/saving H264. e.g. one run can show 27ms latency and the next shows 95ms.
  • This is concerning but maybe I have to put up with this?
  • Compare this to 60 FPS, where the latency is always 22ms and does not vary at all when saving/encoding H264.

As you can see in the table, I did a test with NFS unmounted (I usually have a share mounted remotely on the Jetson), but this did not improve the FPS.

N.B. In the table, encoding H264 implicitly means to also save the video to disk. When I am not encoding, my pipeline terminates in the “fakesink” element.

Why does this script make such an amazing difference?

BTW, I would like to be clear that my main problem has always been latency. I am happy that the FPS has been boosted to almost 120 FPS however the latency is much more important to me in this application.

I don’t think that boosting the clocks for VI/VIC/NVENC applies to Jetson Nano because I do not have the relevant sysfs nodes:

$ sudo find /sys/kernel/debug/ -name "*vic*"                 
/sys/kernel/debug/ieee80211/phy0/netdev:p2p-dev-wlan0/iwlmvm/os_device_timediff
/sys/kernel/debug/ieee80211/phy0/netdev:wlan0/iwlmvm/os_device_timediff
/sys/kernel/debug/pg_domains/vic
/sys/kernel/debug/clk/vic03
/sys/kernel/debug/clk/vic03.cbus
/sys/kernel/debug/clk/vic.floor.cbus
/sys/kernel/debug/pcie/list_devices
/sys/kernel/debug/vic
/sys/kernel/debug/tracing/events/cfg80211/rdev_start_p2p_device
/sys/kernel/debug/tracing/events/cfg80211/rdev_stop_p2p_device
/sys/kernel/debug/tracing/events/iommu/add_device_to_group
/sys/kernel/debug/tracing/events/iommu/remove_device_from_group
/sys/kernel/debug/tracing/events/iommu/attach_device_to_domain
/sys/kernel/debug/tracing/events/iommu/detach_device_from_domain
/sys/kernel/debug/tracing/events/random/add_device_randomness
/sys/kernel/debug/tracing/events/nvhost/nvhost_vm_init_device
/sys/kernel/debug/tracing/events/ext4/ext4_evict_inode
/sys/kernel/debug/tracing/events/power/device_pm_callback_start
/sys/kernel/debug/tracing/events/power/device_pm_callback_end
/sys/kernel/debug/usb/devices
/sys/kernel/debug/70019000.iommu/as000/54340000.vic
/sys/kernel/debug/70019000.iommu/masters/54340000.vic
/sys/kernel/debug/pinctrl/pinctrl-devices

Hi,
Please execute the steps to run the system at maximum performance mode, and check if there is improvement:

  1. Run $ sudo nvpmodel -m 0 and $ sudo jetson_clocks
  2. Set this property on the hardware encoder (see the example after this list):
  maxperf-enable      : Enable or Disable Max Performance mode
                        flags: readable, writable, changeable only in NULL or READY state
                        Boolean. Default: false
  3. Enable the VIC engine at maximum clock:
    Nvvideoconvert issue, nvvideoconvert in DS4 is better than Ds5? - #3 by DaneLLL
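
For reference, enabling that property in a launch string would look like this (a fragment only, not the full pipeline):

... ! nvv4l2h264enc maxperf-enable=1 ! h264parse ! qtmux ! filesink location=video.mov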

You can check CPU/GPU/NVENC status by executing sudo tegrastats

  1. I already have nvpmodel set to 0. jetson_clocks has been run and has given a fairly big improvement.

  2. I see that there is no such property on omxh264enc, so I assume you meant nvv4l2h264enc.

  3. I followed the link, and as the commands don’t translate exactly to the Jetson Nano, I used the commands below, but they don’t work either:

$ cat /sys/devices/50000000.host1x/54340000.vic/power/control
auto
$ sudo echo on > /sys/devices/50000000.host1x/54340000.vic/power/control
-bash: /sys/devices/50000000.host1x/54340000.vic/power/control: Permission denied

$ cat /sys/devices/50000000.host1x/54340000.vic/devfreq/54340000.vic/governor
wmark_active
$ sudo echo userspace > /sys/devices/50000000.host1x/54340000.vic/devfreq/54340000.vic/governor
-bash: /sys/devices/50000000.host1x/54340000.vic/devfreq/54340000.vic/governor: Permission denied

$ cat /sys/devices/50000000.host1x/54340000.vic/devfreq/54340000.vic/available_frequencies
192000000 307200000 345600000 409600000 486400000 524800000 550400000 576000000 588800000 614400000 614400000 627200000
$ sudo echo 627200000 > /sys/devices/50000000.host1x/54340000.vic/devfreq/54340000.vic/max_freq
-bash: /sys/devices/50000000.host1x/54340000.vic/devfreq/54340000.vic/max_freq: Permission denied

$ ls /sys/devices/50000000.host1x/54340000.vic/devfreq/54340000.vic/
available_frequencies  available_governors  cur_freq  device  governor  max_freq  min_freq  polling_interval  power  subsystem  target_freq  trans_stat  uevent
$ sudo echo 627200000 > /sys/devices/50000000.host1x/54340000.vic/devfreq/54340000.vic/target_freq
-bash: /sys/devices/50000000.host1x/54340000.vic/devfreq/54340000.vic/target_freq: Permission denied

Thus, here are the results so far (first 2 columns are just for control):

Hi,
Looks like you failed to set the VIC to maximum clock. Please run $ sudo su to become superuser and try the commands again. See if this works.
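
Alternatively, without a root shell, such writes can go through tee, since with plain sudo the redirection is performed by your non-root shell (which is why you saw Permission denied). For example:

$ echo on | sudo tee /sys/devices/50000000.host1x/54340000.vic/power/control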

And please boost the clock of NVCSI and ISP engines:
Jetson/l4t/Camera BringUp - eLinux.org
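
From that page, the boost is along these lines (run as root; these debugfs paths assume a BPMP-based Jetson, so they may not exist on every platform):

# echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
# echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
# echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
# cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate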


Firstly, I’d like to clarify that at 60fps the latency is always 1 frame or less, so all tests from now on relate to 120fps, where the latency is rarely 1 frame and always between 1 and 10 frames (it varies between separate runs, but I do not present that information here).

Running as su (rather than via sudo) mostly worked; however, this failed:

$ echo 627200000 > /sys/devices/50000000.host1x/54340000.vic/devfreq/54340000.vic/target_freq 
bash: /sys/devices/50000000.host1x/54340000.vic/devfreq/54340000.vic/target_freq: Permission denied

On the Jetson Nano it seems that the NVCSI/ISP engines cannot be boosted, as the directory “/sys/kernel/debug/bpmp/” does not exist; there are very few sysfs filenames containing “bpmp” and none in debugfs. All I found were these:

$ ls /sys/kernel/debug/clk/isp/
clk_accuracy  clk_enable_count  clk_flags  clk_notifier_count  clk_parent  clk_phase  clk_possible_parents  clk_prepare_count  clk_rate  clk_state  clk_update_rate  dvfs_freq_offs  frequency_stats_table
$ ls /sys/kernel/debug/clk/csi
clk_accuracy  clk_enable_count  clk_flags  clk_notifier_count  clk_parent  clk_phase  clk_prepare_count  clk_rate  clk_state  clk_update_rate  dvfs_freq_offs  frequency_stats_table
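
For what it’s worth, the current rate can at least be read from the clk_rate node shown above:

$ sudo cat /sys/kernel/debug/clk/isp/clk_rate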

But I don’t see how these would relate to the link you gave.

Given this, these are the tests over averages of 10 runs:

Ultimately, a variable latency of 1-10 frames at 120fps, from reality to the CUDA algorithm, makes control difficult. At the moment I have 2 choices:

  1. Run at 60fps, where the latency is always 22ms and I can save H264 for the debugging purposes I need it for;
  2. Run at 120fps without saving H264, and accept a variation in latency of 20-40ms. (If saving H264, the variation in latency is much greater and the average latency is about 50% higher.)

To be clear, I should now be okay with my algorithm but any further reductions of latency (and reductions in variation of latency) are welcome. :-)

I realise I never replied to this.

The answer is: with H264 encoding and a fakesink there is no improvement in FPS or latency. I also tried writing the encoded file to a ramdisk, but there was no improvement there either.
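
For the ramdisk test I mean something along these lines (the mount point is arbitrary):

$ sudo mount -t tmpfs -o size=512M tmpfs /mnt/ramdisk

with the pipeline then ending in filesink location=/mnt/ramdisk/video.mov.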

Hi,
Do you use Jetpack 4.6.2 or 4.6.3? I am not sure if you use the latest release.


I think I am using 4.6.3 because I have L4T version 32.6.1.
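
For reference, the exact L4T version can be read from /etc/nv_tegra_release:

$ head -n 1 /etc/nv_tegra_release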

Hi,
r32.6.1 is Jetpack 4.6. Is it possible to upgrade to a later release and try?

Ah, I beg your pardon.

The latest Jetpack release is 4.6.3/L4T 32.7.3, but the only changes there are security fixes, which would not improve performance.

Let me ask you and Nvidia a question:

Please could Nvidia provide nvarguscamerasrc with a reduced or customisable buffer queue size?

According to this post, Nvidia has done that in the past with a similar plugin:

It would greatly improve my product if you could do this, because at present the nvarguscamerasrc gstreamer plugin is poorly optimised for low latency.

Hi,
There are buffers in the Argus stack for capturing Bayer frames, which are then queued in the ISP engine to output YUV frames. The current implementation uses the minimum buffer number, which is tested and verified in SQA tests. Reducing the number may impact system stability. It is a fixed value and cannot be customized.

Please share the gstreamer commands and the steps for checking latency, so that we can set up and try to replicate the issue on a Jetson Nano + Raspberry Pi camera V2, and then check with our teams.
