Basically it is a linear pipeline that passes the video through nvivafilter so that I can analyse it in CUDA; the video is finally saved to disk as H264.
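For context, a minimal sketch of this kind of pipeline (the caps, resolution, CUDA library name and output path are illustrative placeholders, not my exact command):

```shell
# Illustrative sketch only -- caps, resolution and the customer library
# name (libmy-cuda-filter.so) are placeholders, not the pipeline under test.
gst-launch-1.0 -e nvarguscamerasrc ! \
  'video/x-raw(memory:NVMM), width=1280, height=720, framerate=120/1' ! \
  nvivafilter customer-lib-name=libmy-cuda-filter.so cuda-process=true ! \
  'video/x-raw(memory:NVMM), format=NV12' ! \
  nvv4l2h264enc ! h264parse ! qtmux ! filesink location=test.mp4
```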
I have a test that flashes a light and writes some simple graphics to the current video frame. I am then able to look at the saved video and measure the frame delay between the graphics appearing and the flash.
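As a rough rule of thumb for converting between the two units (the numbers here are purely illustrative, not measured values):

```shell
# Latency in frames = latency in ms * fps / 1000 (illustrative numbers).
awk 'BEGIN { ms = 50; fps = 120; printf "%.2f frames\n", ms * fps / 1000 }'
# prints "6.00 frames"
```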
There is clearly some buffering happening somewhere. It seems perverse that I am having to run my algorithm at the lowest frame rate to get the lowest latency. :-S
Please could someone tell me how to reduce the latency at the highest frame rate?
It seems that the encoder type did not matter, and neither did whether pre/post-process was set. I’d also add that when I did the runs there was some variation in the delays; I would say about 25% around the mean.
The table makes it very clear that increasing FPS increases the lag.
I welcome any other suggestions to remedy this lag. nvivafilter does not seem to have any “buffered-frames”-type property to set.
For measurements, I’d suggest discarding the first 20 frames and then averaging the next 100 frames.
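A minimal sketch of that procedure, assuming one per-frame latency value (in ms) per line; the seq-generated numbers are placeholder data, not real measurements:

```shell
# Placeholder data: 120 fake per-frame latency samples, one per line.
seq 1 120 > /tmp/latencies.txt

# Discard the first 20 samples (warm-up), average the next 100.
awk 'NR > 20 && NR <= 120 { sum += $1; n++ }
     END { printf "mean over %d frames: %.2f ms\n", n, sum / n }' /tmp/latencies.txt
# prints "mean over 100 frames: 70.50 ms"
```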
You may also try adding a queue before filesink and see if it helps.
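Something like this (an untested fragment; the elided upstream elements are whatever your pipeline already has):

```shell
# Untested sketch: a queue before filesink moves the disk write into its
# own streaming thread so slow writes do not back-pressure the encoder/muxer.
... ! nvv4l2h264enc ! h264parse ! qtmux ! queue ! filesink location=test.mp4
```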
If I run jetson_clocks first, the improvement in latency and FPS is significant. Thank you so much for this. Here is a new table, averaged over 5 runs, taken after running jetson_clocks (no pre/post-process is specified):
I am now always getting at least 104 FPS, usually >113 FPS when I specify “120/1” in the pipeline (it used to max out at 79 FPS). The test is done as before by flashing an LED using GPIO and measuring the time it takes for the flash to appear in the CUDA process. I process 100 frames before flashing the LED.
Some comments on the latencies when encoding/saving H264 vs not encoding/saving H264:
At 120 FPS with encoding/saving H264, the latency varies much more than when not encoding/saving H264: for example, one run can show 27ms latency and the next 95ms.
This is concerning but maybe I have to put up with this?
Compare this to 60 FPS, where the latency is always 22ms and does not vary at all when saving/encoding H264.
As you can see in the table, I did a test with NFS unmounted (I usually have a remote share mounted on the Jetson), but this did not improve the FPS.
N.B. In the table, encoding H264 implicitly means to also save the video to disk. When I am not encoding, my pipeline terminates in the “fakesink” element.
Why does this script make such an amazing difference?
BTW, I would like to be clear that my main problem has always been latency. I am happy that the FPS has been boosted to almost 120 FPS however the latency is much more important to me in this application.
Firstly, I’d like to clarify that at 60fps the latency is always 1 frame or less, so all tests from now on relate to 120fps, where the latency is rarely as low as 1 frame and ranges from 1-10 frames (it varies between separate runs, but I do not present that information here).
Running as su (not via sudo) mostly worked; however, this failed:
On the Jetson Nano it seems that the NVCSI/ISP engines cannot be boosted: the directory “/sys/kernel/debug/bpmp/” does not exist, there are very few sysfs filenames containing “bpmp”, and none of them are in debugfs. All I found were these:
Ultimately, having a variable frame latency at 120fps from reality to the CUDA algorithm of 1-10 frames makes control difficult. At the moment I have 2 choices:
Run at 60fps, where the latency is always 22ms and I can save H264 for the debugging purposes I need it for, or
Run at 120fps without saving H264 and accept a variation in latency of 20-40ms. (If saving H264, the variation in latency is much greater and the average latency is about 50% higher.)
To be clear, I should now be okay with my algorithm but any further reductions of latency (and reductions in variation of latency) are welcome. :-)
The answer is: with H264 encoding and fakesink, there is no improvement in FPS or latency. I also tried writing the encoded file to a ramdisk, but there was no improvement there either.
Hi,
There are buffers in the Argus stack for capturing Bayer frames, and then a queue in the ISP engine for outputting YUV frames. The current implementation uses the minimum buffer count, which has been tested and verified in SQA tests; reducing the number may impact system stability. It is a fixed value and cannot be customized.
Please share the gstreamer commands and the steps for checking latency, so that we can set up and try to replicate the issue on a Jetson Nano + Raspberry Pi Camera V2 and then check with our teams.