nvcamerasrc queue-size property limitation

Hi!

I am working with the nvcamerasrc element and found that the queue-size property has a minimum value of 10 buffers.

queue-size          : Number of buffers for the driver to enqueue
                        flags: readable, writable
                        Integer. Range: 10 - 100 Default: 10

Is there any way to modify the nvcamerasrc element to lower this queue-size limit, so that the minimum value is 2 buffers?

I am working on an application that requires very low latency, and this 10-buffer queue in nvcamerasrc is hurting the latency of my application.

I am working with an IMX274 camera at 1080p 60fps. I measured the glass-to-glass latency of a simple capture-and-display pipeline and got 86 ms. The pipeline used in the test is shown below.

gst-launch-1.0 -v nvcamerasrc sensor-id=1 fpsRange="60 60" ! 'video/x-raw(memory:NVMM), \
width=(int)1920, height=(int)1080, format=(string)I420, framerate=(fraction)60/1' ! \
perf print-arm-load=true ! nvoverlaysink sync=false
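
To put the 86 ms figure in context, it can help to express it in frame periods at the capture rate. The following is a minimal Python sketch of that arithmetic only; the helper name `latency_in_frames` is made up for illustration, and the inputs are the numbers quoted in this post.

```python
def latency_in_frames(latency_ms: float, fps: float) -> float:
    """Convert a latency in milliseconds into a number of frame periods."""
    frame_period_ms = 1000.0 / fps  # about 16.67 ms at 60 fps
    return latency_ms / frame_period_ms


if __name__ == "__main__":
    frames = latency_in_frames(86.0, 60.0)
    print(f"86 ms at 60 fps is about {frames:.2f} frame periods")
```

Read this way, the measured latency is a little over five frame periods, which is why the depth of the buffer queue is a natural suspect.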

I am using a Tegra X1 with Jetpack 3.0.

Could you provide me with a modified nvcamerasrc binary, or instructions on how to modify it?

Thanks in advance for any help!

Best regards,
-Daniel

Hi dgarba, this requires rebuilding libgstnvcamera.so. It looks like you are on r24.2.1. Aren't you concerned about dropped frames with queue-size=2?

Hi DaneLLL

Thanks for your quick response.

Yes, I am on r24.2.1 for TX1.

At this point my priority is to reduce the latency as much as possible. I still have to analyze how my application behaves when frames are dropped to determine whether it is affected.

How can I rebuild libgstnvcamera.so? Where can I get access to the libgstnvcamera.so source code?

Do you have any other suggestions for reducing the capture latency?

Thanks in advance for any help!

Best regards,
-Daniel

hello dgarba

For R24.2.1, please note that there is a 2-frame delay on the v4l2 kernel driver side before a buffer is sent to user space.
I am sharing the code snippet below for your reference.
You could decrease QUEUED_BUFFERS for testing.
Thanks

$TOP/kernel/drivers/media/platform/tegra/camera/mc_common.h
#define   QUEUED_BUFFERS  4
 
$TOP/kernel/drivers/media/platform/tegra/camera/channel.c
static void tegra_channel_ring_buffer(){
...
    /* release buffer N at N+2 frame start event */
    if (chan->num_buffers >= (QUEUED_BUFFERS - 1))
        free_ring_buffers(chan, 1);
...
}
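
The effect of the threshold in the snippet above can be illustrated with a toy model. The Python sketch below is not the driver code; it assumes (as a simplification) that exactly one buffer enters the ring at each frame-start event and that the oldest buffer is released as soon as the ring holds at least `QUEUED_BUFFERS - 1` entries. The function name `release_delays` is hypothetical.

```python
from collections import deque


def release_delays(queued_buffers: int, num_frames: int = 10) -> list:
    """Toy model of the release rule (an assumption, not the real driver).

    Returns, for each released buffer, how many frame-start events passed
    between the buffer entering the ring and being released.
    """
    ring = deque()
    delays = []
    for frame in range(num_frames):
        ring.append(frame)                   # buffer for this frame enters the ring
        if len(ring) >= queued_buffers - 1:  # same threshold as in the snippet
            oldest = ring.popleft()          # stands in for free_ring_buffers(chan, 1)
            delays.append(frame - oldest)
    return delays


if __name__ == "__main__":
    for qb in (4, 3, 2):
        print(f"QUEUED_BUFFERS={qb}: delays={release_delays(qb)}")
```

Under these assumptions, `QUEUED_BUFFERS = 4` reproduces the "release buffer N at N+2" comment, and lowering the macro shortens the delay in the model. Whether the real driver behaves the same way with smaller values has to be checked on the target.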

Hi Jerry.

Thanks for your help, I will give it a try. However, I also need something on the nvcamerasrc side to decrease the latency. Could you help me by generating a modified binary of libgstnvcamera.so without the 10-buffer minimum on the queue-size property? It would be fine if the minimum value were reduced to 2, 3, or 4 buffers. My application doesn't care about dropped buffers.

Thanks in advance for any help.

Best regards,
-Daniel

Hi dgarba, I have attached a prebuilt lib for your reference. Because queue-size=2, 3, and 4 are not verified, it is for testing only.
libgstnvcamera.so.txt (109 KB)

Hi DaneLLL.

Thanks a lot for your help.

I tested the binary on a TX1 running L4T 24.2.1, and everything works fine. I was able to reduce the glass-to-glass latency from 172 ms to 86 ms and maintain a stable framerate of 60 fps when I set the queue-size property of nvcamerasrc to 4 (default value). With smaller values (3 and 2), the framerate drops to 45 fps and 30 fps respectively.

I will continue testing and follow the suggestions above to reduce the glass-to-glass latency, taking into account the trade-off with frame drops.

Best regards,
-Daniel

Hi DaneLLL.

I tested the binary provided and everything works fine.

By setting the queue-size property of nvcamerasrc to 6, I was able to reduce the glass-to-glass latency to 64 ms on average (the measured values vary randomly between 43 ms and 86 ms).

Below you will find the pipeline used for testing:

gst-launch-1.0 -v nvcamerasrc queue-size=6 sensor-id=1 fpsRange="60 60" ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)I420, framerate=(fraction)60/1' ! perf print-arm-load=true ! nvoverlaysink sync=true enable-last-sample=false

If I set the queue-size property to 10, I get a stable 86 ms glass-to-glass latency. If I reduce the queue-size property below 5, the framerate starts to drop, and it drops further the more I reduce the value.
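
A sweep over several queue-size values is easier to repeat if the command lines are generated rather than typed by hand. The Python sketch below only builds the command strings from the pipeline quoted above and prints them; it does not run GStreamer. The helper name `build_pipeline` and the list of values swept are illustrative choices, not something taken from NVIDIA documentation.

```python
def build_pipeline(queue_size: int, sensor_id: int = 1, fps: int = 60) -> str:
    """Return the test pipeline from this post with the given queue-size."""
    caps = (
        "video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, "
        f"format=(string)I420, framerate=(fraction){fps}/1"
    )
    src = (
        f"nvcamerasrc queue-size={queue_size} sensor-id={sensor_id} "
        f'fpsRange="{fps} {fps}"'
    )
    sink = "nvoverlaysink sync=true enable-last-sample=false"
    return f"gst-launch-1.0 -v {src} ! '{caps}' ! perf print-arm-load=true ! {sink}"


if __name__ == "__main__":
    # Values to try; each printed line can be pasted into a shell on the board.
    for queue_size in (4, 5, 6, 10):
        print(build_pipeline(queue_size))
```

Generating the strings keeps every other parameter identical between runs, so any change in latency or framerate can be attributed to queue-size alone.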

I also modified the QUEUED_BUFFERS macro in $TOP/kernel/drivers/media/platform/tegra/camera/mc_common.h, as you suggested in post #4 of this thread. I reduced its value by half and recompiled the kernel, but I did not see any change in latency or framerate.

I have two questions below, and I would be very grateful if you could help me with them.

1) Could you please give me any other hints for reducing the latency further? I need to get to around 48 ms of glass-to-glass latency. The Tegra X1 is a very powerful board, so it seems strange to me that it has such high latency in the capture-to-display path. I would expect the 86 ms to be lower, compared with other, less powerful boards I have worked with.

2) Could you explain the relation/interaction between nvcamerasrc and the v4l2 kernel driver? I know that nvcamerasrc uses the camera_daemon under the hood, but I cannot figure out where the v4l2 driver interacts with it, because there is very little documentation about camera_daemon. I understand that if I were using v4l2src for the capture, the v4l2 driver would be called and used, but in this case I am using nvcamerasrc because I need to go through the ISP for de-bayering. I ask this because I don't understand why reducing QUEUED_BUFFERS would help reduce the latency, since that macro is part of the v4l2 driver. Also, my tests showed no latency reduction.

Best regards,
-Daniel

Hi dgarba,
We have some patches for sensors not going through NVIDIA ISP engine:
https://devtalk.nvidia.com/default/topic/934387/jetson-tx1/one-frame-latency-delay-in-tx1-v4l-stack/post/5059666/#5059666
#4 https://devtalk.nvidia.com/default/topic/1023668/jetson-tx1/nvcamerasrc-queue-size-property-limitation/post/5208590/#5208590

But your case goes through the NVIDIA ISP engine, so the SW stack is entirely in NVIDIA prebuilt libs and there is nothing you can do to control/improve the latency.

Hi @DaneLLL

Could you please give me the same modified nvcamerasrc prebuilt library as in the comment above, but for TX2 L4T-28.2 JetPack-3.2?

Thanks in advance for your help!

Best regards,
-Daniel

Hi DaneLLL,

Could you help us with the binary?

Thanks
-David

Sorry I missed this post. I did post it @ https://devtalk.nvidia.com/default/topic/995036/jetson-tx1/how-to-shorten-the-latency-in-streaming-media-on-tegra-tx1/post/5247433/#5247433

But I think minimum queue-size=4 is more appropriate.

Hi DaneLLL,

No problem, thanks for your answer. Yes, we have a previous version for TX1 but we need a new version for Jetpack 3.2 and TX2, even better if you can provide it for both TX1 and TX2.

-David

Hi David,
The link in comment #12 contains the prebuilt lib for r28.2. It can be applied on both TX1 and TX2.

Hi DaneLLL,

Sorry, I didn’t notice that one was for TX2. I will give it a try. Thank you!

-David

Hi DaneLLL,

Could you please advise how to modify the driver as in #6 for Jet Pack 31.1? I bought a Jetson Xavier and borrowed an OV5693 camera from a friend to test the latency. The hardware setup is like https://elinux.org/images/thumb/3/30/Xavier_CSI_Camera_Module.jpg/450px-Xavier_CSI_Camera_Module.jpg. The best result I got is about 150 msec.

Hi,
nvcamerasrc is deprecated in the Xavier release. You may use nvarguscamerasrc instead.

You may also try tegra_multimedia_api sample:
https://devtalk.nvidia.com/default/topic/1044104/jetson-agx-xavier/jetson-xavier-agx-glass-to-glass-latency/post/5297938/#5297938

Hi DaneLLL,

When I tried to compile the sample, I got the following error:

/usr/bin/ld: cannot find -largus

What did I miss?

Hi,
Please run Jetpack to install the samples.

Hi DaneLLL,

Thanks for your advice. I finally got the latency of about 80msec. The following summarizes what I have done:

  1. Download latest Tegra_Multimedia_API - https://developer.nvidia.com/embedded/dlc/l4t-multimedia-api-32-1-JAX-TX2
  2. Use Jetpack to install the latest NVIDIA CUDA, OpenCV4tegra, cuDNN, and NVIDIA TensorRT according to https://docs.nvidia.com/jetson/l4t-multimedia/mmapi_build.html
  3. Compile libargus according to https://developer.ridgerun.com/wiki/index.php?title=Xavier/Video_Capture_and_Display/Software_Support/Libargus
  4. Build the program in the tegra_multimedia_api/samples/09_camera_jpeg_capture folder and run it:
~/Documents/tegra_multimedia_api/samples/09_camera_jpeg_capture$ ./camera_jpeg_capture -s --cap-time 60 --fps 120 --disable-jpg --sensor-mode 2 --pre-res 1280x720
[INFO] (NvEglRenderer.cpp:110) <renderer0> Setting Screen width 1280 height 720
PRODUCER: Creating output stream
PRODUCER: Launching consumer thread
CONSUMER: Waiting until producer is connected...
PRODUCER: Available Sensor modes :
PRODUCER: [0] W=2592 H=1944
PRODUCER: [1] W=2592 H=1458
PRODUCER: [2] W=1280 H=720
PRODUCER: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
(Argus) Error BadValue:  (propagating from src/eglstream/FrameConsumerImpl.cpp, function releaseFrame(), line 327)
CONSUMER: Done.
----------- Element = renderer0 -----------
Total Profiling time = 59.8957
Average FPS = 59.8206
Total units processed = 3584
Num. of late units = 3583
-------------------------------------
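
The profiling summary above can be cross-checked against the requested frame rate with a couple of lines of arithmetic. This is a small Python sketch using only the numbers printed in the log and the `--fps 120` value from the command line; the helper name `average_fps` is made up for illustration.

```python
def average_fps(units: int, seconds: float) -> float:
    """Frames processed per second over the profiling window."""
    return units / seconds


if __name__ == "__main__":
    requested_fps = 120.0  # from --fps 120 on the command line
    measured = average_fps(3584, 59.8957)  # "Total units processed" / "Total Profiling time"
    print(f"measured {measured:.1f} fps, {measured / requested_fps:.0%} of the requested rate")
```

The renderer is therefore running at roughly half of the requested rate, which is consistent with a pipeline that is being paced at about 60 frames per second somewhere between the sensor and the display.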

I noticed that :

  1. My monitor also needs to be set to a 60 Hz refresh rate with ``` xrandr --output HDMI-0 --mode 3840x2160 --rate 60 ```
    xrandr
    Screen 0: minimum 8 x 8, current 3840 x 2160, maximum 32767 x 32767
    HDMI-0 connected primary 3840x2160+0+0 (normal left inverted right x axis y axis) 610mm x 350mm
       3840x2160     30.00 +  60.02    59.98*   50.01    29.97    25.00    24.00    23.98  
       2560x1440     59.96  
       2048x1280     59.99  
       2048x1080     24.00  
       1920x1080     60.00    59.95    50.00    30.00    29.97    25.00    24.00    23.98  
       1600x1200     60.01  
       1600x900      60.00  
       1280x1024     75.03    60.00  
       1280x720      60.00    59.94    50.00  
       1152x864      75.00  
       1024x768      75.03    60.01  
       800x600       75.00    60.32  
       720x576       50.00  
       720x480       59.94  
       720x400       70.04  
       640x480       75.00    59.94    59.94  
    DP-0 disconnected (normal left inverted right x axis y axis)
    DP-1 disconnected (normal left inverted right x axis y axis)
    
  2. Although I set the camera frame rate to 120fps, the "Total Profiling time" is still about 60.
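
Checking the active refresh rate by eye in the xrandr listing is error-prone, so it can be scripted. The Python sketch below parses text in the format shown above and returns the mode and rate flagged with `*` for a given output. The function name `current_rate` is hypothetical, the sample text is a shortened copy of the listing in this post, and the parser assumes that output header lines start at column 0 while mode lines are indented, as in unmodified xrandr output.

```python
import re

SAMPLE = """Screen 0: minimum 8 x 8, current 3840 x 2160, maximum 32767 x 32767
HDMI-0 connected primary 3840x2160+0+0 (normal left inverted right x axis y axis) 610mm x 350mm
   3840x2160     30.00 +  60.02    59.98*   50.01    29.97
   1920x1080     60.00    59.95    50.00
DP-0 disconnected (normal left inverted right x axis y axis)
"""


def current_rate(xrandr_text: str, output: str):
    """Return (mode, rate_hz) for the rate marked '*' on `output`, or None."""
    in_output = False
    for line in xrandr_text.splitlines():
        if not line.startswith((" ", "\t")):
            # Header line such as "HDMI-0 connected ..." or "Screen 0: ..."
            in_output = line.split(" ", 1)[0] == output
            continue
        if in_output:
            match = re.search(r"(\d+(?:\.\d+)?)\*", line)
            if match:
                return line.split()[0], float(match.group(1))
    return None


if __name__ == "__main__":
    print(current_rate(SAMPLE, "HDMI-0"))
```

On a live system the text would come from running `xrandr` and capturing its standard output; the sample string is used here so the sketch runs anywhere.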

I would be grateful if you can advise me how I can further reduce the latency. My target is below 50msec for 1080p like this - https://developer.ridgerun.com/wiki/index.php?title=Jetson_glass_to_glass_latency