How to shorten the latency in streaming media on Tegra TX1

HooverLv · February 19, 2017, 4:17am

I am using OpenGL to stream PAL video from 7 channels and compose it onto a display. Latency is critical to our project. Each time a whole YUV frame is captured, I convert it to RGB and then send it to the GPU through a PBO. Using gettimeofday, I can see that this process takes less than 40 ms, yet the measured glass-to-glass latency is 180 ms.

So, besides optimizing the frame processing time, are there any other system settings I could tune to reduce the display latency?
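
To put these figures in perspective, it can help to express the latency in frame periods. The sketch below is only illustrative arithmetic: it assumes PAL delivers 25 full frames per second (40 ms per frame), and the helper names `frame_period_ms` and `latency_in_frames` are hypothetical.

```python
def frame_period_ms(fps: float) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps


def latency_in_frames(latency_ms: float, fps: float) -> float:
    """Express a latency as a number of frame periods."""
    return latency_ms / frame_period_ms(fps)


PAL_FPS = 25.0  # assumption: 25 full frames per second

measured_ms = 180.0    # glass-to-glass latency reported in the post
processing_ms = 40.0   # upper bound on per-frame processing reported in the post

total_frames = latency_in_frames(measured_ms, PAL_FPS)
processing_frames = latency_in_frames(processing_ms, PAL_FPS)
unexplained_ms = measured_ms - processing_ms

print(f"total: {total_frames:.1f} frames, processing: {processing_frames:.1f} frames")
print(f"latency outside the measured processing step: {unexplained_ms:.0f} ms")
```

Under these assumptions at least 140 ms, about 3.5 frame periods, falls outside the timed step, which may point at buffering in capture or display rather than at the conversion itself.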

ShaneCCC · February 20, 2017, 5:41am

HooverLv,
First, try boosting the system performance: Jetson/Performance - eLinux.org
Second, break down your flow and check for any memory copies that hurt performance.
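
One possible way to break the flow down is to time each stage separately and see where the milliseconds go. This is a minimal sketch, not the poster's actual code: the stage names and the stand-in operations inside `process_frame` are hypothetical placeholders for the real capture, conversion, and upload calls, and the frame layout (720x576 at 2 bytes per pixel) is an assumption.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock time per stage, in milliseconds.
stage_ms = defaultdict(float)


@contextmanager
def timed(stage: str):
    """Add the elapsed time of the enclosed block to stage_ms[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_ms[stage] += (time.perf_counter() - start) * 1000.0


def process_frame(frame: bytes) -> bytes:
    # Hypothetical stages; replace the bodies with the real calls.
    with timed("copy"):
        staged = bytearray(frame)      # stands in for an extra memory copy
    with timed("convert"):
        converted = staged[::-1]       # stands in for YUV -> RGB conversion
    with timed("upload"):
        uploaded = bytes(converted)    # stands in for the PBO upload
    return uploaded


N_FRAMES = 10
frame = bytes(720 * 576 * 2)  # assumed PAL-sized frame, 2 bytes per pixel
for _ in range(N_FRAMES):
    process_frame(frame)

# Print the most expensive stage first.
for stage, total in sorted(stage_ms.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:8s} {total / N_FRAMES:.3f} ms per frame")
```

Any stage that turns out to be a plain copy is a candidate for removal, for example by writing the converted pixels directly into the mapped upload buffer.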

dgarba · March 15, 2018, 5:05pm

Hi @HooverLv

You may find the following link interesting:

This wiki page is intended as a reference for Tegra X1 (TX1) capture-to-display glass-to-glass latency using the simplest GStreamer pipeline. The tests were executed with the IMX274 camera sensor, for the 1080p and 4K 60fps modes. The tests were done using a modified nvcamerasrc binary provided by Nvidia that reduces the minimum allowed value of the queue-size property from 10 to 2 buffers. This binary was built for Jetpack 3.0 L4T 24.2.1. Similar tests will be run on TX2.

It also describes some reliable methods for measuring glass-to-glass latency.

I hope this information helps you!

Best regards
-Daniel

sperok · March 16, 2018, 10:10pm

NVIDIA - Might the nvcamerasrc binaries with reduced queue size minimum described by dgarba be available for 28.2?

clutch12 · March 19, 2018, 11:17pm

Would this also be compatible with the TX2?

DaneLLL · March 26, 2018, 1:56am

Hi sperok/clutch12,
We don’t have the reduced minimum queue size in r28.2. Do you see an improvement in latency when configuring queue-size=2?
r28.2-libgstnvcamera.so.txt (111 KB)

sperok · March 26, 2018, 2:43am

All of our TX-1 and TX-2 modules are currently on 28.1 and headed to 28.2. Back when we were running 24.1, this code was not available to us, so we have never tested it. We would be happy to do so if a patch could be made available for 28.1 or 28.2.

clutch12 · April 10, 2018, 8:05pm

@DaneLLL yeah, I see no difference modifying the queue-size below 10. I tried a range of values.

DaneLLL · April 11, 2018, 5:21am

Please try the prebuilt lib in comment #6

cloundliu · May 31, 2018, 3:03am

Hi dgarba,

I have used the glass-to-glass latency measurement method described at the following link:

https://developer.ridgerun.com/wiki/index.php?title=Jetson_glass_to_glass_latency

The result I get is approximately 85 ms for a 30 fps camera (TX1), and I have reduced the kernel driver delay from 2 frames to 1 frame. The GStreamer pipeline I used is as follows:

gst-launch-1.0 v4l2src device=/dev/video0 do-timestamp=true ! \
'video/x-raw, width=2560, height=800,format=UYVY,framerate=30/1' ! nvvidconv ! \
'video/x-raw(memory:NVMM), format=I420, framerate=30/1' ! \
nvoverlaysink sync=true enable-last-sample=false

After running this command for a while, it reports the following error and the video stops flowing.

Additional debug info:
gstbasesink.c(2854):gst_base_sink_is_too_late()
There may be a timestamping problem, or this computer is too slow.
WARNING: A lot of buffers are being dropped

thanks.

dgarba · June 21, 2018, 4:15pm

Hi cloundliu,

Sorry for the late answer; the post update notification was caught by the spam filter.

The problem you describe seems to be that the GStreamer pipeline is dropping buffers. This may happen because v4l2src does not have enough buffers queued to handle the pipeline's capture requests and latency. I recommend going back to the 2-frame delay in the kernel driver, because 1 frame seems not to be enough to keep the stream stable through the pipeline. Test again after reverting only that change.
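
To illustrate why a synchronized sink can start dropping buffers when capture stalls, here is a deliberately simplified model. It is not GStreamer's actual implementation: it only assumes that a buffer counts as late when it reaches the sink more than an allowed tolerance after it was due. The function `count_dropped`, its parameters, and all the numbers are hypothetical.

```python
def count_dropped(delays_ms, latency_budget_ms, max_lateness_ms):
    """Count frames that reach the sink too late to be shown.

    Simplified model: each frame is due latency_budget_ms after capture,
    and is dropped when it arrives more than max_lateness_ms after that.
    """
    dropped = 0
    for delay in delays_ms:
        lateness = delay - latency_budget_ms  # > 0 means past its due time
        if lateness > max_lateness_ms:
            dropped += 1
    return dropped


# Hypothetical capture-to-sink delays per frame, in milliseconds.
steady = [30, 31, 29, 30, 32, 30]
starved = [30, 31, 65, 70, 68, 30]  # stalls while no free capture buffer is queued

budget_ms = 33        # assumed: roughly one frame period at 30 fps
max_lateness_ms = 20  # assumed tolerance

print("steady: ", count_dropped(steady, budget_ms, max_lateness_ms))
print("starved:", count_dropped(starved, budget_ms, max_lateness_ms))
```

In this model the steady stream loses no frames, while the stalled one loses every frame delayed by about one extra frame period, which is consistent with the suggestion to give the driver more buffering headroom.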

Also, you can try another video sink that does not require NVMM-type buffers, so that the nvvidconv element can be removed from the pipeline, which should reduce the overall latency. For example, something like this:

gst-launch-1.0 v4l2src device=/dev/video0 do-timestamp=true ! \
    'video/x-raw, width=2560, height=800,format=UYVY,framerate=30/1' ! \
    xvimagesink sync=true enable-last-sample=false

I hope that helps you!

Best regards,
-Daniel