Minimizing video latency between camera and program memory

(This post is linked to this one: Minimizing video latency on Jetson Xavier NX, but the camera, the Jetson models and JetPack/L4T versions are different)


We are currently using a Jetson TX2 with JetPack SDK 4.6.1 and the onboard camera included in the official devkit, in a project that is extremely sensitive to latency. We would like to have the smallest possible latency between the moment the photons hit the camera and the moment OpenCV can start to work on the video frame in our application. Ideally, that latency would be the rolling shutter time plus a tiny overhead.

We’ve so far had our best latency using GStreamer with libargus, and setting the sensor to 1280x720 @120fps, but we are fine with switching to other means of fetching video frames.

Measuring the time between the photons hitting the camera and the data becoming available in software is hard to do with millisecond accuracy. We therefore measure the glass-to-glass latency, in a similar fashion to what is described in NVIDIA Jetson TX1 TX2 Glass to Glass latency | Jetson TX1 TX2 Capture | RidgeRun - RidgeRun Developer Connection

Here is how we take our measurements: we film a millisecond-accurate clock display with the TX2 camera. At the same time, we record both the ground-truth clock and the TX2 camera video feed with a 120 FPS phone camera. By counting the frames of difference between the two, we obtain the latency estimate. Unfortunately, this also includes the time needed to display the video frame on a monitor, while we are only interested in the time needed to get the frame into memory. (We do not know how long it realistically takes to display the frame on our 60Hz monitor once it is available in userland software.) We measure the latency with the following command:

gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, framerate=(fraction)120/1' ! nvvidconv ! xvimagesink
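
Since the goal is to hand frames to OpenCV rather than to display them, here is a minimal sketch of how the same pipeline could end in an appsink instead of xvimagesink. This is not something we have measured: the helper name build_appsink_pipeline is hypothetical, the BGRx/BGR conversion stages and the appsink settings (drop=true, max-buffers=1, meant to keep only the newest frame) are assumptions, and the resulting latency on the TX2 is unknown.

```python
def build_appsink_pipeline(width=1280, height=720, fps=120):
    """Build a GStreamer pipeline string that ends in appsink.

    Hypothetical helper: same source and caps as the gst-launch command
    above, but frames are converted to BGR and handed to the application
    instead of being rendered by xvimagesink.
    """
    return (
        "nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM), width=(int){width}, height=(int){height}, "
        f"framerate=(fraction){fps}/1 ! "
        # nvvidconv copies the frame out of NVMM memory into a CPU buffer
        "nvvidconv ! video/x-raw, format=(string)BGRx ! "
        # videoconvert produces the 3-channel BGR layout OpenCV expects
        "videoconvert ! video/x-raw, format=(string)BGR ! "
        # assumption: drop stale frames so only the newest one is queued
        "appsink drop=true max-buffers=1"
    )


if __name__ == "__main__":
    print(build_appsink_pipeline())
```

With an OpenCV build that has GStreamer support, the string would be passed as cv2.VideoCapture(build_appsink_pipeline(), cv2.CAP_GSTREAMER). This path still goes through GStreamer and libargus, so it only removes the display stage from the measurement.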

We measure on average 133ms (16 frames on the 120 FPS camera phone). Since the TX2 sensor is set to 120 FPS, the rolling shutter time should only be 8.3ms. This leaves 125ms for the frame to transfer to userland memory, and then to transfer to the monitor. Even with a GPU+monitor latency of 20ms, it still leaves 105ms for the transfer between the camera and the software memory. This is the time we would like to shorten as much as possible.
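
The latency budget above can be reproduced with a few lines of arithmetic. This is only a sketch of our reasoning: the function and variable names are ours, and the 20ms GPU+monitor figure is the guess stated above, not a measurement.

```python
def frames_to_ms(frames, fps):
    """Convert a number of frames at a given frame rate to milliseconds."""
    return frames * 1000.0 / fps


PHONE_FPS = 120    # frame rate of the phone camera used as ground truth
SENSOR_FPS = 120   # frame rate configured on the TX2 sensor

measured_ms = frames_to_ms(16, PHONE_FPS)   # glass-to-glass: 16 phone frames
readout_ms = frames_to_ms(1, SENSOR_FPS)    # one sensor frame period
display_ms = 20.0                           # assumed GPU + monitor latency

after_readout_ms = measured_ms - readout_ms
capture_to_memory_ms = after_readout_ms - display_ms

# One phone frame is the smallest difference this method can resolve.
resolution_ms = frames_to_ms(1, PHONE_FPS)

if __name__ == "__main__":
    print(f"glass-to-glass:       {measured_ms:.1f} ms")
    print(f"after sensor readout: {after_readout_ms:.1f} ms")
    print(f"camera to memory:     {capture_to_memory_ms:.1f} ms")
    print(f"method resolution:    {resolution_ms:.1f} ms")
```

Note that with a 120 FPS phone camera, each measurement is only accurate to about one phone frame (8.3ms), which is why we average over several readings.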

We have read a couple of older threads that have been dealing with similar issues, but these threads are old and do not reflect the latest developments of L4T:

  • csi-latency-is-over-80-milliseconds
  • one-frame-latency-delay-in-tx1-v4l-stack

We have not applied any of the modified libraries suggested in the posts above, since we are using r32.7.1 and we are pessimistic about the ABI compatibility of these modifications.

Still, with all the above, we feel like we’re at the bottom of the rabbit hole, yet the latency is way too high for our application. Any help would be really appreciated.

For low latency, we would suggest using jetson_multimedia_api. Its low-level APIs do not go through the GStreamer framework, which eliminates some latency. You may try the samples:


And please enable the VIC engine (hardware converter) at maximum clock:
Nvvideoconvert issue, nvvideoconvert in DS4 is better than Ds5? - #3 by DaneLLL


Thank you very much for your quick answer.
The latency has been reduced from 133ms to 50ms by using sample 09.

How difficult would it be to reduce the latency even further?

This should be the minimum latency, since there is frame buffering in the Argus stack.
