One frame latency/delay in TX1 V4L stack

We have noticed a problematically large latency in the TX1 V4L stack, and are wondering if you might be able to shed some light on where it may be coming from.

We are seeing ~66ms of latency in the camera stream, and believe that it is related to something in the TX1 driver stack because we have tested 3 different configurations with the same cameras:

  1. MIPI data directly into the TX1 results in ~66ms of latency
  2. MIPI data into the TX1 via USB (using a MIPI to USB converter) results in ~66ms of latency
  3. MIPI data into Intel CPU via USB (using a MIPI to USB converter) results in ~8ms of latency

The hardware is the same for all of them, except for the MIPI to USB converter which is the same for 2 and 3. We used the same user space driver code for all 3 experiments.

The images are streaming in at 15FPS, so the frame period is suspiciously close to the latency number.
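For reference, the arithmetic behind that observation can be sketched as follows. This is only a back-of-the-envelope check; the function name is illustrative and the 66 ms figure is the approximate latency quoted above.

```python
def frame_period_ms(fps: float) -> float:
    """Time between consecutive frame starts, in milliseconds."""
    return 1000.0 / fps

period = frame_period_ms(15)      # about 66.7 ms at 15 FPS
observed_latency_ms = 66.0        # approximate latency seen on the TX1

print(f"frame period: {period:.1f} ms")
print(f"latency in frame periods: {observed_latency_ms / period:.2f}")
```

A latency very close to one whole frame period is what one would expect if a frame were held back until the next one starts to arrive.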

Is there some buffering happening that would make it so that images are not released to user space until the next image arrives?

Hello eba, have you checked this thread regarding a similar inquiry: https://devtalk.nvidia.com/default/topic/932687/jetson-tx1/jetson-tx1-gstreamer-latency-and-frame-rate/?

I was already running with max-perf

eba,
"The images are streaming in at 15FPS"
=> This means a frame comes in to the system every ~66ms, but for the Intel CPU case you have ~8ms of latency. How did you measure it? From user space, as frames come in at the 66ms interval?

Both (1) and (2) should go through v4l2 kernel driver. Checking it too …

The soc_camera driver uses 2 kernel threads: one waits for frame start and the other waits for the image data to be written to memory (memory done). This is to handle higher frame rates such as 30fps. For your lower 15fps, you could modify the driver to use only one thread that waits only for memory done. This will reduce latency. We are working on improving this in a later version of the media controller driver.

Hello, eba:
Would you please provide the detailed steps to reproduce the latency you mentioned?

1) MIPI data directly into the TX1 results in ~66ms of latency

We can check it internally.

br
ChenJian

Hi there,
Thanks for the input. Can you provide a pointer to where in the v4l driver code (file, line) those threads and callbacks live?

To measure the latency, we move the camera and compare the visible motion with an IMU. The images are timestamped by the struct v4l2_buffer.timestamp field. The IMU data is timestamped when the user space UART driver receives it. The latency is measured as the difference in phase between these two signals.
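A minimal sketch of this kind of phase comparison, using synthetic data rather than our real logs. It assumes both signals have already been resampled onto a common 1 kHz grid; the helper name `best_lag`, the pulse shape, and the 66-sample delay are illustrative choices, not our actual processing code.

```python
import math

def best_lag(reference, delayed, max_lag):
    """Return the shift in samples that best aligns `delayed` with
    `reference`, found by maximizing the raw cross-correlation sum."""
    def score(lag):
        return sum(reference[i] * delayed[i + lag]
                   for i in range(len(reference))
                   if 0 <= i + lag < len(delayed))
    return max(range(-max_lag, max_lag + 1), key=score)

sample_rate_hz = 1000   # assumed common sampling grid
true_delay = 66         # synthetic delay in samples (66 ms at 1 kHz)

# One smooth motion pulse as seen by the IMU, and the same pulse as it
# would appear in the image stream, shifted later by `true_delay`.
imu = [math.exp(-((n - 500) / 40.0) ** 2) for n in range(1500)]
camera = [0.0] * true_delay + imu[:-true_delay]

lag = best_lag(imu, camera, max_lag=150)
latency_ms = 1000.0 * lag / sample_rate_hz
print(f"estimated latency: {latency_ms:.1f} ms")
```

A positive lag means the camera signal trails the IMU signal. With real data the resolution is limited by the frame rate, so the image-motion signal has to be interpolated before the comparison.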

Hello, eba:
Can you refer to the “Video for Linux User Guide” chapter in the release document, which can be downloaded @ http://developer.nvidia.com/embedded/dlc/l4t-documentation-23-2

Let me know if there’s any further problem.

br
Chenjian

Chenjian,
Those docs only provide a high level overview of the driver structure. Would be really helpful if you could provide a more direct pointer to where the threads that you mentioned are created/executed as the codebase is pretty big.

Also, in those docs, it says: “Software releases after R23: soc_camera driver is deprecated and replaced with the media-controller driver.”

Does this apply to R23.2? Can you share any info on when R24 will be released?
Thanks!

5/20 is the target date to release JetPack 2.2 which includes r24.1 L4T BSP. The media controller driver source code for OV5693 is included as a reference camera. soc_camera driver is still included in r24.1 but eventually will be deprecated as stated.

r24.1 BSP can be downloaded from,

I noticed a missing sample rootfs. While downloading docs it turned out that the “Tegra Software License Agreement” is not really the license…this is the sample rootfs, so sample rootfs is mislabeled and license is missing.

The mismatched rootfs sample was the case for the L4T 24.1 Beta, but that issue is fixed on the BSP release page: https://developer.nvidia.com/embedded/linux-tegra

hi eba,

theoretically, the camera frame latency is >= 1/fps second.

according to comment #7,
a. please lower the camera fps to verify that both (2) and (3) follow this rule.
b. may i have more details about how you measure frame latency? did you keep the device steady for a while, then move it, and compare the imu peak timestamp with the timestamp of the first scene change?

eba,

As of R24.2, the buffer release mechanism has been updated to avoid multiple race conditions we had with the two-thread design. Frames are now released based on the frame start event; for example, the Nth frame is released on the (N+2)th frame start event.
To improve the latency, you can use two threads to track the events independently: the first thread tracks frame start, the second tracks memory write events. thanks
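To put numbers on that release rule, here is a simplified timing model. It assumes frames start exactly one frame period apart and ignores any other processing time; the function name and the `release_offset_frames` parameter are illustrative and do not come from the driver.

```python
def release_latency_ms(fps: float, release_offset_frames: int = 2) -> float:
    """Delay from the start of frame N until it is released, when the
    release is triggered by the start of frame N + release_offset_frames."""
    period_ms = 1000.0 / fps
    return release_offset_frames * period_ms

for fps in (15, 30):
    print(f"{fps} fps: {release_latency_ms(fps):.1f} ms")
```

Under this model, releasing on the (N+2)th frame start costs two frame periods measured from the start of frame N, and releasing one frame earlier would halve that.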

Hi JerryChang,

We have experienced the same problem, and our test results coincide with your description. The v4l2_buffer timestamp is two frames earlier than the time we get the buffer in userspace, which is about 66ms. That is to say, the transfer time of an image from the v4l2 buffer to userspace is about 66ms, but in our test the time from the sensor to the v4l2 buffer is more than 66ms.
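The comparison we make can be sketched like this. It assumes the driver stamps buffers from the monotonic clock, so that the timestamp can be compared with a monotonic reading taken right after the buffer is dequeued; the helper names and the sample values are hypothetical.

```python
def buffer_age_us(tv_sec: int, tv_usec: int, now_ns: int) -> int:
    """Microseconds between a v4l2_buffer-style timestamp (seconds plus
    microseconds) and a monotonic clock reading given in nanoseconds."""
    stamp_us = tv_sec * 1_000_000 + tv_usec
    return now_ns // 1_000 - stamp_us

def age_in_frames(age_us: int, fps: float) -> float:
    """Express a delay in units of frame periods."""
    return age_us * fps / 1_000_000

# Hypothetical readings: a buffer stamped at 1000.250000 s and dequeued
# in userspace at 1000.316700 s on the same clock.
age = buffer_age_us(1000, 250_000, 1_000_316_700_000)
print(age, "us =", round(age_in_frames(age, 30), 2), "frames at 30 fps")
```

In a real capture loop the two inputs would be the timestamp of the dequeued buffer and a clock reading such as `time.monotonic_ns()` taken immediately after the dequeue returns.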

As you say, we can use two threads to track the events independently to improve the latency. Could you explain this in a little more detail?

The BSP version is R24.2 and our fps is 30.

thanks

hello cloundliu,

sharing the latency improvement patch for your reference.
this patch may have stability issues for your use-case, so please verify it.
thanks
0001-drivers-media-camera-Improve-VI-driver-latency.7z (4.99 KB)

Hi JerryChang,

Thanks for your reply.

I have read through the code for the two-thread design you mentioned. The first thread is for the VI channel and the second is for the MC channel, is that right? The data flow is as follows:

sensor-->CSI-->VI(bypass)-->ISP(bypass)-->MC-->DDR

There is a two-frame delay on MC; when we apply the patch you gave, it decreases to a one-frame delay.
Is there any delay in the VI processing? Also, could you give me more information about the driver architecture?

When I run a capture process, I can't find a running VI thread with the ps command.

hello cloundliu,

the one frame latency is expected,
since there's a sensor limitation: a complete frame buffer has to wait for the capture request from user-space until the sensor's end-of-frame.

there's another issue you should look into too: please analyze the fps of your video files to see if there's a frame drop issue.
thanks

Hi JerryChang,

I know the one frame latency is expected, but I found that the total delay is 100ms, made up of three parts:

  1. sensor-->VI
  2. VI-->MC, which is the one frame delay you mentioned
  3. MC-->display

I have tested the delay from the sensor directly to MC, and it is about 90ms. What causes it? I also analysed the fps of my video files; it is correct and there is no frame drop issue. My sensor driver is based on the ov5693 driver located in driver/media/i2c/.

thanks