Framerate and latency issues/questions on TX2

Hi,

I’d like to share some work I’ve been doing with gstreamer on a TX2 and hopefully get some valuable input.
The system consists of a server and a client.
On the server side I have a camera connected via USB3 to the TX2 board, and the following process is applied:

  1. Frames are grabbed at a rate of 30 fps. The frames are Bayer-patterned, with a size of 4096x3008 (Mono8).
  2. Each frame goes through a de-bayer kernel which outputs an RGBA frame (takes ~5ms).
  3. Frames are HW encoded (H264) and transmitted over RTSP.

Note: The memory buffers are allocated in managed memory, so no CPU<->GPU copies are performed.
The server is connected directly to the client with an Ethernet cable.

The following is the pipeline I’m using on the server side:

appsrc name=videosrc is-live=true do-timestamp=true ! video/x-raw, format=(string)YUY2 ! \
queue max-size-buffers=1 ! nvvidconv ! video/x-raw(memory:NVMM), format=(string)I420 ! \
omxh264enc preset-level=0 bitrate=10000000 control-rate=constant ! \
video/x-h264, stream-format=(string)byte-stream ! queue max-size-buffers=1 ! \
rtph264pay name=pay0 pt=96
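To check whether the encoder itself can sustain 30 fps at this resolution, it may be worth isolating it from the camera and the de-bayer kernel with a synthetic source. A rough sketch (videotestsrc replaces appsrc, fpsdisplaysink with a fakesink just reports the achieved rate, and the encoder settings are copied from the pipeline above — this is a diagnostic idea, not something from my setup):

```shell
gst-launch-1.0 videotestsrc is-live=true ! \
  'video/x-raw, format=(string)YUY2, width=4096, height=3008, framerate=30/1' ! \
  nvvidconv ! 'video/x-raw(memory:NVMM), format=(string)I420' ! \
  omxh264enc preset-level=0 bitrate=10000000 control-rate=constant ! \
  fpsdisplaysink video-sink=fakesink text-overlay=false sync=false -v
```

If this reports well below 30 fps, the bottleneck is in nvvidconv/omxh264enc rather than in the capture or network path.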

On the client side I used two different configurations:

  1. A powerful PC (SW decoder)
  2. A TX1 board (HW decoder)

The following is the pipeline I’m using on the TX1 client:

gst-launch-1.0 rtspsrc location="rtsp://192.168.1.2:8554/video" latency=0 ! rtpjitterbuffer ! \
rtph264depay ! h264parse ! omxh264dec ! nvvidconv ! \
'video/x-raw, width=1024, height=752, format=(string)YUY2' ! xvimagesink sync=false

When I tested this system I noticed a big drop in frame rate: I observed ~15 fps, which is half of what I capture. Based on the experience of other forum members, I tried multiple pipeline configurations, both on the server and the client, but didn’t see any significant change.
I measured latency (the simple way: filming a stopwatch and capturing it with my phone’s camera), and I got around 220ms for a 4096x3008 RGBA frame.
I also changed my camera’s configuration to capture other resolutions, down to full HD, and saw that when I decrease the frame size I get better latency and a higher frame rate; however, the frame rate was still half of what I captured.
Eventually, I modified my kernel to output YUY2 frames instead of RGBA: effectively 16 bits per pixel rather than 32. This modification released the frame-rate bottleneck, and I started seeing the expected 30 fps reported on the client side.
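The improvement is consistent with simple bandwidth arithmetic (my own back-of-the-envelope numbers, not measurements): halving the bytes per pixel halves the data every element has to touch per frame.

```shell
# Per-frame size and sustained throughput for a 4096x3008 stream at 30 fps,
# comparing RGBA (4 bytes/pixel) with YUY2 (2 bytes/pixel).
WIDTH=4096; HEIGHT=3008; FPS=30

RGBA_FRAME=$(( WIDTH * HEIGHT * 4 ))
YUY2_FRAME=$(( WIDTH * HEIGHT * 2 ))

echo "RGBA: ${RGBA_FRAME} bytes/frame, $(( RGBA_FRAME * FPS / 1048576 )) MiB/s"
echo "YUY2: ${YUY2_FRAME} bytes/frame, $(( YUY2_FRAME * FPS / 1048576 )) MiB/s"
```

That is roughly 1410 MiB/s sustained for RGBA versus 705 MiB/s for YUY2 at every stage that touches the full frame, which would explain why the heavier format could not keep up at 30 fps.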

Sounds like a good ending but it wasn’t :).

After a short while of streaming frames, I got an error (seemingly per frame) on the server side, and as soon as it happened I saw corruption in the video output on the client side. I’m not sure what the source of this error is. I attached an image as reference; the following is the error I get:

NVMAP_IOC_WRITE failed: Interrupted system call

The funny thing is that when I filtered out frames on the server side (transmitted every second frame), effectively cutting the fps in half, this error did not reproduce. It seems like some element in my pipeline can’t really keep up with the frame rate.
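As a side note, the every-second-frame experiment can also be done inside the pipeline itself rather than in the application, using videorate in drop-only mode (a sketch of the relevant fragment only; it would sit between the appsrc caps and the first queue):

```shell
... ! videorate drop-only=true ! video/x-raw, framerate=15/1 ! queue max-size-buffers=1 ! ...
```

drop-only=true makes videorate discard buffers instead of duplicating them, so it can only lower the rate, never pad it.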

Also, my requirement is to achieve 150ms latency for a 4096x3008 video stream. I’ve been working on that for a while and I don’t see how I can reach this target. I’m not even sure the requirement is feasible at all. Any thoughts on that? Has anyone achieved such latency for this video size?

Honestly, I can’t say I have a good explanation for what I see, and I’d appreciate it if someone could shed some light.

Thanks.

Screenshot.png

hello eliyvzy3,

I measured latency (the simple way: filming a stopwatch and capturing it with my phone’s camera), and I got around 220ms for a 4096x3008 RGBA frame.
this is the capture-to-display latency. is your target to reduce this to 150ms?

i would suggest you break the latency down into stages to help us narrow down the issue.
for example: the sensor driver’s capture latency, ethernet transfer latency, and display latency.
thanks

Hi Jerry,

Indeed, my requirement is to achieve 150ms latency from capture to display. However, please note that it should be achieved at the proper frame rate, which I currently have issues with.

I previously used the GST_DEBUG="GST_TRACER:7" and GST_TRACERS="latency" flags to capture latency; however, that gives an end-to-end latency rather than the per-element latency you require. I’m not familiar with a tool that gives the breakdown you’re asking for, so I’d appreciate your guidance here.
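The closest thing I found is the flags parameter that newer GStreamer releases (1.18 and later) add to the latency tracer, which reportedly produces per-element latency records in addition to the pipeline total, along these lines:

```shell
GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=pipeline+element)" \
  gst-launch-1.0 ... 2> latency_trace.log
```

However, I’m not sure this applies to the GStreamer version shipped with L4T, so please correct me if there is a better option on the Jetson.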

Thanks.

hello eliyvzy3,

let’s exclude the transfer latency for now.
what’s your capture-to-display latency on the server side?

Hi,

I’ve measured latency on the server side, though at a lower resolution (1280x1024). The latency I got is ~85ms, and it excludes the encoding part.
Initially, I used OpenCV to display the frames as they arrive, but the performance was awful, so I built a small gstreamer pipeline (appsrc -> videoscale -> xvimagesink). Using this pipeline I noticed that when I capture at full resolution it just terminates with no display. I attached the debug log for your reference.
I scaled the resolution down so I’d be able to view the output, and as I mentioned, for the lower resolution it is 85ms.
I tried to find a way to include the encoding part. I wanted to use the v4l2sink element, as it seems to accept an h264 stream, but I haven’t been able to integrate it successfully yet.

Thanks.
out.log (5.54 MB)

hello eliyvzy3,

thanks for the breakdown result, your 85ms result from the server side is expected.

however, there’s a 135ms (220 - 85) gap.
please continue breaking down the evaluation results,
thanks