TX2 h265 encoding performance with 6 720p@30fps cameras

Hi,

I’m trying to run several GStreamer encoding pipelines with nvv4l2h265enc on a TX2 mounted on a custom board with 6 GMSL cameras. Each sensor outputs 1280x720@30fps in UYVY. I’ve run into some performance issues and used RidgeRun’s excellent gst-shark tool to profile them.

The pipeline looks like this per camera / stream:
gst-launch-1.0 nvv4l2camerasrc device=/dev/videoX ! "video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)UYVY, framerate=(fraction)30/1" ! queue max-size-buffers=1 leaky=2 ! nvvidconv ! queue max-size-buffers=1 leaky=2 ! nvv4l2h265enc maxperf-enable=1 insert-sps-pps=1 bitrate=1000000 ! queue max-size-buffers=1 leaky=2 ! udpsink host=Y port=Z -e

All graphs are profiling a single pipeline from the multi-pipeline setting (1 gst-launch process per pipeline).
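
For anyone who wants to reproduce the "one gst-launch process per pipeline" setup, a wrapper along these lines could start all of them. This is only a sketch and not the exact script I use: the /dev/video0 to /dev/video5 numbering, the HOST address and the BASE_PORT value are placeholder assumptions, and with DRY_RUN=1 (the default here) it only prints the command lines instead of running them.

```shell
#!/bin/sh
# Sketch: one gst-launch-1.0 process per camera.
# NUM_CAMS, HOST, BASE_PORT and the /dev/videoN numbering are assumptions.
NUM_CAMS=${NUM_CAMS:-6}
HOST=${HOST:-192.168.1.100}
BASE_PORT=${BASE_PORT:-5000}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, anything else = run them

CAPS='video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)UYVY, framerate=(fraction)30/1'
QUEUE='queue max-size-buffers=1 leaky=2'

# Print the gst-launch command line for camera index $1.
pipeline_cmd() {
    printf '%s\n' "gst-launch-1.0 -e nvv4l2camerasrc device=/dev/video$1 ! \"$CAPS\" ! $QUEUE ! nvvidconv ! $QUEUE ! nvv4l2h265enc maxperf-enable=1 insert-sps-pps=1 bitrate=1000000 ! $QUEUE ! udpsink host=$HOST port=$((BASE_PORT + $1))"
}

# Print (dry run) or start in the background one pipeline per camera.
launch_all() {
    cam=0
    while [ "$cam" -lt "$NUM_CAMS" ]; do
        if [ "$DRY_RUN" = "1" ]; then
            pipeline_cmd "$cam"
        else
            # eval re-parses the quoted caps string the way an interactive shell would
            eval "$(pipeline_cmd "$cam")" &
        fi
        cam=$((cam + 1))
    done
    # When actually running, block until every pipeline has exited.
    [ "$DRY_RUN" = "1" ] || wait
}

launch_all
```

Running it with DRY_RUN=0 starts the pipelines in the background and waits for them; each stream goes to its own UDP port (BASE_PORT plus the camera index).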

When using 4 cameras we get good performance and latency as shown in the attached images with 4cams in their name.


When using 6 cameras, the processing time and latency of the nvv4l2h265enc element are much higher and far less uniform. This is a problem for us, as we’re trying to keep the encoding latency as low as possible.


I ran jetson_clocks.sh, set the power mode to 0 etc.

Is there anything I’m doing wrong here? Do you have any tips that might point me in the right direction?

Any help would be appreciated!!

Hi,
If your system is > JP4.4, you can try nvv4l2camerasrc with a command like:

gst-launch-1.0 nvv4l2camerasrc ! 'video/x-raw(memory:NVMM), format=UYVY' ! nvvidconv ! nvv4l2h265enc ! udpsink

This eliminates the memory copy incurred when using the v4l2src plugin:

v4l2src ! video/x-raw, format=UYVY ! nvvidconv ! video/x-raw(memory:NVMM) ! ...

It should bring a performance improvement. Please give it a try.

@DaneLLL thanks for your response!
I edited the post: I am already using nvv4l2camerasrc as the source plugin. I also added the caps I use (as you can see from the graphs, there’s a capsfilter0 element). Any other pointers would be very helpful.

Hey @alexk2, the encoder can be configured for low-latency operation. If your application allows it, it may be worth a shot. NVIDIA has a nice guide here:

https://docs.nvidia.com/video-technologies/video-codec-sdk/nvenc-video-encoder-api-prog-guide/#recommended-nvenc-settings

Look for "Low-latency use cases like game-streaming, video conferencing etc." in the table. Extrapolating these settings to nvv4l2h265enc, I think it would be something like:

BITRATE=4000000
FRAMERATE=30

nvv4l2h265enc control-rate=constant_bitrate bitrate=$BITRATE \
peak-bitrate=$((2*$BITRATE)) maxperf-enable=true \
iframeinterval=$((10*$FRAMERATE)) \
vbv-size=$((BITRATE/FRAMERATE)) insert-sps-pps=true \
profile=Main num-B-Frames=0 ratecontrol-enable=true \
preset-level=UltraFastPreset EnableTwopassCBR=false

This will favor speed and latency over image quality, but hopefully it takes some load off the encoder.
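
To make the shell arithmetic in the snippet above concrete, here is a small sketch that prints what the derived properties evaluate to for BITRATE=4000000 and FRAMERATE=30. The helper name derived_settings is made up for illustration; the formulas are the ones from the snippet. Note that $(( )) is integer arithmetic, so the vbv-size result is truncated.

```shell
#!/bin/sh
BITRATE=4000000   # target bitrate in bits per second
FRAMERATE=30      # frames per second

# Hypothetical helper: print the values the $(( )) expressions expand to.
derived_settings() {
    # Peak bitrate is capped at twice the target bitrate.
    printf 'peak-bitrate=%d\n' "$((2 * BITRATE))"
    # One I-frame every 10 seconds' worth of frames.
    printf 'iframeinterval=%d\n' "$((10 * FRAMERATE))"
    # Roughly one frame's share of the target bitrate (integer division).
    printf 'vbv-size=%d\n' "$((BITRATE / FRAMERATE))"
}

derived_settings
# peak-bitrate=8000000
# iframeinterval=300
# vbv-size=133333
```

The small vbv-size is what the "single frame" buffer recommendation for low-latency use cases in the NVIDIA table translates to here; whether the value suits your streams is something to verify with gst-shark.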


P.S.: I’m glad you’re finding gst-shark useful!