Nvvidconv is slow

I’m running a GStreamer pipeline on a Jetson AGX Xavier devkit.
The Jetson is connected to a camera that I read through v4l2src. My problem is that most of the latency in the pipeline comes from nvvidconv, which is unexpected. The pipeline is the following:

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime"  gst-launch-1.0 rtpbin name=rtpbin v4l2src num-buffers=200 name=src_video0   device=/dev/video0 ! video/x-raw,format=YUY2,width=1920,height=1080,framerate=60/1 ! nvvidconv ! video/x-raw\(memory:NVMM\), format=\(string\)I420 ! nvv4l2h265enc bitrate=4000000 insert-sps-pps=1 maxperf-enable=true name=video_enc0 ! video/x-h265, stream-format=\(string\)byte-stream ! rtph265pay name=rtp_payloader0 ! application/x-rtp,media=\(string\)video,clock-rate=\(int\)90000,payload=\(int\)96,ssrc=\(uint\)11234,encoding-name=\(string\)H265 ! rtpbin.send_rtp_sink_0 rtpbin.send_rtp_src_0 ! udpsink   port=5000  host=127.0.0.1 rtpbin.send_rtcp_src_0 ! udpsink   port=5001  host=127.0.0.1  sync=false async=false udpsrc port=5003 ! rtpbin.recv_rtcp_sink_0

I’m measuring the processing time of each element with GstShark and get the following plot:


As you can see, the encoder takes around 10 ms and nvvidconv takes 40 ms, which is unexpected.
I tried a couple of things, but they didn’t help:

  1. "GStreamer pipeline is slow": set nvpmodel -m 2, run jetson_clocks, and set io-mode=2 for v4l2src.
  2. "Nvcamerasrc, tee, and nvvidconv slow" (#5, by DaneLLL): lock the NVENC clock and set its rate to max_rate.

sudo tegrastats shows the following:

RAM 2346/31929MB (lfb 4542x4MB) SWAP 0/15965MB (cached 0MB) CPU [21%@1156,85%@1156,12%@1156,9%@1156,off,off,off,off] EMC_FREQ 2%@1331 GR3D_FREQ 0%@318 NVENC 1075 NVENC1 1075 VIC_FREQ 1%@115 APE 150 MTS fg 0% bg 4% AO@25.5C GPU@25C Tdiode@27.75C PMIC@100C AUX@26C CPU@26C thermal@25.7C Tboard@26C GPU 467/415 CPU 622/622 SOC 3425/3425 CV 0/0 VDDRQ 467/467 SYS5V 1916/1916
RAM 2346/31929MB (lfb 4542x4MB) SWAP 0/15965MB (cached 0MB) CPU [17%@1155,71%@1155,20%@1154,9%@1155,off,off,off,off] EMC_FREQ 2%@1331 GR3D_FREQ 0%@318 NVENC 1075 NVENC1 1075 VIC_FREQ 88%@115 APE 150 MTS fg 0% bg 4% AO@25.5C GPU@25C Tdiode@28C PMIC@100C AUX@26C CPU@26C thermal@25.7C Tboard@26C GPU 311/389 CPU 622/622 SOC 3425/3425 CV 0/0 VDDRQ 467/467 SYS5V 1916/1916
RAM 2346/31929MB (lfb 4542x4MB) SWAP 0/15965MB (cached 0MB) CPU [12%@1156,72%@1156,19%@1156,10%@1157,off,off,off,off] EMC_FREQ 2%@1331 GR3D_FREQ 0%@318 NVENC 1075 NVENC1 1075 VIC_FREQ 0%@115 APE 150 MTS fg 0% bg 3% AO@25.5C GPU@25C Tdiode@27.75C PMIC@100C AUX@26C CPU@26C thermal@25.55C Tboard@26C GPU 311/373 CPU 622/622 SOC 3425/3425 CV 0/0 VDDRQ 467/467 SYS5V 1916/1916
RAM 2346/31929MB (lfb 4542x4MB) SWAP 0/15965MB (cached 0MB) CPU [11%@1154,81%@1154,13%@1154,4%@1153,off,off,off,off] EMC_FREQ 2%@1331 GR3D_FREQ 0%@318 NVENC 1075 NVENC1 1075 VIC_FREQ 0%@115 APE 150 MTS fg 0% bg 4% AO@26C GPU@25C Tdiode@27.75C PMIC@100C AUX@26C CPU@25.5C thermal@25.7C Tboard@26C GPU 467/389 CPU 622/622 SOC 3425/3425 CV 0/0 VDDRQ 467/467 SYS5V 1916/1916
RAM 2323/31929MB (lfb 4542x4MB) SWAP 0/15965MB (cached 0MB) CPU [8%@1156,14%@1155,3%@1155,4%@1155,off,off,off,off] EMC_FREQ 1%@1331 GR3D_FREQ 0%@318 VIC_FREQ 0%@115 APE 150 MTS fg 0% bg 3% AO@25.5C GPU@25C Tdiode@28C PMIC@100C AUX@26C CPU@25.5C thermal@25.55C Tboard@26C GPU 467/400 CPU 467/599 SOC 3115/3380 CV 0/0 VDDRQ 311/444 SYS5V 1837/1904
RAM 2323/31929MB (lfb 4542x4MB) SWAP 0/15965MB (cached 0MB) CPU [1%@1156,4%@1156,0%@1156,0%@1155,off,off,off,off] EMC_FREQ 0%@1331 GR3D_FREQ 0%@318 VIC_FREQ 0%@115 APE 150 MTS fg 0% bg 1% AO@25.5C GPU@25C Tdiode@27.75C PMIC@100C AUX@26C CPU@25.5C thermal@25.55C Tboard@26C GPU 467/408 CPU 467/583 SOC 2959/3328 CV 0/0 VDDRQ 311/428 SYS5V 1840/1896

We can see that the NVENC clock is set to ~1 GHz.
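
For reading dumps like the one above, the clock fields can be pulled out of each line with a small filter. This is only a minimal sketch: `extract_clocks` is a hypothetical helper name, and it assumes the field layout seen in the lines above (`NVENC <MHz>` and `VIC_FREQ <load>%@<MHz>`).

```shell
#!/usr/bin/env bash
# Read tegrastats lines on stdin and print the NVENC clock and the VIC field.
# Assumes fields are space-separated, as in the dump above.
extract_clocks() {
  awk '{
    nvenc = "n/a"; vic = "n/a"            # NVENC is absent when the encoder is idle
    for (i = 1; i < NF; i++) {
      if ($i == "NVENC")    nvenc = $(i + 1)   # clock in MHz
      if ($i == "VIC_FREQ") vic   = $(i + 1)   # load%@MHz
    }
    print "nvenc_mhz=" nvenc, "vic=" vic
  }'
}
```

Used as `sudo tegrastats | extract_clocks`. For the second line of the dump above it prints `nvenc_mhz=1075 vic=88%@115`, which puts the VIC load and clock next to the NVENC clock for each sample.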

Please advise what else might cause nvvidconv to be slow. Thank you!

Update: Even the following simpler pipeline shows ~40 ms of latency in nvvidconv:

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime"  gst-launch-1.0 v4l2src num-buffers=200 name=src_video0   device=/dev/video0 ! video/x-raw,format=YUY2,width=1920,height=1080,framerate=60/1 ! nvvidconv ! video/x-raw\(memory:NVMM\),format=YUY2 ! fakesink

Hi,
Please follow the steps in this post to run the hardware converter at maximum clock, then check again:
"Nvvideoconvert issue, nvvideoconvert in DS4 is better than Ds5?" (#3, by DaneLLL)
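
The linked post is not reproduced in this thread, so the following is not a copy of its steps. It is a generic sketch of pinning a device to its highest frequency through the standard Linux devfreq sysfs files (`governor`, `available_frequencies`, `max_freq`, `userspace/set_freq`). `pin_devfreq_max` is a hypothetical helper name, and it assumes the hardware converter is exposed as a devfreq device whose directory you have located on your own system.

```shell
#!/usr/bin/env bash
# Pin a devfreq-managed device to its highest advertised frequency.
# $1: the device's devfreq directory (assumption: you looked it up under /sys).
pin_devfreq_max() {
  local dir=$1 max
  # available_frequencies is a space-separated list in Hz; pick the largest.
  max=$(tr ' ' '\n' < "$dir/available_frequencies" | sort -n | tail -n 1)
  [ -n "$max" ] || return 1
  echo userspace > "$dir/governor"        # stop automatic scaling
  echo "$max" > "$dir/max_freq"           # raise the ceiling
  echo "$max" > "$dir/userspace/set_freq" # request the top frequency
  echo "$max"                             # report what was requested
}
```

On a real board this needs root, for example by calling `pin_devfreq_max <devfreq-dir>` from a root shell. Reading `cur_freq` in the same directory afterwards should confirm whether the request took effect.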

Thank you!
I tried that, and now it runs at around 40 fps.
If I run the following command, I get 60 fps:

gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-raw,format=UYVY,width=1920,height=1080 ! fpsdisplaysink video-sink=fakesink text-overlay=false -v

If I run the following command, I get 40 fps:

gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-raw,format=UYVY,width=1920,height=1080 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! fpsdisplaysink video-sink=fakesink text-overlay=false -v

This also yields 40 fps (keeping the color format UYVY instead of converting to NV12):

gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-raw,format=UYVY,width=1920,height=1080 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=UYVY' ! fpsdisplaysink video-sink=fakesink text-overlay=false -v

Is there anything else I could try to make this run at 60 fps?

I also tested a pipeline with videotestsrc, and that one has no problem producing 60 fps:

gst-launch-1.0 videotestsrc ! video/x-raw,format=UYVY,width=1920,height=1080,framerate=60/1 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! fpsdisplaysink video-sink=fakesink text-overlay=false -v
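
To compare runs like these by number rather than by eye, the average can be taken from the messages fpsdisplaysink prints when gst-launch runs with -v. A minimal sketch, assuming the message has the form `rendered: N, dropped: N, current: X, average: Y`; `last_average_fps` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read gst-launch -v output on stdin and print the "average" value
# from the last fpsdisplaysink message seen.
last_average_fps() {
  sed -n -E 's/.*rendered: [0-9]+, dropped: [0-9]+, current: [0-9.]+, average: ([0-9.]+).*/\1/p' \
    | tail -n 1
}
```

It is meant to be used as `gst-launch-1.0 ... -v 2>&1 | last_average_fps`. The value only appears once the pipeline ends, so with v4l2src it helps to bound the run with num-buffers, as in the first pipeline of this thread.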

Hi,
Please set sync=0 on the sink and check again:

... ! fpsdisplaysink text-overlay=false video-sink=fakesink sync=0

FYI: The solution to the above problem was using nvv4l2camerasrc instead of v4l2src. After this change, the pipeline can run at 60 fps no problem.

So the final pipeline would be:

gst-launch-1.0 nvv4l2camerasrc device=/dev/video0 ! 'video/x-raw(memory:NVMM),format=UYVY,width=1920,height=1080' ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! fpsdisplaysink video-sink=fakesink text-overlay=false -v

Thank you so much for your help!
