Nvcamerasrc, tee, and nvvidconv slow

Hi,
I have a system where it seems the performance is hitting some bottleneck and the frame rate drops.

We currently have 28.1 installed and cannot update at the moment for various reasons.

I can reproduce behaviour similar to my issue with a pipeline that has a single nvcamerasrc followed by an nvvidconv to CPU memory. I then tee this and, on each branch of the tee, perform another nvvidconv back to NVMM. With one or two branches from the tee, I get the full frame rate. If I add a few more branches, the frame rate drops.

The odd part is that if I use a multifilesrc and load buffers from disk, I see no such slowdown.

Is there some sort of interaction between nvcamerasrc and nvvidconv that would be a limitation?

Here are the two pipelines for reference:

Fast pipeline (180 fps): (load 1080p data, nvvidconv to NVMM, then back to CPU, then tee)

gst-launch-1.0 -v multifilesrc loop=true location=/tmp/test.%03d.raw caps="video/x-raw,format=I420,width=1920,height=1080,framerate=24/1" ! \
nvvidconv ! \
"video/x-raw(memory:NVMM),width=128,height=128" ! \
nvvidconv ! \
"video/x-raw(memory),width=128,height=128" ! \
queue ! \
tee name=s ! \
queue ! nvvidconv interpolation-method=0 ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false

Slow pipeline (20 fps, although the sensor generates 30 fps):

gst-launch-1.0 -v nvcamerasrc sensor-id=4 ! "video/x-raw(memory:NVMM),width=128,height=128" ! \
nvvidconv ! \
"video/x-raw(memory),width=128,height=128" ! \
queue ! \
tee name=s ! \
queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false \
s. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080" ! fpsdisplaysink video-sink=fakesink text-overlay=false

Hi,
Please execute sudo jetson_clocks to run at maximum performance. You can check the system load by executing sudo tegrastats. From your description, it looks like the bottleneck may lie in copying data from CPU buffers to NVMM buffers.

Hi, thanks for the reply. A few notes:

  1. I posted this in the wrong forum. It should be TX2.
  2. I did try running jetson_clocks.sh and it had no effect.
  3. I tried to rule out a memory copy issue with my multifilesrc pipeline. There, the first half does a file load (1920x1080 to CPU), then an nvvidconv to NVMM, then another nvvidconv to CPU, and then tees off to multiple nvvidconv elements, each doing another copy to NVMM.

So why can the multifilesrc pipeline run WAY faster than the CSI capture pipeline?

Side note: I noticed a small error in the multifilesrc pipeline I posted. I was modifying it during experimentation. Here’s another version of the “source” part, which I used to ensure it’s loading 1080p and doing that as a memcpy.

Hi,
multifilesrc reads the files continuously, as fast as the pipeline can consume them, whereas nvcamerasrc generates frames at a fixed period. The difference is like running videotestsrc without and with is-live=1. With multifilesrc, you read a frame, duplicate it into 11 frames, and read the next frame right away. With nvcamerasrc, you get one frame every 33 ms (30 fps) and duplicate it into 11 frames.

I understand that multifilesrc will produce frames way faster. My concern is that the nvcamerasrc pipeline only runs at 20 fps while the sensor is producing data at 30 fps. So it can’t even keep up with realtime 30 fps when using nvcamerasrc, yet the multifilesrc pipeline shows the system should have enough resources to run at 4X 30 fps.
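
The argument above can be put into numbers. This is only a rough sketch using the figures reported in this thread (11 tee branches; 180 fps, 30 fps and 20 fps). It assumes that fpsdisplaysink reports the rate per branch and it ignores the conversions upstream of the tee. The function name conversions_per_second is made up for illustration.

```shell
#!/bin/sh
# conversions_per_second FPS BRANCHES
# Number of downstream nvvidconv conversions per second, assuming every
# branch of the tee converts every frame (an assumption, not a measurement).
conversions_per_second() {
    echo $(( $1 * $2 ))
}

branches=11   # tee branches in both pipelines posted above

needed=$(conversions_per_second 30 "$branches")     # camera at full sensor rate
observed=$(conversions_per_second 20 "$branches")   # camera pipeline as measured
file_fed=$(conversions_per_second 180 "$branches")  # multifilesrc pipeline as measured

echo "needed for 30 fps:     $needed conversions/s"
echo "observed with camera:  $observed conversions/s"
echo "observed with files:   $file_fed conversions/s"
```

Under those assumptions the file-fed pipeline sustains far more conversions per second than the camera pipeline needs, which suggests that raw conversion throughput alone does not explain the drop to 20 fps.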

While running tegrastats, I noticed that the “MSENC” value decreases when my frame rate drops.
Is there a way to force it to stay at max?
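
To watch only that value while the pipeline runs, a small filter along these lines may help. It is a sketch that assumes tegrastats prints the token MSENC followed by its value as the next whitespace-separated field; the exact output format can differ between releases, so check a raw line first. The function name msenc_values is hypothetical.

```shell
#!/bin/sh
# Print the field that follows "MSENC" on every input line that has one.
# Assumption: tegrastats emits "MSENC <value>" as two adjacent
# whitespace-separated tokens; lines without MSENC print nothing.
msenc_values() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "MSENC") print $(i + 1) }'
}
```

It could be used as `sudo tegrastats | msenc_values` in a second terminal while the camera pipeline is running.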

When I force the MSENC clock to max, the problem disappears:

echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvenc/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvenc/state
cat /sys/kernel/debug/bpmp/debug/clk/nvenc/max_rate > /sys/kernel/debug/bpmp/debug/clk/nvenc/rate
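
The three commands above can be wrapped in a small function so that they fail loudly if a node is missing. This is a sketch, not an official tool: it performs the same three writes, the default directory is the nvenc node from this thread, and the optional directory argument and the name lock_clk_max are additions for illustration.

```shell
#!/bin/sh
# lock_clk_max [CLK_DIR]
# Lock a clock at its maximum rate using the same three writes shown above.
# CLK_DIR defaults to the nvenc debugfs node from this thread; passing a
# different directory is a hypothetical convenience (e.g. for a dry run).
lock_clk_max() {
    clk_dir=${1:-/sys/kernel/debug/bpmp/debug/clk/nvenc}

    # Refuse to continue if any expected node is absent.
    for f in mrq_rate_locked state max_rate rate; do
        if [ ! -e "$clk_dir/$f" ]; then
            echo "missing $clk_dir/$f" >&2
            return 1
        fi
    done

    echo 1 > "$clk_dir/mrq_rate_locked" || return 1   # hold the requested rate
    echo 1 > "$clk_dir/state" || return 1             # keep the clock enabled
    cat "$clk_dir/max_rate" > "$clk_dir/rate" || return 1

    echo "rate now: $(cat "$clk_dir/rate")"
}
```

Writing to debugfs requires root, so the function has to be called from a root shell. The setting does not survive a reboot, so it would need to be reapplied at startup.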

Hi,
Your method of setting the device nodes should be fine. Alternatively, you can apply this patch and then rebuild and replace libgstomx.so:
No encoder perfomance improvement before/after jetson_clocks.sh - #5 by DaneLLL