nvvidconv x-raw(memory:NVMM) to x-raw conversion performance

I’m curious if a performance bottleneck we are seeing with nvvidconv is expected or not.

We have six cameras and are running six gstreamer pipelines in our application that look like:

nvcamerasrc sensor-id=X fpsRange="30 30" ! "video/x-raw(memory:NVMM), width=(int)1280, height=(int)1080, format=(string)I420, framerate=(fraction)30/1" ! nvvidconv ! "video/x-raw, width=(int)1280, height=(int)1080, format=(string)I420, framerate=(fraction)30/1" ! appsink

With all six cameras running, we are only getting ~20-21 fps. That’s with the appsink callback doing nothing but pulling samples. However, with a pipeline that stays in NVMM memory (for example recording with omxh264enc), all six cameras are able to stream at 30fps.
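For reference, “pulling samples” here means an appsink new-sample callback that is roughly equivalent to this minimal sketch (frame counting omitted; names are illustrative):

#include <gst/gst.h>
#include <gst/app/gstappsink.h>

/* Minimal new-sample callback: pull the sample and release it immediately.
 * This is roughly all the measurement does per frame. */
static GstFlowReturn on_new_sample(GstAppSink *sink, gpointer user_data)
{
    GstSample *sample = gst_app_sink_pull_sample(sink);
    if (!sample)
        return GST_FLOW_ERROR;

    /* fps bookkeeping would go here */

    gst_sample_unref(sample);
    return GST_FLOW_OK;
}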

Interestingly, if I have nvvidconv resize the images to half size:

nvcamerasrc sensor-id=X fpsRange="30 30" ! "video/x-raw(memory:NVMM), width=(int)1280, height=(int)1080, format=(string)I420, framerate=(fraction)30/1" ! nvvidconv ! "video/x-raw, width=(int)640, height=(int)540, format=(string)I420, framerate=(fraction)30/1" ! appsink

Then our application receives frames at 30fps in the appsink callbacks. Running only 4 cameras instead of all 6 also results in 30fps (at full res) in the appsink callbacks. This seems to point to overhead in the NVMM-to-CPU buffer conversion.

Should nvvidconv be able to convert six 1280x1080 streams at 30fps? If not, is there an alternative method we should try?

Hi,
You should get better performance by pulling video/x-raw(memory:NVMM) buffers in appsink. Please check the sample at
https://devtalk.nvidia.com/default/topic/1037450/jetson-tx2/use-gstreamer-or-tegra_multimedia_api-to-decode-video-would-be-more-efficient-and-increase-throughpu-/post/5270860/#5270860
Please note that calling NvReleaseFd() is not required on r32.1

Hi DaneLLL. Thanks for the reply.

To clarify a bit, our application is not currently using the Multimedia API, hence the need to have nvvidconv in the pipeline to do the conversion to x-raw so that we can access CPU buffers in the callback. If I modify the gstreamer pipeline so that x-raw(memory:NVMM) is passed to the appsink, we do indeed get 30fps for all six cameras at full resolution. However, we can’t read those samples since they are not CPU accessible.

Based on your feedback, it seems that we should use the Multimedia API for better performance. From looking at the sample code and documentation, we could use either:

Option 1: ExtractFdFromNvBuffer → NvBufferMemMap → NvBufferMemSyncForCpu → memcpy_to_our_cpu_buffer → NvBufferMemUnMap
Option 2: ExtractFdFromNvBuffer → NvBuffer2Raw
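For the first option, here is roughly what I plan to try in the appsink callback. This is only a sketch: it assumes the appsink caps stay in video/x-raw(memory:NVMM), maps plane 0 only, and omits error handling.

#include <gst/gst.h>
#include <gst/app/gstappsink.h>
#include "nvbuf_utils.h"

static GstFlowReturn on_new_sample(GstAppSink *sink, gpointer user_data)
{
    GstSample *sample = gst_app_sink_pull_sample(sink);
    GstBuffer *buffer = gst_sample_get_buffer(sample);
    GstMapInfo map;
    gst_buffer_map(buffer, &map, GST_MAP_READ);

    /* The mapped data of an NVMM buffer holds the NvBuffer handle,
     * from which we extract the dmabuf fd. */
    int dmabuf_fd = -1;
    ExtractFdFromNvBuffer((void *)map.data, &dmabuf_fd);

    /* Map plane 0 for CPU reads and make sure the cache is coherent. */
    void *virt = NULL;
    NvBufferMemMap(dmabuf_fd, 0, NvBufferMem_Read, &virt);
    NvBufferMemSyncForCpu(dmabuf_fd, 0, &virt);

    /* memcpy_to_our_cpu_buffer(virt, ...) or process 'virt' in place */

    NvBufferMemUnMap(dmabuf_fd, 0, &virt);

    gst_buffer_unmap(buffer, &map);
    gst_sample_unref(sample);
    return GST_FLOW_OK;
}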

I’ll test this tomorrow.

Do you know how the above compares to what nvvidconv does internally? I guess the best option would be to use NvBufferMemMap and then use that pointer instead of adding the additional copy to our own buffer. Is it safe to keep that buffer mapped for a long period of time?

Hi,
Using NvBuffer, which is a DMA buffer, gives better performance. If you must execute memcpy_to_our_cpu_buffer to get frames into a CPU buffer, your original pipeline is the solution; one more thing you can try is running ‘sudo jetson_clocks.sh’.

We may not need to copy the data to our own buffer. If I use NvBufferMemMap on a sample in the appsink callback, is it safe to keep that mapping for a long period of time, or does it need to be unmapped before returning from the callback? In other words, if we don’t unmap those buffers, will it cause a problem for nvcamerasrc or other parts of the pipeline? Would it be better to create new buffers with createNvBuffer and copy into those for long-term storage? Thanks for your help.

Hi,
It should be fine to keep the buffer in the NvBufferMemMap state. After CPU processing, you have to call NvBufferMemSyncForDevice() or the buffer can get out of sync.

We would suggest allocating local buffers in appsink through NvBufferCreate(), copying the nvvidconv buffers through NvBufferTransform(), and returning the nvvidconv buffers directly.
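Something like the following rough sketch. The 1280x1080 I420 pitch-linear parameters are assumptions for your use case; the source dmabuf fd would come from ExtractFdFromNvBuffer() on the buffer pulled in appsink.

#include "nvbuf_utils.h"

/* Allocate one local buffer to own long-term. */
static int create_local_buffer(void)
{
    int local_fd = -1;
    NvBufferCreate(&local_fd, 1280, 1080,
                   NvBufferLayout_Pitch, NvBufferColorFormat_YUV420);
    return local_fd;
}

/* Hardware-accelerated copy from the pipeline's buffer into the local one,
 * so the pipeline's buffer can be returned immediately. */
static void copy_into_local_buffer(int src_dmabuf_fd, int local_fd)
{
    NvBufferTransformParams params = {0};
    params.transform_flag = NVBUFFER_TRANSFORM_FILTER;
    params.transform_filter = NvBufferTransform_Filter_Nearest;
    NvBufferTransform(src_dmabuf_fd, local_fd, &params);
}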

Did some testing, and found a few interesting things:

Passing maxperf=true to nvarguscamerasrc (32.1) makes a huge difference! What exactly does this option do? The only reference I can find to it in the user guide just says it will increase power consumption.

I tested five different gstreamer pipelines, with two different appsink handling paths:

A = ExtractFdFromNvBuffer → NvBufferMemMap → NvBufferMemSyncForCpu → memcpy_to_our_cpu_buffer
B = memcpy_to_our_cpu_buffer (data already in a CPU buffer)

0.) nvarguscamerasrc (1280x1080, nvmm, 30fps) → appsink (A)
1.) nvarguscamerasrc (640x540, nvmm, 30fps) → appsink (A)
2.) nvarguscamerasrc (1280x1080, nvmm, 30fps) → nvvidconv (1280x1080, x-raw) → appsink (B)
3.) nvarguscamerasrc (1280x1080, nvmm, 30fps) → nvvidconv (640x540, x-raw) → appsink (B)
4.) nvarguscamerasrc (640x540, nvmm, 30fps) → nvvidconv (640x540, x-raw) → appsink (B)

With maxperf=false and six cameras, the actual fps is:

0 → 25fps
1 → 30fps
2 → 15fps
3 → 15fps
4 → 30fps

With maxperf=true all the pipelines achieve 30fps. Why does this default to false? Was there an equivalent to maxperf for nvcamerasrc?
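For reference, maxperf is a property on the source element, so the full-resolution test pipeline with it enabled looks roughly like this (NVMM format caps omitted):

nvarguscamerasrc sensor-id=X maxperf=true ! "video/x-raw(memory:NVMM), width=(int)1280, height=(int)1080, framerate=(fraction)30/1" ! nvvidconv ! "video/x-raw, width=(int)1280, height=(int)1080, format=(string)I420, framerate=(fraction)30/1" ! appsink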

NvBuffer2Raw is much slower (3-4x) than doing ExtractFdFromNvBuffer → NvBufferMemMap → NvBufferMemSyncForCpu → memcpy. What exactly is the use case for NvBuffer2Raw?
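For comparison, the NvBuffer2Raw variant was along these lines (simplified; plane 0 and 1280x1080 are the luma-plane assumptions):

#include "nvbuf_utils.h"

/* Copy one plane of the NVMM dmabuf straight into a CPU buffer. */
static void copy_plane_with_nvbuffer2raw(void *nvmm_buf_data, unsigned char *dst)
{
    int dmabuf_fd = -1;
    ExtractFdFromNvBuffer(nvmm_buf_data, &dmabuf_fd);
    NvBuffer2Raw(dmabuf_fd, 0, 1280, 1080, dst);
}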

maxperf boosts the vi/csi/isp clocks, which helps for multi-camera use cases.
In your case, it could be that max_pixel_rate is too small or num_csi_lanes is not correct.

num_csi_lanes = <2>;
max_lane_speed = <1500000>;
min_bits_per_pixel = <10>;
vi_peak_byte_per_pixel = <2>;
vi_bw_margin_pct = <25>;
max_pixel_rate = <160000>;
isp_peak_byte_per_pixel = <5>;
isp_bw_margin_pct = <25>;

We’ve been transitioning to libargus instead of gstreamer, and I was curious whether there is a setting (or settings) similar to maxperf. In my tests so far, libargus achieves 30fps at full resolution with all six cameras. Are the csi/isp clocks boosted by default when using libargus? How exactly is nvarguscamerasrc controlling them? Via a public API?

The ISO bandwidth is calculated based on the values in #8 above.

Thanks for the reply. If my understanding is correct, the clock rates are calculated from the various configuration parameters listed above (num_csi_lanes, max_pixel_rate, etc…). The maxperf option just overrides that and boosts the vi/isp/csi clocks to their max rates? Basically this:

echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
echo ${max_rate} > /sys/kernel/debug/bpmp/debug/clk/vi/rate
echo ${max_rate} > /sys/kernel/debug/bpmp/debug/clk/isp/rate
echo ${max_rate} > /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate

So if full frame rate is achieved only with maxperf=true, it indicates some error in the configuration values in the dtb? Sorry, I’m not a hardware guy.

You can cat those values back with maxperf=true and maxperf=false to confirm it.
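For example:

cat /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate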

I looked into this, and it’s actually the vic clk that’s changing when I set maxperf=true with nvarguscamerasrc. The isp/vi/nvcsi clks stay the same. The vic clk goes from 115200000 to 1024000000.

When using libargus, the vic clk is always 1024000000. isp/vi/nvcsi clks are the same as when using gstreamer + nvarguscamerasrc. How is the vic clk rate determined when using gstreamer + nvarguscamerasrc?