VPI performance for background subtraction is SLOW - need advice

So my first, very simple project on the TX2 was less than satisfactory. I work with videos, and one of the first pre-processing steps is background subtraction. On my PC I currently use OpenCV's MOG2 background subtraction (cv2.createBackgroundSubtractorMOG2). After a bit of research, I decided to try VPI background subtraction (vpi.BackgroundSubtractor) on the TX2, but the results were not what I expected. With CV2 MOG2 on the TX2, processing a 1.6 MB file (about 170 frames at 800x600) took about 4.7 seconds. VPI subtraction on the CUDA backend took 5.7 seconds, and on the CPU backend 15 seconds. Is CV2 MOG2 simply a much faster algorithm than the one VPI uses, or am I not using VPI background subtraction correctly? The faster the better! Any pointers would help. Below are the two scripts I used:

VPI
import cv2
import vpi
import time

cap = cv2.VideoCapture("path/to/video.mp4")

videosize = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))

with vpi.Backend.CUDA:
    cuda_sub = vpi.BackgroundSubtractor(videosize, vpi.Format.BGR8)

start_time = time.time()

while True:
    ret, frame = cap.read()

    if not ret:
        break

    # Wrap the decoded OpenCV frame as a VPI image and run one subtraction step
    mask, image = cuda_sub(vpi.asimage(frame, vpi.Format.BGR8), learnrate=0.01)

execution_time = (time.time() - start_time)

print("Execution time: " + str(execution_time))

cap.release()
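
One thing I am not sure about with the VPI version: if VPI submits the CUDA work asynchronously, taking time.time() right after the loop might stop the clock before the last frames have actually finished on the GPU. A sketch of the fix, assuming vpi.Stream.current.sync() is the right call for this in the Python API:

# After the while loop, before stopping the timer:
vpi.Stream.current.sync()  # assumed to block until all queued VPI work is done

execution_time = (time.time() - start_time)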

CV2 MOG
import cv2
import time

cap = cv2.VideoCapture("path/to/video.mp4")

subtractor = cv2.createBackgroundSubtractorMOG2(history=10, varThreshold=25, detectShadows=False)

start_time = time.time()

while True:
    ret, frame = cap.read()

    if not ret:
        break

    # Run one MOG2 update + foreground-mask extraction on the frame
    frame_mask = subtractor.apply(frame)

execution_time = (time.time() - start_time)

print("Execution time: " + str(execution_time))

cap.release()
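
Note that both timings include cap.read() video decoding, which is common to the two measurements and can dominate for a short clip. A variant that pre-decodes the frames and times only the subtraction step would isolate the algorithm cost better (a sketch, not what I ran):

import cv2
import time

cap = cv2.VideoCapture("path/to/video.mp4")

# Pre-decode all frames so the timed loop measures only subtraction.
# Fine for a short 800x600 clip; a long video would need batching.
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

subtractor = cv2.createBackgroundSubtractorMOG2(history=10, varThreshold=25,
                                                detectShadows=False)

start_time = time.time()
for frame in frames:
    frame_mask = subtractor.apply(frame)
elapsed = time.time() - start_time

print(f"{len(frames)} frames, {elapsed / len(frames) * 1000:.1f} ms/frame")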

Hi,

You can find the benchmark data of the background subtractor below:
https://docs.nvidia.com/vpi/algo_background_subtractor.html

For TX2 with a 1920x1080 RGB8 input, it’s expected to take 35.0±0.7 ms per frame on the CUDA backend.
Could you check the sample in the document to see if there is anything different between the implementations?

Also, please remember to maximize performance with the VPI clock script first:
https://docs.nvidia.com/vpi/algo_performance.html#maxout_clocks

Thanks.

Is there any way to obtain the benchmark source code, to see if there were any optimizations?
Also, regarding the VPI clock script, is there any reason why I can’t just leave the TX2 in the maxed-out state all the time, if power consumption is not an issue?

Hi,

Sorry, I just realized that the timing you mentioned is for the whole video.
Based on that number, it takes 5.7 s / 170 frames ≈ 33 ms per frame, which is close to the benchmark score.

It seems that we don’t have a comparison between OpenCV and VPI for the background subtraction algorithm.
We are going to reproduce this internally to see the behavior in our environment.
Will share more information with you later.

Thanks.

Hi,

Confirmed that we can reproduce the performance difference.
We are checking this with our internal team.

Will share more information later.
Thanks.

I made a mistake: the 170-frame count was an estimate. The actual video turns out to be 331 frames at 800x600. Regarding the benchmark score, the benchmark was run on 1920x1080 frames. Since mine are much smaller, about 480K pixels versus over 2,073K pixels for the benchmark, should there be a corresponding increase in performance, from 35 ms per frame to much less?
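
To put numbers on it, assuming runtime scales linearly with pixel count (which probably doesn't hold exactly, since fixed per-frame overheads like kernel launches and image wrapping don't shrink with resolution):

# Back-of-the-envelope scaling check for the CUDA-backend benchmark figure
bench_ms = 35.0              # benchmark: 1920x1080 RGB8 on TX2, CUDA backend
bench_px = 1920 * 1080       # ~2,073K pixels
my_px = 800 * 600            # ~480K pixels

print(bench_ms * my_px / bench_px)   # predicted: ~8.1 ms/frame
print(5.7 / 331 * 1000)              # measured:  ~17.2 ms/frame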

Hi,

Thanks for the update.

We can also reproduce the performance issue in our environment.
To give more suggestions, we need to check more details with our internal team.

Will share more information with you once we get feedback.
Thanks.