I’m trying to implement simple object detection (OpenCV Haar cascades). Since the Jetson TX2 can use CUDA for this kind of processing, the OpenCV CUDA implementation looks like the right way to go. However, after implementing both a CPU and a GPU version, I’ve noticed no significant performance difference between the two approaches (about 200 ms per frame for both CPU and GPU).
Input: camera-captured image (1280x720)
jetson_clocks is run and nvpmodel -m 0 is set
OpenCV 3.4.0 built with CUDA support
Release build (as I’ve already noticed, this is very important for CUDA performance)
CascadeClassifier instances are created following the example code provided with the OpenCV distribution:
    g_pFaceClassifier = cv::CascadeClassifier(HAAR_CASCADE_FILENAME);               // CPU based
    g_pFaceClassifier = cv::cuda::CascadeClassifier::create(HAAR_CASCADE_FILENAME); // GPU based
HAAR_CASCADE_FILENAME is correct in both cases (a different cascade file is used for each).
Detection calls:
    // CPU based
    g_pFaceClassifier.detectMultiScale(gray, found, 1.1, 2, 0 | cv::CASCADE_SCALE_IMAGE, cv::Size(32, 32));
    // GPU based (requires an extra call to convert the output)
    g_pFaceClassifier->detectMultiScale(gpuImg, outImg);
    g_pFaceClassifier->convert(outImg, found);
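One difference worth ruling out: the CPU call passes scaleFactor 1.1, minNeighbors 2 and a 32x32 minimum size, while the GPU detectMultiScale call passes none of these, so the GPU classifier runs with whatever settings it was created with and the two paths may not be searching the same scales and object sizes. cv::cuda::CascadeClassifier has setters for these values (setScaleFactor, setMinNeighbors, setMinObjectSize). A minimal sketch; applyCpuEquivalentParams is a hypothetical helper, written as a template only so the logic can be checked without a GPU:

```cpp
// Hypothetical helper: copies the CPU detectMultiScale arguments
// (scaleFactor, minNeighbors, minSize) onto a GPU cascade so that both
// paths search the same scales and minimum object size.
// In real use, Classifier is cv::cuda::CascadeClassifier and Size is cv::Size.
template <class Classifier, class Size>
void applyCpuEquivalentParams(Classifier& gpu, double scaleFactor,
                              int minNeighbors, const Size& minSize)
{
    gpu.setScaleFactor(scaleFactor);
    gpu.setMinNeighbors(minNeighbors);
    gpu.setMinObjectSize(minSize);
}

// Example use with the objects from the post (not compiled here):
//   applyCpuEquivalentParams(*g_pFaceClassifier, 1.1, 2, cv::Size(32, 32));
//   g_pFaceClassifier->detectMultiScale(gpuImg, outImg);
//   g_pFaceClassifier->convert(outImg, found);
```

The minimum object size in particular bounds how many scales get scanned, so matching it seems like the first thing to check before comparing timings.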
As a result, both versions take about 200 ms to process one frame.
I checked CPU usage with htop: the CPU version used 100% of all 6 cores (nvpmodel -m 0), while the GPU version used about 15% on some cores (except the one handling the OS and application routine calls).
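Before comparing per-frame figures, it may be worth making sure one-off costs are excluded: the first GPU calls typically include CUDA initialization, so averaging over many frames after a few warm-up runs gives a fairer number. A minimal sketch using only the standard library; averageMs is a hypothetical helper, and the callable stands in for one frame of work (on the GPU path: upload, detectMultiScale and convert):

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `frame` `warmup` times untimed (to absorb
// one-off costs such as CUDA initialization), then times `iterations`
// runs and returns the mean wall-clock time per run in milliseconds.
double averageMs(const std::function<void()>& frame, int warmup, int iterations)
{
    for (int i = 0; i < warmup; ++i)
        frame();

    if (iterations <= 0)
        return 0.0;

    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i)
        frame();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}

// Example use on the GPU path (objects from the post, not compiled here):
//   double ms = averageMs([&] {
//       gpuImg.upload(gray);
//       g_pFaceClassifier->detectMultiScale(gpuImg, outImg);
//       g_pFaceClassifier->convert(outImg, found);
//   }, 5, 100);
```

Timing each of the three GPU calls separately in the same way would also show whether the time goes into the upload, the detection itself, or the conversion.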
Has anyone used or tried the OpenCV CUDA-based CascadeClassifier implementation for object detection? I would much appreciate comments with performance figures (I may have done something wrong).
PS: I’ve also checked the cuda::HOG sample provided with the OpenCV distribution (I only had to modify the VideoCapture pipeline setup to get the stream from the onboard camera). It ran at about 10 FPS for 1280x720 in CUDA mode (and about 4 FPS in CPU mode), which is less than I expected.
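To put the HOG frame rates on the same scale as the ~200 ms Haar timings, here is the frame-rate to per-frame-time conversion (assuming the sample’s FPS counter covers roughly the same per-frame work, which may not hold if it also includes capture and display):

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{\text{FPS}}
\qquad
10\ \text{FPS} \;\to\; 100\ \text{ms},\quad
4\ \text{FPS} \;\to\; 250\ \text{ms},\quad
200\ \text{ms} \;\to\; 5\ \text{FPS}
```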
Thanks to anyone who can provide any information or advice.