Is OpenCV really using the GPU for detection?

Hi developers!
Recently I noticed something strange while running a Python script for inference with my custom YOLOv4-tiny model loaded through cv2.dnn.readNetFromDarknet(). The program works just fine but at low FPS; judging by other videos on YouTube, this seems to be normal when detecting objects with YOLO.
When I checked the jtop monitor (an app developed specifically for the Jetson Nano) while running my program, it gave me the following results:


What I think is happening is that the program is only using the four CPU cores instead of the GPU.
Just take a look at the GPU tab of jtop:

It seems that the CUDA cores are barely working. The CPU tab is a whole different story, with all cores at around 95% on average:

Two things may be happening: either jtop is not trustworthy, or the GPU is idling and letting the CPU take all the work. I have compiled OpenCV to work with CUDA:

The BIS parameter is just the blob image size, which I can change in real time from 32x13=416 to 32
I compiled my OpenCV with this guide: Install OpenCV 4.5 on Jetson Nano - Q-engineering
Is there something else I need to install, or to write in my Python script, in order to use the CUDA-accelerated OpenCV, or am I already using it?
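Here is the kind of quick check I can run to see whether the cv2 build itself reports CUDA support (just a sketch, not something taken from the guide):

import cv2

# Sanity check: does the cv2 module that Python imports report CUDA/cuDNN support?
print(cv2.__version__)
print(cv2.cuda.getCudaEnabledDeviceCount())   # should print 1 on a Jetson Nano

for line in cv2.getBuildInformation().splitlines():
    if "CUDA" in line or "cuDNN" in line:
        print(line.strip())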
Thank you in advance for any reply

Hi,

To use the CUDA version of the DNN module, have you built OpenCV with the following configuration?

$ cmake -D CMAKE_BUILD_TYPE=RELEASE \
    ... \
    -D WITH_CUDA=ON \
    -D WITH_CUDNN=ON \
    -D OPENCV_DNN_CUDA=ON \
    -D CUDA_ARCH_BIN=5.3 \
    -D WITH_CUBLAS=1 \
    ...
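
You can also check from Python whether the CUDA backend is visible to the DNN module (a small sketch, assuming OpenCV 4.2 or newer):

import cv2

# List the DNN targets available for the CUDA backend; an empty list means
# the installed cv2 was not built with OPENCV_DNN_CUDA enabled.
print(cv2.dnn.getAvailableTargets(cv2.dnn.DNN_BACKEND_CUDA))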

Thanks

Hello AastaLLL!
I used this configuration for cmake

$ cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr \
    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
    -D EIGEN_INCLUDE_PATH=/usr/include/eigen3 \
    -D WITH_OPENCL=OFF \
    -D WITH_CUDA=ON \
    -D CUDA_ARCH_BIN=5.3 \
    -D CUDA_ARCH_PTX="" \
    -D WITH_CUDNN=ON \
    -D WITH_CUBLAS=ON \
    -D ENABLE_FAST_MATH=ON \
    -D CUDA_FAST_MATH=ON \
    -D OPENCV_DNN_CUDA=ON \
    -D ENABLE_NEON=ON \
    -D WITH_QT=ON \
    -D WITH_OPENMP=ON \
    -D WITH_OPENGL=ON \
    -D BUILD_TIFF=ON \
    -D WITH_FFMPEG=ON \
    -D WITH_GSTREAMER=ON \
    -D WITH_TBB=ON \
    -D BUILD_TBB=ON \
    -D BUILD_TESTS=OFF \
    -D WITH_EIGEN=ON \
    -D WITH_V4L=ON \
    -D WITH_LIBV4L=ON \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D INSTALL_C_EXAMPLES=OFF \
    -D INSTALL_PYTHON_EXAMPLES=OFF \
    -D BUILD_NEW_PYTHON_SUPPORT=ON \
    -D BUILD_opencv_python3=TRUE \
    -D OPENCV_GENERATE_PKGCONFIG=ON \
    -D BUILD_EXAMPLES=OFF ..

Hi,

It seems that your GPU utilization is 30%.
Are you running any other GPU application at the same time?

If not, OpenCV may still be using the GPU for inference, just not in a well-optimized way.

Thanks.

I am only running one Python script or process at a time. So does this mean that the DNN module of OpenCV is not optimized to work with the Jetson Nano GPU, or is jtop showing inaccurate information about the GPU workload?

Hi,

It’s more likely an OpenCV optimization issue.

To confirm this, would you mind evaluating the app with our profiler?
It can show you whether any GPU API is used directly.

$ sudo /usr/local/cuda-10.2/bin/nvprof python3 [app.py]
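
If you want to confirm that nvprof picks up CUDA activity from OpenCV at all, a tiny script like the sketch below (just a sanity check, not your app) should show kernel and memcpy events:

import cv2
import numpy as np

# This only exercises the CUDA runtime through OpenCV, so nvprof should list
# memcpy/kernel activity if the cv2 build really has CUDA support.
frame = np.random.randint(0, 255, (1080, 1920, 3), dtype=np.uint8)
gpu_frame = cv2.cuda_GpuMat()
gpu_frame.upload(frame)
resized = cv2.cuda.resize(gpu_frame, (416, 416))
print(resized.download().shape)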

Thanks.

Good morning

A few minutes ago I placed my Python file in /home/my_user, and when I run that command I get:

[ WARN:0] global /home/redeye/opencv/modules/videoio/src/cap_gstreamer.cpp (961) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'

Actual Status

Detection:No object to track…
Current frame analysis took 0.939 seconds
Command:OOOOOSOO

Actual Status

Detection:No object to track…
Current frame analysis took 0.748 seconds
Command:OOOOOSOO

Actual Status

Detection:No object to track…
Current frame analysis took 0.699 seconds
Command:OOOOOSOO

Actual Status

Detection:No object to track…
Current frame analysis took 0.685 seconds
Command:OOOOOSOO

Actual Status

Detection:No object to track…
Current frame analysis took 0.695 seconds
Command:OOOOOSOO

Actual Status

Detection:No object to track…
Current frame analysis took 0.709 seconds
Command:OOOOOSOO

Actual Status

Detection:No object to track…
Current frame analysis took 0.695 seconds
Command:OOOOOSOO

Traceback (most recent call last):
  File "Test_Platform-2.py", line 407, in <module>
    serialArduino.write(order.encode('ascii'))
NameError: name 'serialArduino' is not defined
======== Warning: No CUDA application was profiled, exiting
======== Error: Application returned non-zero code 1

Only the last two lines are the output from the nvprof profiler.
So does this mean that my script is not really using the GPU for detection?

Hi,

Thanks for your testing.

We are preparing an OpenCV environment to check this further.
Will share more information with you later.


Hi,

Test with the command below.
We confirmed that the example doesn’t use the GPU.

$ python3 object_detection.py --config=yolov4-tiny.cfg --model=yolov4-tiny.weights --classes=../data/dnn/object_detection_classes_coco.txt --width=416 --height=416 --scale=0.00392 --input=/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4 --rgb

It seems to vary with the model type or the model format you used.
For object detection, the supported target platform doesn’t include the GPU:

Thanks

Hi there.
A few moments ago I finally found my solution. Now my script is using the CUDA cores!
Before the modification I averaged 1.25 FPS with a 416x416 input blob, and now it runs at 6.5 FPS at the same input size. A big improvement considering it is an edge device.
In order to truly use the CUDA cores, you need to add the following two lines after
net = cv2.dnn.readNetFromDarknet(cfgPath,weightsPath)

net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

I did not know that these two lines were needed to work properly with the GPU.
This is the page that pointed me in right direction: How to use OpenCV DNN Module with NVIDIA GPUs on Linux
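
For anyone who finds this thread later, this is roughly how those two lines fit into a complete detection pass (a minimal sketch; the file names and the 0.5 confidence threshold are placeholders, not my exact script):

import cv2
import numpy as np

cfgPath = "yolov4-tiny.cfg"          # placeholder paths
weightsPath = "yolov4-tiny.weights"

net = cv2.dnn.readNetFromDarknet(cfgPath, weightsPath)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)   # DNN_TARGET_CUDA_FP16 is another option

frame = cv2.imread("test.jpg")                     # placeholder input frame
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

# Each detection row is [cx, cy, w, h, objectness, class scores...]
for out in outputs:
    for det in out:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        if scores[class_id] > 0.5:
            print(class_id, float(scores[class_id]))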

And these are my current GPU graphs in the jtop app:

And the CPU is no longer doing all the math like before:

Even so, many thanks to @AastaLLL for all the replies and for taking the time to solve my issue.

Something I do not understand is why the learnopencv blog uses the argparse module in their Python script to activate CUDA. Can someone explain that?
These are fragments of the script:

import argparse

parser = argparse.ArgumentParser(description='Run keypoint detection')
parser.add_argument("--device", default="cpu", help="Device to inference on")

net = cv2.dnn.readNetFromCaffe(protoFile, weightsFile)

if args.device == "cpu":
    net.setPreferableBackend(cv2.dnn.DNN_TARGET_CPU)
    print("Using CPU device")
elif args.device == "gpu":
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    print("Using GPU device")

Thanks

Hi,

Thanks for sharing this information with us.

The script is just trying to support both CPU and GPU modes.
You can launch the script with python3 [app.py] --device cpu to deploy the model on the CPU,
and python3 [app.py] --device gpu for the GPU case.
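
A compact version of that pattern, with the missing parse_args() call filled in and setPreferableTarget used in the CPU branch (a sketch, not the exact blog code; the Caffe file names are placeholders):

import argparse
import cv2

parser = argparse.ArgumentParser(description="Run keypoint detection")
parser.add_argument("--device", default="cpu", choices=["cpu", "gpu"],
                    help="Device to inference on")
args = parser.parse_args()

net = cv2.dnn.readNetFromCaffe("pose.prototxt", "pose.caffemodel")  # placeholder files

if args.device == "gpu":
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    print("Using GPU device")
else:
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
    print("Using CPU device")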

Thanks.
