Problems on inference using GPU with OpenCV dnn and ONNX model

arman.paknia · September 29, 2022, 12:20pm

I am trying to run inference on a Jetson Xavier with OpenCV dnn. I have converted a YOLOv5m model to .onnx format. I then run inference with the model using the following code, with the backend and target set to use the GPU through CUDA and cuDNN:

import cv2

# The model path must be passed as a string
net = cv2.dnn.readNetFromONNX("yolov5m.onnx")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

However, when running inference on the Jetson Xavier in MAXN power mode on a 1280 x 720 video, detection is very slow (approximately 109 ms per frame). The Jetson Power GUI shows that GPU usage is very low (below 20% on most frames). Running the code without net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA) and net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA), which means defaulting to the CPU, shows very high CPU usage on every frame of the video, and GPU usage then becomes minimal.
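A per-frame figure like the one above can depend on how it is measured. The following is only a minimal sketch of one way to time a forward pass while discarding the first few calls, on the assumption that early passes on the CUDA backend may include one-time initialisation cost. The helper name time_per_frame, the warm-up and iteration counts, and the 640 x 640 input size in the usage comment are illustrative assumptions, not taken from the code in this post.

```python
import statistics
import time


def time_per_frame(run_once, warmup=5, iters=20):
    """Return the median wall-clock time of run_once() in milliseconds.

    run_once is any zero-argument callable that performs one forward pass.
    """
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Hypothetical usage with the network from this post (not executed here;
# the 640 x 640 input size is an assumption about how the model was exported):
#   blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (640, 640), swapRB=True)
#   def run_once():
#       net.setInput(blob)
#       net.forward()
#   print(time_per_frame(run_once), "ms per frame")
```

Timing only the forward pass this way leaves out video decoding, pre-processing and post-processing, so it helps show whether the network itself accounts for the 109 ms.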

I have also compiled OpenCV for running with CUDA and cuDNN, using this Git.
OpenCV build information is as follows:


General configuration for OpenCV 4.6.0 =====================================
  Version control:               eb1afab-dirty

  Extra modules:
    Location (extra):            /home/jetsonjane/master/JEP/script/workspace/opencv_contrib-4.6.0/modules
    Version control (extra):     eb1afab-dirty

  Platform:
    Timestamp:                   2022-09-18T13:43:39Z
    Host:                        Linux 5.10.65-tegra aarch64
    CMake:                       3.16.3
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               RELEASE

  CPU/HW features:
    Baseline:                    NEON FP16

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                /usr/bin/c++  (ver 9.4.0)
    C++ flags (Release):         -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/bin/cc
    C flags (Release):           -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):      -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined  
    Linker flags (Debug):        -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined  
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:          m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda-11.4/lib64 -L/usr/lib/aarch64-linux-gnu
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann freetype fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python2 python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    world
    Disabled by dependency:      -
    Unavailable:                 alphamat cvv java julia matlab ovis sfm ts viz
    Applications:                apps
    Documentation:               NO
    Non-free algorithms:         NO

  GUI:                           GTK2
    GTK+:                        YES (ver 2.24.32)
      GThread :                  YES (ver 2.64.6)
      GtkGlExt:                  NO
    VTK support:                 NO

  Media I/O: 
    ZLib:                        /usr/lib/aarch64-linux-gnu/libz.so (ver 1.2.11)
    JPEG:                        /usr/lib/aarch64-linux-gnu/libjpeg.so (ver 80)
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         /usr/lib/aarch64-linux-gnu/libpng.so (ver 1.6.37)
    TIFF:                        /usr/lib/aarch64-linux-gnu/libtiff.so (ver 42 / 4.1.0)
    JPEG 2000:                   build (ver 2.4.0)
    OpenEXR:                     /usr/lib/aarch64-linux-gnu/libImath.so /usr/lib/aarch64-linux-gnu/libIlmImf.so /usr/lib/aarch64-linux-gnu/libIex.so /usr/lib/aarch64-linux-gnu/libHalf.so /usr/lib/aarch64-linux-gnu/libIlmThread.so (ver 2_3)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    DC1394:                      YES (2.2.5)
    FFMPEG:                      YES
      avcodec:                   YES (58.54.100)
      avformat:                  YES (58.29.100)
      avutil:                    YES (56.31.100)
      swscale:                   YES (5.5.100)
      avresample:                YES (4.0.0)
    GStreamer:                   YES (1.16.3)
    v4l/v4l2:                    YES (linux/videodev2.h)

  Parallel framework:            pthreads

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Lapack:                      NO
    Eigen:                       NO
    Custom HAL:                  YES (carotene (ver 0.0.1))
    Protobuf:                    build (3.19.1)

  NVIDIA CUDA:                   YES (ver 11.4, CUFFT CUBLAS)
    NVIDIA GPU arch:             72 87
    NVIDIA PTX archs:

  cuDNN:                         YES (ver 8.3.2)

  OpenCL:                        YES (no extra features)
    Include path:                /home/jetsonjane/master/JEP/script/workspace/opencv-4.6.0/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 2:
    Interpreter:                 /usr/bin/python2.7 (ver 2.7.18)
    Libraries:                   /usr/lib/aarch64-linux-gnu/libpython2.7.so (ver 2.7.18)
    numpy:                       /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.16.5)
    install path:                lib/python2.7/dist-packages/cv2/python-2.7

  Python 3:
    Interpreter:                 /usr/bin/python3 (ver 3.8.10)
    Libraries:                   /usr/lib/aarch64-linux-gnu/libpython3.8.so (ver 3.8.10)
    numpy:                       /home/jetsonjane/.local/lib/python3.8/site-packages/numpy/core/include (ver 1.19.4)
    install path:                lib/python3.8/site-packages/cv2/python-3.8

  Python (for build):            /usr/bin/python2.7

  Java:                          
    ant:                         NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    /usr/local
-----------------------------------------------------------------
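For a quicker check than reading the whole report, the CUDA-related entries can be pulled out of the text returned by cv2.getBuildInformation(). This is a small sketch; the function name cuda_summary is hypothetical, and the sample string is copied from the report above so the snippet runs without OpenCV installed.

```python
def cuda_summary(build_info):
    """Extract the CUDA-related entries from OpenCV build-information text."""
    wanted = ("NVIDIA CUDA", "NVIDIA GPU arch", "cuDNN")
    summary = {}
    for line in build_info.splitlines():
        # Each entry has the form "  <key>:   <value>"
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            summary[key.strip()] = value.strip()
    return summary


# Lines copied from the build report above; on the device this would be
# cuda_summary(cv2.getBuildInformation()) instead.
sample = """
  NVIDIA CUDA:                   YES (ver 11.4, CUFFT CUBLAS)
    NVIDIA GPU arch:             72 87
    NVIDIA PTX archs:

  cuDNN:                         YES (ver 8.3.2)
"""

for key, value in cuda_summary(sample).items():
    print(f"{key}: {value}")
```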

Also, during inference I can see that one CUDA device is recognized by the system:

count = cv2.cuda.getCudaEnabledDeviceCount()
print(count)
1

Are there any steps in the configuration of CUDA, cuDNN, or OpenCV that I am missing?
I would be thankful for any insights or solutions.

AastaLLL · September 30, 2022, 4:47pm

Hi,

Is TensorRT an option for you?
If yes, please try it with trtexec:

$ /usr/src/tensorrt/bin/trtexec --onnx=[model]

Thanks.
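The trtexec suggestion above could also be scripted from Python. This is only a sketch: the helper name trtexec_command is hypothetical, the model file name is taken from the first post, and the optional --fp16 flag is an addition that is not mentioned anywhere in this thread.

```python
# Path quoted in the reply above
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def trtexec_command(onnx_path, fp16=False):
    """Build the trtexec argument list for benchmarking an ONNX model."""
    cmd = [TRTEXEC, f"--onnx={onnx_path}"]
    if fp16:
        # Assumption: half precision is acceptable for the model;
        # this flag is not part of the original suggestion.
        cmd.append("--fp16")
    return cmd


print(" ".join(trtexec_command("yolov5m.onnx")))

# On the Jetson the command could then be run with, for example:
#   import subprocess
#   subprocess.run(trtexec_command("yolov5m.onnx"), check=True)
```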

