I am trying to inference on a Jetson Xavier with OpenCV dnn. I have converted a YOLOv5m model to .onnx format . Afterwards I attempt to run inference with the model using the following codes with optimizations for GPU using CUDA AND cuDNN:
net = cv2.dnn.readNetFromONNX(yolov5m.onnx)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
However in inference on Jetson Xavier with MAXN power mode, on a 1280 X 720 resolution video, my detections are very slow (approximately 109ms per frame). Using Jetson Power GUI I see that the usage of GPU is very low (on most frames less than 20% of GPU). Also running the code without (cv2.dnn.DNN_BACKEND_CUDA)
and net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
, which means defaulting to use of CPU, shows that CPU usage is very high on every frame of video. GPU usage then becomes minimal.
I have also compiled OpenCV for running with CUDA and cuDNN, using this Git.
OpenCV build information is as follows:
General configuration for OpenCV 4.6.0 =====================================
Version control: eb1afab-dirty
Extra modules:
Location (extra): /home/jetsonjane/master/JEP/script/workspace/opencv_contrib-4.6.0/modules
Version control (extra): eb1afab-dirty
Platform:
Timestamp: 2022-09-18T13:43:39Z
Host: Linux 5.10.65-tegra aarch64
CMake: 3.16.3
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/make
Configuration: RELEASE
CPU/HW features:
Baseline: NEON FP16
C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: /usr/bin/c++ (ver 9.4.0)
C++ flags (Release): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
C Compiler: /usr/bin/cc
C flags (Release): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
Linker flags (Debug): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
ccache: NO
Precompiled headers: NO
Extra dependencies: m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda-11.4/lib64 -L/usr/lib/aarch64-linux-gnu
3rdparty dependencies:
OpenCV modules:
To be built: aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann freetype fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python2 python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
Disabled: world
Disabled by dependency: -
Unavailable: alphamat cvv java julia matlab ovis sfm ts viz
Applications: apps
Documentation: NO
Non-free algorithms: NO
GUI: GTK2
GTK+: YES (ver 2.24.32)
GThread : YES (ver 2.64.6)
GtkGlExt: NO
VTK support: NO
Media I/O:
ZLib: /usr/lib/aarch64-linux-gnu/libz.so (ver 1.2.11)
JPEG: /usr/lib/aarch64-linux-gnu/libjpeg.so (ver 80)
WEBP: build (ver encoder: 0x020f)
PNG: /usr/lib/aarch64-linux-gnu/libpng.so (ver 1.6.37)
TIFF: /usr/lib/aarch64-linux-gnu/libtiff.so (ver 42 / 4.1.0)
JPEG 2000: build (ver 2.4.0)
OpenEXR: /usr/lib/aarch64-linux-gnu/libImath.so /usr/lib/aarch64-linux-gnu/libIlmImf.so /usr/lib/aarch64-linux-gnu/libIex.so /usr/lib/aarch64-linux-gnu/libHalf.so /usr/lib/aarch64-linux-gnu/libIlmThread.so (ver 2_3)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES
Video I/O:
DC1394: YES (2.2.5)
FFMPEG: YES
avcodec: YES (58.54.100)
avformat: YES (58.29.100)
avutil: YES (56.31.100)
swscale: YES (5.5.100)
avresample: YES (4.0.0)
GStreamer: YES (1.16.3)
v4l/v4l2: YES (linux/videodev2.h)
Parallel framework: pthreads
Trace: YES (with Intel ITT)
Other third-party libraries:
Lapack: NO
Eigen: NO
Custom HAL: YES (carotene (ver 0.0.1))
Protobuf: build (3.19.1)
NVIDIA CUDA: YES (ver 11.4, CUFFT CUBLAS)
NVIDIA GPU arch: 72 87
NVIDIA PTX archs:
cuDNN: YES (ver 8.3.2)
OpenCL: YES (no extra features)
Include path: /home/jetsonjane/master/JEP/script/workspace/opencv-4.6.0/3rdparty/include/opencl/1.2
Link libraries: Dynamic load
Python 2:
Interpreter: /usr/bin/python2.7 (ver 2.7.18)
Libraries: /usr/lib/aarch64-linux-gnu/libpython2.7.so (ver 2.7.18)
numpy: /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.16.5)
install path: lib/python2.7/dist-packages/cv2/python-2.7
Python 3:
Interpreter: /usr/bin/python3 (ver 3.8.10)
Libraries: /usr/lib/aarch64-linux-gnu/libpython3.8.so (ver 3.8.10)
numpy: /home/jetsonjane/.local/lib/python3.8/site-packages/numpy/core/include (ver 1.19.4)
install path: lib/python3.8/site-packages/cv2/python-3.8
Python (for build): /usr/bin/python2.7
Java:
ant: NO
JNI: NO
Java wrappers: NO
Java tests: NO
Install to: /usr/local
-----------------------------------------------------------------
Also during inference with the code I can see that 1 cuda device is recognized by the system:
count = cv2.cuda.getCudaEnabledDeviceCount()
print(count)
1
Are there any steps in configuration of CUDA,cuDNN,OpenCV that I am missing?
I would be thankful for any insights or solutions.