Does opencv_dnn use the GPU?

Hello, yesterday I asked here about “OpenCV4Tegra doesn’t support GPU” (https://devtalk.nvidia.com/default/topic/1064925/jetson-tx2/opencv4tegra-doesn-t-support-gpu/post/5392905/#5392905). The answer was that “Opencv in jetpack has disabled gpu for a long time”.

Today my question is: does the Jetson TX2 really have a GPU?

I hope that the answer will be YES

I ran the code with three OpenCV versions:

1- ros-kinetic-opencv3 …CPU
2- OpenCV4Tegra…CPU according to yesterday’s answer.
3- OpenCV with GPU, which I built from source. The build went fine, but all three versions are very slow and there is no observable difference among them: the frame rate is about 3 to 4.6 frames per second.

python
Python 2.7.12 (default, Oct  8 2019, 14:14:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'3.4.1-dev'
>>>

I bought the Jetson to get a high frame rate.

Believe me, my computer is faster than the Jetson.

I guess the reason is either that the GPU of my Jetson is not working for some reason, or that the OpenCV installation is wrong.

How can I check the hardware and software?

my system:

NVIDIA Jetson TX2
L4T 28.2.1 [ 3.2.1 ]
Board: t186ref
Ubuntu 16.04.6 LTS
Kernel Version: 4.4.38-tegra
CUDA 9.0.252

sudo ./tegrastats 
RAM 1238/7846MB (lfb 1339x4MB) CPU [0%@345,off,off,0%@345,1%@345,1%@345] EMC_FREQ 15%@204 GR3D_FREQ 0%@140 APE 150 BCPU@45.5C MCPU@45.5C GPU@44C PLL@45.5C Tboard@40C Tdiode@43C PMIC@100C thermal@44.9C VDD_IN 1757/1816 VDD_CPU 229/202 VDD_GPU 152/152 VDD_SOC 381/392 VDD_WIFI 57/78 VDD_DDR 443/441
python
Python 2.7.12 (default, Oct  8 2019, 14:14:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> print(cv2.getBuildInformation())

General configuration for OpenCV 3.4.1-dev =====================================
  Version control:               3.4.1-9-gec0bb66-dirty

  Extra modules:
    Location (extra):            /home/nvidia/opencv_contrib/modules
    Version control (extra):     3.4.1

  Platform:
    Timestamp:                   2019-10-17T00:54:00Z
    Host:                        Linux 4.4.38-tegra aarch64
    CMake:                       3.5.1
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               RELEASE

  CPU/HW features:
    Baseline:                    NEON FP16
      required:                  NEON
      disabled:                  VFPV3

  C/C++:
    Built as dynamic libs?:      YES
    C++ Compiler:                /usr/bin/c++  (ver 5.4.0)
    C++ flags (Release):         -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/bin/cc
    C flags (Release):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):      
    Linker flags (Debug):        
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:          dl m pthread rt /usr/lib/aarch64-linux-gnu/libGLU.so /usr/lib/aarch64-linux-gnu/libGL.so cudart nppc nppial nppicc nppicom nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cufft -L/usr/local/cuda-9.0/lib64
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv datasets dnn dnn_objdetect dpm face features2d flann freetype fuzzy hdf hfs highgui img_hash imgcodecs imgproc java java_bindings_generator line_descriptor ml objdetect optflow phase_unwrapping photo plot python2 python3 python_bindings_generator reg rgbd saliency sfm shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab viz xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    js world
    Disabled by dependency:      -
    Unavailable:                 cnn_3dobj dnn_modern matlab ovis
    Applications:                tests perf_tests apps
    Documentation:               NO
    Non-free algorithms:         NO

  GUI: 
    QT:                          YES (ver 5.5.1)
      QT OpenGL support:         YES (Qt5::OpenGL 5.5.1)
    GTK+:                        NO
    OpenGL support:              YES (/usr/lib/aarch64-linux-gnu/libGLU.so /usr/lib/aarch64-linux-gnu/libGL.so)
    VTK support:                 YES (ver 6.2.0)

  Media I/O: 
    ZLib:                        /usr/lib/aarch64-linux-gnu/libz.so (ver 1.2.8)
    JPEG:                        /usr/lib/aarch64-linux-gnu/libjpeg.so (ver )
    WEBP:                        /usr/lib/aarch64-linux-gnu/libwebp.so (ver encoder: 0x0202)
    PNG:                         /usr/lib/aarch64-linux-gnu/libpng.so (ver 1.2.54)
    TIFF:                        /usr/lib/aarch64-linux-gnu/libtiff.so (ver 42 / 4.0.6)
    JPEG 2000:                   /usr/lib/aarch64-linux-gnu/libjasper.so (ver 1.900.1)
    OpenEXR:                     /usr/lib/aarch64-linux-gnu/libImath.so /usr/lib/aarch64-linux-gnu/libIlmImf.so /usr/lib/aarch64-linux-gnu/libIex.so /usr/lib/aarch64-linux-gnu/libHalf.so /usr/lib/aarch64-linux-gnu/libIlmThread.so (ver 2.2.0)

  Video I/O:
    DC1394:                      YES (ver 2.2.4)
    FFMPEG:                      YES
      avcodec:                   YES (ver 56.60.100)
      avformat:                  YES (ver 56.40.101)
      avutil:                    YES (ver 54.31.100)
      swscale:                   YES (ver 3.1.101)
      avresample:                NO
    GStreamer:                   
      base:                      YES (ver 1.8.3)
      video:                     YES (ver 1.8.3)
      app:                       YES (ver 1.8.3)
      riff:                      YES (ver 1.8.3)
      pbutils:                   YES (ver 1.8.3)
    libv4l/libv4l2:              1.10.0 / 1.10.0
    v4l/v4l2:                    linux/videodev2.h
    gPhoto2:                     NO

  Parallel framework:            pthreads

  Trace:                         YES (built-in)

  Other third-party libraries:
    Lapack:                      NO
    Eigen:                       YES (ver 3.2.92)
    Custom HAL:                  YES (carotene (ver 0.0.1))
    Protobuf:                    build (3.5.1)

  NVIDIA CUDA:                   YES (ver 9.0, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:             62
    NVIDIA PTX archs:

  OpenCL:                        YES (no extra features)
    Include path:                /home/nvidia/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 2:
    Interpreter:                 /usr/bin/python2.7 (ver 2.7.12)
    Libraries:                   /usr/lib/aarch64-linux-gnu/libpython2.7.so (ver 2.7.12)
    numpy:                       /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.11.0)
    packages path:               lib/python2.7/dist-packages

  Python 3:
    Interpreter:                 /usr/bin/python3 (ver 3.5.2)
    Libraries:                   /usr/lib/aarch64-linux-gnu/libpython3.5m.so (ver 3.5.2)
    numpy:                       /usr/lib/python3/dist-packages/numpy/core/include (ver 1.11.0)
    packages path:               lib/python3.5/dist-packages

  Python (for build):            /usr/bin/python2.7

  Java:                          
    ant:                         /usr/bin/ant (ver 1.9.6)
    JNI:                         /usr/lib/jvm/default-java/include /usr/lib/jvm/default-java/include/linux /usr/lib/jvm/default-java/include
    Java wrappers:               YES
    Java tests:                  YES

  Matlab:                        NO

  Install to:                    /usr/local
-----------------------------------------------------------------
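One way to check the software side is to pull the CUDA lines out of the build summary instead of reading the whole dump. The helper below is only a sketch: `cuda_summary` is a name made up for this example, and it assumes the summary keeps the `NVIDIA CUDA:` and `NVIDIA GPU arch:` labels shown above. Note that a YES here only says OpenCV was compiled with CUDA; it does not by itself mean that every module, dnn included, runs on the GPU.

```python
import re


def cuda_summary(build_info):
    """Return the 'NVIDIA CUDA' and 'NVIDIA GPU arch' values from a build summary.

    Hypothetical helper for illustration; a missing label maps to None.
    """
    summary = {}
    for key in ("NVIDIA CUDA", "NVIDIA GPU arch"):
        pattern = r"^[ \t]*" + re.escape(key) + r":[ \t]*(.*)$"
        match = re.search(pattern, build_info, re.MULTILINE)
        summary[key] = match.group(1).strip() if match else None
    return summary


# Two lines copied from the build summary above.
sample = """
  NVIDIA CUDA:                   YES (ver 9.0, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:             62
"""
info = cuda_summary(sample)
print(info["NVIDIA CUDA"])
print(info["NVIDIA GPU arch"])

# On the device itself (needs OpenCV's Python bindings):
#   import cv2
#   print(cuda_summary(cv2.getBuildInformation()))
```

For the build posted here this reports CUDA 9.0 and GPU arch 62, so the build itself was configured with CUDA.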

Please help me. Any suggestions?

Yes, the TX2 has an NVIDIA GPU. Does your application include any GPU-related programming?

Please check how your implementation works. For example, are you sure it is programmed with CUDA?
If your application triggers GPU activity, the percentage shown for “GR3D_FREQ” in tegrastats will rise.
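To watch that value without reading the whole line, the GR3D_FREQ field can be extracted with a small parser. This is a rough sketch: `gpu_load` is a hypothetical helper, and it assumes the `GR3D_FREQ <load>%@<freq>` layout seen in the tegrastats output posted earlier in this thread (other L4T releases may format the line differently).

```python
import re

# Matches e.g. "GR3D_FREQ 0%@140"; the "@<freq>" part is treated as optional.
GR3D_PATTERN = re.compile(r"GR3D_FREQ (\d+)%(?:@(\d+))?")


def gpu_load(tegrastats_line):
    """Return (load_percent, freq) from one tegrastats line, or None if absent."""
    match = GR3D_PATTERN.search(tegrastats_line)
    if match is None:
        return None
    load = int(match.group(1))
    freq = int(match.group(2)) if match.group(2) else None
    return load, freq


# Beginning of the tegrastats line posted above.
line = ("RAM 1238/7846MB (lfb 1339x4MB) CPU [0%@345,off,off,0%@345,1%@345,1%@345] "
        "EMC_FREQ 15%@204 GR3D_FREQ 0%@140 APE 150")
print(gpu_load(line))  # (0, 140)
```

If the load stays at 0 while the detection script is running, the work is most likely being done on the CPU.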

You can verify that the GPU works by using our mmapi sample, which should be installed with sdkmanager.

I ran this code: https://www.pyimagesearch.com/2017/09/18/real-time-object-detection-with-deep-learning-and-opencv/

I read some of the comments on that post. People who ran it on a Jetson TX2 also got about 5 frames per second.

What is the reason?

Please also run “sudo nvpmodel -m 0” and “sudo jetson_clocks” before you run the app.

nvidia@tegra-ubuntu:~$ sudo nvpmodel -m 1
nvidia@tegra-ubuntu:~$ sudo nvpmodel -q
NV Power Mode: MAXQ
1
nvidia@tegra-ubuntu:~$ sudo nvpmodel -q
NV Power Mode: MAXQ
1
nvidia@tegra-ubuntu:~$ sudo nvpmodel -m 2
nvidia@tegra-ubuntu:~$ sudo nvpmodel -q
NV Power Mode: MAXP_CORE_ALL
2
nvidia@tegra-ubuntu:~$ sudo nvpmodel -m 0
nvidia@tegra-ubuntu:~$ sudo nvpmodel -q
NV Power Mode: MAXN
0
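The query output above can also be checked from a script before starting a benchmark. A minimal sketch, assuming the two-line format shown in this transcript (a `NV Power Mode:` line followed by the numeric mode ID); `parse_nvpmodel` is a name invented for this example.

```python
import re


def parse_nvpmodel(output):
    """Return (mode_name, mode_id) from the text printed by `nvpmodel -q`.

    Hypothetical helper; returns None if the expected lines are missing.
    """
    name = re.search(r"NV Power Mode:\s*(\S+)", output)
    mode_id = re.search(r"^\s*(\d+)\s*$", output, re.MULTILINE)
    if name is None or mode_id is None:
        return None
    return name.group(1), int(mode_id.group(1))


# Text copied from the transcript above.
sample = "NV Power Mode: MAXN\n0\n"
print(parse_nvpmodel(sample))  # ('MAXN', 0)

# On the device (assumes nvpmodel is on the PATH and sudo is permitted):
#   import subprocess
#   text = subprocess.check_output(["sudo", "nvpmodel", "-q"]).decode()
#   print(parse_nvpmodel(text))
```

In the transcript, mode 0 reports MAXN, mode 1 MAXQ, and mode 2 MAXP_CORE_ALL.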

nvidia@tegra-ubuntu:~$ sudo nvpmodel -m 0
nvidia@tegra-ubuntu:~$ sudo jetson_clocks
sudo: jetson_clocks: command not found

Sorry, on L4T 28.2.1 jetson_clocks is a shell script:

cd ~
sudo ./jetson_clocks.sh

The output: nothing.

nvidia@tegra-ubuntu:~$ sudo nvpmodel -m 0
nvidia@tegra-ubuntu:~$ sudo ./jetson_clocks.sh

Please read the L4T documentation -> Power Management for Jetson TX2 Series Devices to understand what the command does.

I read Tegra Linux Driver,

https://devtalk.nvidia.com/default/topic/1000657/jetson-tx2/script-for-maximum-clockspeeds-and-performence/

and https://devtalk.nvidia.com/default/topic/1000345/jetson-tx2/two-cores-disabled-/post/5110960/#5110960

I don’t understand the relationship between power management and the problem of some OpenCV modules not using the Jetson TX2 GPU. Please help me understand why this command matters.

I will change the title of my question from “Is JetsonTX2 has GPU really?” to “Jetson tx2 not using GPU for the opencv_dnn?”

I found that other people have written about the same issue:

1- jetson tx2 not using gpu for my the opencv caffe-model?
https://devtalk.nvidia.com/default/topic/1037878/jetson-tx2/jetson-tx2-not-using-gpu-for-my-the-opencv-caffe-model-/

2- Opencv dnn module inference does not use GPU on Jetson TX2 for SSD mobilenet

https://stackoverflow.com/questions/49939933/opencv-dnn-module-inference-does-not-use-gpu-on-jetson-tx2-for-ssd-mobilenet

3- on TX2 with opencv3.4 with CUDA support, only ~5fps for 400*400

nvidia@tegra-ubuntu:~/Downloads/real-time-object-detection$ python real_time_object_detection.py --prototxt MobileNetSSD_deploy.prototxt.txt --model MobileNetSSD_deploy.caffemodel
[INFO] loading model...
[INFO] starting video stream...
[INFO] elapsed time: 64.09
[INFO] approx. FPS: 5.30

https://www.pyimagesearch.com/2017/09/18/real-time-object-detection-with-deep-learning-and-opencv/
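A figure of 5.30 FPS works out to roughly 189 ms per frame, but the overall number does not say where that time goes. One way to find out is to time each stage of the frame loop separately. The sketch below is illustrative only: `StageTimer` and the two stand-in functions are hypothetical, and in the real script the stand-ins would be replaced by the camera read and the network's forward pass.

```python
import time


class StageTimer(object):
    """Accumulate wall-clock time per named stage of a frame loop."""

    def __init__(self):
        self.totals = {}
        self.frames = 0

    def run(self, stage, func, *args):
        """Call func(*args), add its duration to the stage total, return its result."""
        start = time.time()
        result = func(*args)
        self.totals[stage] = self.totals.get(stage, 0.0) + (time.time() - start)
        return result

    def end_frame(self):
        self.frames += 1

    def report(self):
        """Return the average milliseconds per frame for each stage."""
        if self.frames == 0:
            return {}
        return {stage: 1000.0 * total / self.frames
                for stage, total in self.totals.items()}


# Stand-in stages (assumptions): swap in the real capture and inference calls.
def fake_capture():
    time.sleep(0.001)
    return "frame"


def fake_inference(frame):
    time.sleep(0.005)
    return []


timer = StageTimer()
for _ in range(10):
    frame = timer.run("capture", fake_capture)
    detections = timer.run("inference", fake_inference, frame)
    timer.end_frame()

for stage, ms in sorted(timer.report().items()):
    print("%s: %.1f ms/frame" % (stage, ms))
```

If nearly all of the per-frame time lands in the inference stage while GR3D_FREQ stays near 0%, that points at the network running on the CPU rather than at the camera or display code.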

Has this issue (running cv2.dnn on the Jetson TX2 GPU) been solved? If yes, please help me.

I used the cv2.dnn module in my code, so I didn’t notice any difference between the OpenCV versions mentioned above.

  1. Power management improves TX2 performance because it raises the CPU/GPU frequencies.

Thus, the fps should improve after you run jetson_clocks.sh.
You can see the CPU/GPU frequency change in tegrastats after you run jetson_clocks.sh.

  2. I think the question should be “Does opencv_dnn use gpu?”
    Unfortunately, we don’t know the answer. If this library does not use the GPU, then of course there will not be any GPU activity.

You can run a CUDA sample from our sdkmanager directly and check whether your GPU works. That is the only thing I can help check.

OpenCV has its own codebase.

Recently, the dnn module gained support for a CUDA backend:
https://github.com/opencv/opencv/pull/14827
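With a build that includes that backend, selecting it from Python might look like the sketch below. Treat it as an assumption-laden example rather than a recipe: `prefer_cuda` is a hypothetical helper, the constants `DNN_BACKEND_CUDA` and `DNN_TARGET_CUDA` exist only in OpenCV versions that contain the CUDA backend (to my knowledge 4.2.0 and later, built with CUDA and cuDNN enabled), and the 3.4.1 build shown earlier in this thread does not have them.

```python
def prefer_cuda(net, cv_dnn):
    """Ask a cv2.dnn network to use the CUDA backend if this build exposes it.

    net    -- a network object with setPreferableBackend/setPreferableTarget
    cv_dnn -- the cv2.dnn module (passed in so older builds can be detected)

    Returns True if the CUDA backend was requested, False if the constants
    are missing, in which case the network is left untouched.
    """
    backend = getattr(cv_dnn, "DNN_BACKEND_CUDA", None)
    target = getattr(cv_dnn, "DNN_TARGET_CUDA", None)
    if backend is None or target is None:
        return False
    net.setPreferableBackend(backend)
    net.setPreferableTarget(target)
    return True


# Usage on the device (needs OpenCV's Python bindings and the model files):
#   import cv2
#   net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt.txt",
#                                  "MobileNetSSD_deploy.caffemodel")
#   if not prefer_cuda(net, cv2.dnn):
#       print("This OpenCV build has no CUDA backend for dnn")
```

Passing the module in as an argument keeps the check explicit: on an older build the function returns False instead of raising an AttributeError, and inference continues on the default backend.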