I’m seeing a large discrepancy in performance between opencv4tegra 2.4.13 (JetPack 3.1) and opencv4tegra 3.3.1 (JetPack 3.2). Both were installed using JetPack, and the same test code runs on each.
TL;DR:
10000 calls to cv2.distanceTransform() take 6.555 seconds with opencv4tegra 2.4.13 but 10.269 seconds with version 3.3.1 (roughly 0.66 ms vs 1.03 ms per call, so about 57% slower):
# with opencv 2.4.13
$ python test_distance_transform.py
Mon Jan 29 18:31:18 2018 prof
10004 function calls in 6.555 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10000 6.525 0.001 6.525 0.001 {cv2.distanceTransform}
1 0.030 0.030 6.555 6.555 test_distance_transform.py:7(test)
...
# with opencv 3.3.1
$ python test_distance_transform.py
Mon Jan 29 21:34:06 2018 prof
10004 function calls in 10.269 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10000 10.238 0.001 10.238 0.001 {distanceTransform}
1 0.031 0.031 10.269 10.269 test_distance_transform.py:6(test)
...
The actual test is using an internal profiler, but you will see similar results simply by running this script:
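import cv2
import numpy as np

def test():
    # Random 270x270 single-channel 8-bit image used as input
    blah = np.random.randint(low=0, high=255, size=(270, 270), dtype=np.uint8)
    # Constant names below are the OpenCV 3.x ones; as far as I know,
    # 2.4.x exposes them under cv2.cv instead (e.g. cv2.cv.CV_DIST_L2)
    for i in xrange(10000):
        cv2.distanceTransform(blah, cv2.DIST_L2, cv2.DIST_MASK_PRECISE)

test()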
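If you want a report in roughly the same format as the output above, here is a minimal sketch using the standard-library cProfile. This is not the internal profiler I mentioned; the names profile_repeated and run_distance_transform_benchmark are made up for illustration, and the benchmark assumes the OpenCV 3.x constant names.

```python
import cProfile
import pstats


def profile_repeated(func, n_calls):
    """Call func() n_calls times under cProfile and return a pstats.Stats."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_calls):
        func()
    profiler.disable()
    return pstats.Stats(profiler)


def run_distance_transform_benchmark(n_calls=10000):
    """Hypothetical helper: profile cv2.distanceTransform on a random image."""
    # Imported here so the helper above stays usable without OpenCV installed.
    import cv2
    import numpy as np

    img = np.random.randint(low=0, high=255, size=(270, 270), dtype=np.uint8)

    def one_call():
        # OpenCV 3.x constant names (assumption; 2.4.x names differ).
        cv2.distanceTransform(img, cv2.DIST_L2, cv2.DIST_MASK_PRECISE)

    stats = profile_repeated(one_call, n_calls)
    # "time" sorts by internal time, like "Ordered by: internal time" above.
    stats.sort_stats("time").print_stats(5)
    return stats
```

Calling run_distance_transform_benchmark() on each install should print a table comparable to the ones above. The report will also contain one extra row for the small one_call wrapper.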
Any idea on how to make this faster? When profiling my target application, calls to OpenCV are by far the slowest leg.
Thanks for the reply! I should have added that I have also attempted to build OpenCV 3.3.0 myself with CUDA and GStreamer support (but against CUDA 8, not 9!), and saw no performance gain.
I suppose I’ll have to try some of the additional flags from your third link. Any idea when JetPack 3.2 proper should be available?
OpenCV 3.3.0 built from source on Tegra:
$ python test_distance_transform.py
Mon Jan 29 18:25:33 2018 prof
10004 function calls in 10.182 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10000 10.150 0.001 10.150 0.001 {distanceTransform}
1 0.032 0.032 10.182 10.182 test_distance_transform.py:6(test)
And here is that build configuration:
=== Opencv built from source for tegra ===
General configuration for OpenCV 3.3.0 =====================================
Version control: 3.3.0
Platform:
Timestamp: 2018-01-29T04:37:12Z
Host: Linux 4.4.38-tegra aarch64
CMake: 3.5.1
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/make
Configuration: Release
CPU/HW features:
Baseline: NEON FP16
required: NEON
disabled: VFPV3
C/C++:
Built as dynamic libs?: YES
C++ Compiler: /usr/bin/c++ (ver 5.4.0)
C++ flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
C Compiler: /usr/bin/cc
C flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release):
Linker flags (Debug):
ccache: NO
Precompiled headers: NO
Extra dependencies: gtk-x11-2.0 gdk-x11-2.0 pangocairo-1.0 atk-1.0 cairo gdk_pixbuf-2.0 gio-2.0 pangoft2-1.0 pango-1.0 fontconfig freetype gthread-2.0 /usr/lib/aarch64-linux-gnu/libpng.so /usr/lib/aarch64-linux-gnu/libz.so /usr/lib/aarch64-linux-gnu/libtiff.so /usr/lib/aarch64-linux-gnu/libjasper.so /usr/lib/aarch64-linux-gnu/libjpeg.so gstbase-1.0 gstreamer-1.0 gobject-2.0 glib-2.0 gstvideo-1.0 gstapp-1.0 gstriff-1.0 gstpbutils-1.0 avcodec-ffmpeg avformat-ffmpeg avutil-ffmpeg swscale-ffmpeg dl m pthread rt /usr/lib/aarch64-linux-gnu/libtbb.so cudart nppc nppi npps cublas cufft -L/usr/local/cuda-8.0/lib64
3rdparty dependencies:
OpenCV modules:
To be built: cudev core cudaarithm flann imgproc ml objdetect video cudabgsegm cudafilters cudaimgproc cudawarping dnn imgcodecs photo shape videoio cudacodec highgui ts features2d calib3d cudafeatures2d cudalegacy cudaobjdetect cudaoptflow cudastereo stitching superres videostab python2
Disabled: world
Disabled by dependency: -
Unavailable: java python3 viz
GUI:
QT: NO
GTK+ 2.x: YES (ver 2.24.30)
GThread : YES (ver 2.48.1)
GtkGlExt: NO
OpenGL support: NO
VTK support: NO
Media I/O:
ZLib: /usr/lib/aarch64-linux-gnu/libz.so (ver 1.2.8)
JPEG: /usr/lib/aarch64-linux-gnu/libjpeg.so (ver )
WEBP: build (ver encoder: 0x020e)
PNG: /usr/lib/aarch64-linux-gnu/libpng.so (ver 1.2.54)
TIFF: /usr/lib/aarch64-linux-gnu/libtiff.so (ver 42 - 4.0.6)
JPEG 2000: /usr/lib/aarch64-linux-gnu/libjasper.so (ver 1.900.1)
OpenEXR: NO
GDAL: NO
GDCM: NO
Video I/O:
DC1394 1.x: NO
DC1394 2.x: NO
FFMPEG: YES
avcodec: YES (ver 56.60.100)
avformat: YES (ver 56.40.101)
avutil: YES (ver 54.31.100)
swscale: YES (ver 3.1.101)
avresample: NO
GStreamer:
base: YES (ver 1.8.3)
video: YES (ver 1.8.3)
app: YES (ver 1.8.3)
riff: YES (ver 1.8.3)
pbutils: YES (ver 1.8.3)
OpenNI: NO
OpenNI PrimeSensor Modules: NO
OpenNI2: NO
PvAPI: NO
GigEVisionSDK: NO
Aravis SDK: NO
UniCap: NO
UniCap ucil: NO
V4L/V4L2: NO/YES
XIMEA: NO
Xine: NO
Intel Media SDK: NO
gPhoto2: NO
Parallel framework: TBB (ver 4.4 interface 9002)
Trace: YES ()
Other third-party libraries:
Use Intel IPP: NO
Use Intel IPP IW: NO
Use VA: NO
Use Intel VA-API/OpenCL: NO
Use Lapack: NO
Use Eigen: YES (ver 3.2.92)
Use Cuda: YES (ver 8.0)
Use OpenCL: NO
Use OpenVX: NO
Use custom HAL: YES (carotene (ver 0.0.1))
NVIDIA CUDA
Use CUFFT: YES
Use CUBLAS: YES
USE NVCUVID: NO
NVIDIA GPU arch: 62
NVIDIA PTX archs:
Use fast math: NO
Python 2:
Interpreter: /usr/bin/python2.7 (ver 2.7.12)
Libraries: /usr/lib/aarch64-linux-gnu/libpython2.7.so (ver 2.7.12)
numpy: /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.11.0)
packages path: lib/python2.7/dist-packages
Python 3:
Interpreter: /usr/bin/python3 (ver 3.5.1)
Python (for build): /usr/bin/python2.7
Java:
ant: NO
JNI: NO
Java wrappers: NO
Java tests: NO
Matlab: Matlab not found or implicitly disabled
Documentation:
Doxygen: NO
Tests and samples:
Tests: YES
Performance tests: YES
C/C++ Examples: NO
Install path: /usr
cvconfig.h is in: /home/nvidia/opencv/build
-----------------------------------------------------------------
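Since the full dump is long, here is a rough sketch of how the performance-relevant lines could be pulled out of cv2.getBuildInformation() on each install, to compare them side by side. The helper names pick_build_lines and print_runtime_settings and the keyword list are my own choices, not part of OpenCV, and I am assuming cv2.useOptimized() and cv2.getNumThreads() are exposed by both Python bindings.

```python
def pick_build_lines(build_info, keywords):
    """Return the stripped lines of a build-info dump that mention any keyword."""
    lowered = [k.lower() for k in keywords]
    picked = []
    for line in build_info.splitlines():
        text = line.strip()
        if text and any(k in text.lower() for k in lowered):
            picked.append(text)
    return picked


def print_runtime_settings():
    """Hypothetical helper: print version, runtime flags and key build lines."""
    import cv2  # imported here so pick_build_lines works without OpenCV

    print("version: %s" % cv2.__version__)
    print("optimized code paths enabled: %s" % cv2.useOptimized())
    print("threads: %s" % cv2.getNumThreads())
    # Keyword list is an assumption about which lines matter for speed.
    keywords = ["parallel framework", "custom hal", "baseline",
                "use cuda", "c++ flags (release)"]
    for line in pick_build_lines(cv2.getBuildInformation(), keywords):
        print(line)
```

Running print_runtime_settings() under both the 2.4.13 and 3.3.x installs and diffing the output would show whether the threading backend, HAL or compiler flags differ between them.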