Building OpenCV from source

I’ve seen various mentions that building OpenCV from source will not provide the same performance as the version pre-built by Nvidia. A few questions:

  • Is this still true?
  • Is it possible to get a rough list of which features have crippled performance when building from source?
  • Is there a source patch available somewhere to build with full performance?

For what it’s worth, I’ve always built OpenCV from source on TX2 & Xavier and it seems to work well, but of course I’m paranoid that I’m not getting the maximum possible performance…

Thanks.

Sorry, I’m not able to give a precise answer as this is a complex topic and Opencv4Tegra sources have not been published, AFAIK.

However, I’d advise building from source now, for various reasons:

  • Opencv4Tegra was mainly developed for TK1, and for TX1 when it was running 32-bit L4T versions, although some versions have been released for 64-bit L4T.
  • The GPU functions were linked to old CUDA versions.
  • Some optimizations in Opencv4Tegra were achieved through the Carotene library, which NVIDIA published and which has been included in opencv-3.1, IIRC.
  • Most of the other optimizations were achieved through compiler options. You can have a look at these by calling the OpenCV function getBuildInformation() with Opencv4Tegra. They may be tied to the compiler used at that time; not sure, but I think this was gcc-4.9. Newer compilers may be able to perform better optimizations automatically. I achieved some good speedup on TX2 using machine flags. At that time LTO did not give any improvement (nor did LLVM), but that might have been related to the old cmake version I was using, and it would probably be different now.
  • There are many new features available (and bugs fixed) in recent versions.
  • Note that Opencv4Tegra is no longer provided in JetPack in recent releases.
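
To compare the compiler options between two builds, something like the following rough sketch could help. It assumes the Python bindings (cv2) are installed for the build you want to inspect. The helper names and the default keywords are my own choices for illustration, not part of OpenCV; only cv2.getBuildInformation() is the actual OpenCV call.

```python
def filter_build_info(info, keywords=("compiler", "flags")):
    """Keep only the lines of a build report that mention one of the keywords.

    `info` is the multi-line string returned by cv2.getBuildInformation().
    Matching is case-insensitive; the keywords are illustrative defaults.
    """
    wanted = [k.lower() for k in keywords]
    return [
        line.strip()
        for line in info.splitlines()
        if any(k in line.lower() for k in wanted)
    ]


def print_opencv_build_flags(keywords=("compiler", "flags")):
    """Print the compiler-related lines for the OpenCV that Python imports."""
    import cv2  # imported here so the filter above also works without OpenCV

    for line in filter_build_info(cv2.getBuildInformation(), keywords):
        print(line)
```

Calling print_opencv_build_flags() once per installation and diffing the two outputs should show which flags differ between the builds.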

However, I never got a version that was better than another across the board. Depending on the test case, it was common to see one version performing better on one part of the code while another performed better on a different part. So I’d say you have to try and benchmark with your application. The hell starts here… there are so many options that the number of possibilities is huge, and building a new version of opencv takes quite long (I was building natively), so I’m far from having tried even 5% of these.
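
For the benchmarking, a minimal timing harness along these lines may be enough to start with. This is only a sketch: the function names, the repeat counts and the warm-up count are arbitrary assumptions of mine, and each stage is whatever zero-argument callable wraps the part of your application you care about.

```python
import statistics
import time


def benchmark(fn, repeats=20, warmup=3):
    """Return (median, best) wall-clock seconds for one call of fn()."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed (caches, lazy initialisation)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples)


def compare(stages, repeats=20):
    """Time each named stage separately.

    `stages` maps a label to a zero-argument callable, for example
    {"resize": lambda: cv2.resize(img, (640, 480))} in a real test.
    Returns a dict mapping each label to its median seconds per call.
    """
    return {name: benchmark(fn, repeats)[0] for name, fn in stages.items()}
```

Running the same stages against each OpenCV build and comparing the per-stage medians makes it visible when one build wins on one part of the code and loses on another.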

Having changed my code to support opencv3, I’m no longer using opencv4tegra (nor any other opencv2 version). Now I’m also using opencv4 (so far, I think the only change required was adding opencv4 at the end of the include path in Makefiles).

Sorry for not being able to give a more definite answer. Other users or NVIDIA folks may provide their opinion and advise further.

It was my understanding that OpenCV4Tegra has been deprecated as of JetPack 3.2.1, being replaced by the general version of OpenCV 3.2. NVIDIA contributed the optimizations that were in use by OpenCV4Tegra, along with some fixes relating to CUDA compatibility with OpenCV
(https://github.com/opencv/opencv/wiki/ChangeLog#version32).

So basically past OpenCV 3.2, Tegra optimizations (on the CPU end at least) come packaged and available in the general installation of OpenCV.

I’ve used JetsonHacks’ build script, which builds OpenCV with a number of compile flags that might(?) have performance benefits for the Jetsons: https://github.com/jetsonhacks/buildOpenCVTX2/blob/master/buildOpenCV.sh