Multicore support in OpenCV for Tegra

Hello everyone,

I’ve installed OpenCV4Tegra using NVIDIA’s binaries as advised here (option 1, “Prebuilt OpenCV4Tegra library for L4T”):

http://elinux.org/Jetson/Installing_OpenCV#Prebuilt_OpenCV_library_versus_Building_the_OpenCV_library_from_source

On this page, NVIDIA-specific optimizations for the Tegra platform are advertised:

http://elinux.org/Jetson/Computer_Vision_Performance

OpenCV4Tegra: A free library provided by NVIDIA containing optimizations for NVIDIA's Tegra CPUs (ARM NEON SIMD optimizations, multi-core CPU optimizations and some GLSL GPU optimizations)

But the performance of the non-GPU libraries didn’t live up to my expectations, so I’ve written this simple program to dump the build and runtime settings:

#include <cstdio>
#include "opencv2/opencv.hpp"

int main()
{
        cv::setNumThreads(4);
        printf(
                "CPUs: %d, Threads: %d, Use optimizations: %s\n",
                cv::getNumberOfCPUs(),
                cv::getNumThreads(),
                cv::useOptimized() ? "yes" : "no"
        );
        printf("%s\n", cv::getBuildInformation().c_str());
        return 0;
}

With very disappointing results:

CPUs: 1, Threads: 1, Use optimizations: yes

General configuration for OpenCV 2.4.10.1 =====================================
  Version control:               2.4.10.1

...

  Other third-party libraries:
    Use IPP:                     NO
    Use Eigen:                   YES (ver 3.2.0)
    Use TBB:                     NO
    Use OpenMP:                  NO
    Use GCD                      NO
    Use Concurrency              NO
    Use C=:                      NO
    Use Cuda:                    YES (ver 6.5)
    Use OpenCL:                  NO

  NVIDIA CUDA
    Use CUFFT:                   YES
    Use CUBLAS:                  NO
    USE NVCUVID:                 NO
    NVIDIA GPU arch:             32
    NVIDIA PTX archs:
    Use fast math:               NO

...

  Install path:                  /usr

  cvconfig.h is in:              /hdd/buildbot/slave_jetson_tk1_2/52-O4T-L4T/build
-----------------------------------------------------------------

General configuration for OpenCV4Tegra =====================================
  inner version                  2.4.10.1
  memory allocator               NO
  hardware link                  YES
  compact sources                NO
  logging enabled                NO
-----------------------------------------------------------------

So there is no multiprocessing support at all (OpenMP, TBB, etc.), and no indication of NEON optimizations either. Am I missing something?
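The check above can be automated by scanning the text returned by cv::getBuildInformation() for the parallel-framework entries. This is a minimal sketch, assuming the report keeps the "Use <Name>: YES/NO" layout of the 2.4.10.1 output quoted above; enabledParallelBackends is a hypothetical helper name, not an OpenCV function. As far as I can tell from the stock 2.4 sources, cv::parallel_for_ falls back to serial execution when none of these frameworks is enabled, which would be consistent with cv::getNumThreads() staying at 1.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the CPU parallel frameworks that a
// cv::getBuildInformation()-style report marks as enabled.
// Assumption: entries look like "Use <Name>[:]   YES..." or "...NO",
// as in the 2.4.10.1 report quoted in this thread.
std::vector<std::string> enabledParallelBackends(const std::string& report)
{
    // Frameworks listed under "Other third-party libraries" in the 2.4 report.
    static const char* const kBackends[] = {"TBB", "OpenMP", "GCD", "Concurrency", "C="};

    std::vector<std::string> enabled;
    std::istringstream lines(report);
    std::string line;
    while (std::getline(lines, line)) {
        // Drop the indentation used by the report.
        const std::size_t start = line.find_first_not_of(" \t");
        if (start == std::string::npos) {
            continue;
        }
        line.erase(0, start);

        for (const char* name : kBackends) {
            const std::string prefix = std::string("Use ") + name;
            if (line.compare(0, prefix.size(), prefix) != 0) {
                continue;
            }
            // The name must end right here ("Use TBB:", "Use GCD   ...").
            if (line.size() > prefix.size() && line[prefix.size()] != ':' &&
                line[prefix.size()] != ' ') {
                continue;
            }
            // Skip the optional colon and padding, then inspect the value.
            const std::string rest = line.substr(prefix.size());
            const std::size_t value = rest.find_first_not_of(": \t");
            if (value != std::string::npos && rest.compare(value, 3, "YES") == 0) {
                enabled.push_back(name);
            }
            break;  // at most one framework per line
        }
    }
    return enabled;
}
```

In the diagnostic program, enabledParallelBackends(cv::getBuildInformation()) returning an empty vector would match the situation described here. Note that NEON usage is not listed in this part of the report, so this check says nothing about SIMD.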

The CPU count apparently changes as cores are enabled or disabled, but the thread count can’t be raised above 1, not even by calling cv::setNumThreads() while all cores are active.
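The varying CPU count is consistent with the count being taken from the cores that are currently online: on Linux, stock OpenCV 2.4’s cv::getNumberOfCPUs() appears to return sysconf(_SC_NPROCESSORS_ONLN), and on the Tegra K1 the CPU hotplug governor can take cores offline when the load is low. This is a small sketch for Linux/glibc (onlineCpus, configuredCpus and reportCpuCounts are hypothetical helper names) that prints the online and configured counts side by side, so a gap between the two shows cores that are currently powered down.

```cpp
#include <unistd.h>

#include <cstdio>

// Processors currently online. Cores taken offline by CPU hotplug
// (for example by a power-saving governor) are not counted.
long onlineCpus()
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

// Processors configured in the system, whether online or not.
long configuredCpus()
{
    return sysconf(_SC_NPROCESSORS_CONF);
}

// Prints both counts; online < configured means some cores are currently off.
void reportCpuCounts()
{
    std::printf("CPUs online: %ld, configured: %ld\n", onlineCpus(), configuredCpus());
}
```

Calling reportCpuCounts() right next to cv::getNumberOfCPUs() in the diagnostic program should show whether the two agree. This only explains the CPU count; the thread count staying at 1 is a separate question about which parallel backend the library was built with.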

Thanks in advance,
Emilio.

Hi EmilioG,

We’re investigating this issue and will update the status once we have clarified it.

Thanks

Hi EmilioG,

According to your results, your OpenCV4Tegra version is 2.4.10.xx; we have been distributing 2.4.13 and later on TK1 and TX1 for a couple of months now.

Could it be that you are running the public OpenCV release or a very old version?
Which JetPack did you install?

Thanks

Hello Kayccc,

Sorry for the delay. I’ve been using L4T R21.4, which precedes the latest version (R21.5) and is about one year old.

Can you please confirm that the OpenCV version shipping with R21.5 has multicore and SIMD optimizations?

Hi EmilioG,

Would you please try the latest OpenCV4Tegra, v2.4.13, and see whether the result is the same?

Thanks

Our project is currently locked to L4T R21.4 / OpenCV4Tegra 2.4.10, which is what was available when we started. I’ll try switching to 2.4.13 temporarily to see the results, but I can’t promise to do it soon.
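When comparing 2.4.10 with 2.4.13, cv::getNumThreads() only reports a setting; it does not show whether a loop actually ran on more than one core. One way to check is to record the distinct thread IDs that execute a parallel loop body. This is a minimal sketch of the recording side using only the C++11 standard library (ThreadRecorder is a hypothetical name). The OpenCV wiring shown in the comment assumes the cv::ParallelLoopBody / cv::parallel_for_ interface of OpenCV 2.4 and has not been checked against OpenCV4Tegra.

```cpp
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>

// Hypothetical helper: records which threads have executed a piece of work.
// Example wiring (OpenCV 2.4, not compiled here):
//   struct Body : cv::ParallelLoopBody {
//       explicit Body(ThreadRecorder& r) : rec(r) {}
//       void operator()(const cv::Range&) const { rec.record(); }
//       ThreadRecorder& rec;
//   };
//   ThreadRecorder rec;
//   cv::parallel_for_(cv::Range(0, 10000), Body(rec));
//   printf("distinct threads: %zu\n", rec.distinctThreads());
class ThreadRecorder {
public:
    // Safe to call concurrently from any number of threads.
    void record()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ids_.insert(std::this_thread::get_id());
    }

    // Number of distinct threads that have called record() so far.
    std::size_t distinctThreads() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return ids_.size();
    }

private:
    mutable std::mutex mutex_;
    std::set<std::thread::id> ids_;
};
```

With no parallel backend compiled in, one would expect distinctThreads() to stay at 1 regardless of cv::setNumThreads(); a value above 1 on 2.4.13 would indicate that the newer build really does spread work across cores.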