TX2 OpenMP threads only run on one CPU core

Hi everyone!
EngineUKFTs.mk (29.1 KB)
I'm testing my algorithm with OpenMP on the Jetson TX2 CPUs. When I run the executable (.elf), it creates multiple threads, but they all run on a single CPU core. The test code is simple (sigmaMeasurement is an Eigen matrix):

#pragma omp parallel for
for(int i = 0;i<9;i++)
    sigmaMeasurement.col(i) = func(sigmaX.col(i));

I added -fopenmp via -Xcompiler and -lgomp via -Xlinker. The project compiles and links successfully, but there is no speedup. It seems that some setting in the .mk file for this algorithm is incorrect, because I wrote a simple demo with the same construct, compiled it with g++ -fopenmp…, and it runs correctly on 4 cores. Does anyone know why this happens?
Here are my .mk settings; the full file is attached at the top of this post.

# C Compiler: NVCC for NVIDIA Embedded Processors1.0 NVIDIA CUDA C Compiler Driver
CC = nvcc
# Linker: NVCC for NVIDIA Embedded Processors1.0 NVIDIA CUDA C Linker
LD = nvcc
# C++ Compiler: NVCC for NVIDIA Embedded Processors1.0 NVIDIA CUDA C++ Compiler Driver
CPP = nvcc
# C++ Linker: NVCC for NVIDIA Embedded Processors1.0 NVIDIA CUDA C++ Linker
CPP_LD = nvcc
# Archiver: NVCC for NVIDIA Embedded Processors1.0 Archiver
AR = ar
# MEX Tool: MEX Tool
MEX_PATH = $(MATLAB_ARCH_BIN)
MEX = $(MEX_PATH)/mex
# Download: Download
DOWNLOAD =
# Execute: Execute
EXECUTE = $(PRODUCT)
# Builder: Make Tool
MAKE = make
ARFLAGS              = -ruvs
CFLAGS               = -rdc=true -Xcudafe "--diag_suppress=unsigned_compare_with_zero" \
                       -c \
                       -Xcompiler -MMD,-MP,-fopenmp \
                       -O2 
CPPFLAGS             = -rdc=true -Xcudafe "--diag_suppress=unsigned_compare_with_zero" \
                       -c \
                       -Xcompiler -fopenmp,-MMD,-MP \
                       -O2 
CPP_LDFLAGS          = -lm -lrt -ldl \
                       -Xlinker -lgomp,-rpath,/usr/lib32 -Xnvlink -w -lcudart -lcuda -Wno-deprecated-gpu-targets
CPP_SHAREDLIB_LDFLAGS  = -shared  \
                         -lm -lrt -ldl \
                         -Xlinker -lgomp,-rpath,/usr/lib32 -Xnvlink -w -lcudart -lcuda -Wno-deprecated-gpu-targets
DOWNLOAD_FLAGS       =
EXECUTE_FLAGS        =
LDFLAGS              = -lm -lrt -ldl \
                       -Xlinker -lgomp,-rpath,/usr/lib32 -Xnvlink -w -lcudart -lcuda -Wno-deprecated-gpu-targets
MEX_CPPFLAGS         =
MEX_CPPLDFLAGS       =
MEX_CFLAGS           =
MEX_LDFLAGS          =
MAKE_FLAGS           = -f $(MAKEFILE)
SHAREDLIB_LDFLAGS    = -shared  \
                       -lm -lrt -ldl \
                       -Xlinker -lgomp,-rpath,/usr/lib32 -Xnvlink -w -lcudart -lcuda -Wno-deprecated-gpu-targets

Thanks in advance!
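
One way to compare the two builds is to record which core each loop iteration actually lands on, rather than only counting threads. The following is a minimal sketch, assuming Linux with glibc (sched_getcpu, sched_getaffinity and CPU_COUNT are glibc extensions). The helper names allowedCoreCount, coresUsed and printCoreReport are made up for this illustration and are not part of the original project.

```cpp
// Sketch only: reports the cores used by an OpenMP loop (Linux/glibc assumed).
#include <sched.h>   // sched_getcpu, sched_getaffinity, CPU_COUNT (glibc extensions)
#include <cassert>
#include <cstdio>
#include <set>
#include <vector>

// Number of cores the current process is allowed to run on, or -1 on error.
int allowedCoreCount() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
    return CPU_COUNT(&mask);
}

// Runs the same "parallel for" construct as in the post and returns,
// for each iteration, the core it was executing on when it finished.
std::vector<int> coresUsed(int iterations) {
    assert(iterations >= 0);
    std::vector<int> core(iterations, -1);
    #pragma omp parallel for
    for (int i = 0; i < iterations; ++i) {
        // Some busy work so each iteration lasts long enough to be scheduled.
        volatile double sink = 0.0;
        for (int k = 0; k < 2000000; ++k) sink = sink + k * 1e-9;
        core[i] = sched_getcpu();  // each thread writes a distinct element
    }
    return core;
}

// Prints the affinity limit, the core of each iteration, and the distinct core count.
void printCoreReport(int iterations) {
    const std::vector<int> core = coresUsed(iterations);
    const std::set<int> distinct(core.begin(), core.end());
    std::printf("affinity mask allows %d core(s)\n", allowedCoreCount());
    for (int i = 0; i < iterations; ++i)
        std::printf("i = %d ran on core %d\n", i, core[i]);
    std::printf("%zu distinct core(s) used\n", distinct.size());
}
```

Calling printCoreReport(9) from main in both the g++ -fopenmp demo and the nvcc-built executable should show whether the difference lies in the affinity mask the process starts with or in how the threads are placed. If the first line reports a single allowed core, the process itself is restricted, which would point away from the compiler flags.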

Hi,

Could you share the complete source and reproduction steps so we can check it directly?
Thanks.

Thanks! I'd like to share the complete source code, but some of the .cpp files contain confidential engine parameters, so I can't.
Here is the relevant part of the code:

class UKF {
.....
public:
        template<class Measurement, template<class> class CovarianceBase>
        void computeSigmaPointMeasurementsIter(MeasurementModelType<Measurement, CovarianceBase>& m, SigmaPoints<Measurement>& sigmaMeasurementPoints)
        {
            omp_set_num_threads(3);     
            #pragma omp parallel
            {
                #pragma omp for
                for (int i = 0; i < SigmaPointCount; ++i)
                {
                    sigmaMeasurementPoints.col(i) = m.h(sigmaStatePointsIter.col(i));
                    printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
                    
                }
            }    
            std::cout << "MultiProcessor sigmaMeasurementPoints = " << sigmaMeasurementPoints << std::endl;
        }
};

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi,

We want to reproduce this issue internally first.
Would you mind wrapping the above function into a compilable source file?

Thanks.
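
As a rough idea of what such a standalone file might look like, here is a sketch of the loop from the UKF method with the confidential parts replaced. Everything that is not in the posted snippet is an assumption: Column and SigmaPoints stand in for the Eigen types, h is a placeholder for the real measurement model m.h(), and the function returns the thread id of each iteration instead of printing it. It is not the original code.

```cpp
// Sketch only: hypothetical stand-in for the posted UKF method, without Eigen.
#include <array>
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>     // only available when -fopenmp reaches the host compiler
#endif

constexpr int SigmaPointCount = 9;                 // matches the 9-iteration loop in the first post
using Column = std::array<double, 3>;              // assumed replacement for an Eigen column
using SigmaPoints = std::array<Column, SigmaPointCount>;

// Placeholder measurement model; the real m.h() is confidential.
Column h(const Column& x) {
    return {x[0] * x[0], x[1] + x[2], 2.0 * x[2]};
}

// True when the translation unit was compiled with OpenMP enabled.
bool builtWithOpenMP() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Same structure as the posted method: 3 threads, one "parallel for" over the columns.
// Returns the OpenMP thread id that handled each iteration (all 0 without OpenMP).
std::vector<int> computeSigmaPointMeasurements(const SigmaPoints& state, SigmaPoints& meas) {
    std::vector<int> thread(SigmaPointCount, 0);
#ifdef _OPENMP
    omp_set_num_threads(3);
#endif
    #pragma omp parallel for
    for (int i = 0; i < SigmaPointCount; ++i) {
        meas[i] = h(state[i]);                     // each iteration writes its own column
#ifdef _OPENMP
        thread[i] = omp_get_thread_num();
#endif
        assert(thread[i] >= 0 && thread[i] < 3);   // team size is capped at 3 above
    }
    return thread;
}
```

Built once with g++ -fopenmp and once through nvcc with the flags from the .mk file, a file like this would show whether builtWithOpenMP() and the returned thread ids differ between the two toolchains, without exposing the confidential sources.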
