Hi everyone!
EngineUKFTs.mk (29.1 KB)
I'm testing my algorithm with OpenMP on the Jetson TX2 CPU. When I run the executable (.elf), it creates multiple threads, but they all run on a single CPU core. The test code is simple (sigmaMeasurement is an Eigen matrix):
#pragma omp parallel for
for (int i = 0; i < 9; i++)
    sigmaMeasurement.col(i) = func(sigmaX.col(i));
I added -fopenmp via -Xcompiler and -lgomp via -Xlinker. The project compiles and links successfully, but there is no speedup. It seems that some setting in the .mk file for this algorithm is incorrect, because a simple demo with the same construct, compiled with g++ -fopenmp…, gives correct results and runs on 4 cores. Does anyone know why this happens?
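To compare the two builds directly, it may help to record where each iteration actually runs. The following is only a minimal sketch, assuming Linux with glibc (for sched_getcpu); the names Placement, record_placement, count_distinct_cpus and print_placement are hypothetical helpers, not part of the original project, and the loop body does no real work.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for sched_getcpu() on glibc
#endif
#include <sched.h>    // sched_getcpu (Linux/glibc specific)
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <set>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Where one loop iteration ran: OpenMP thread id and Linux CPU id.
struct Placement {
    int thread;
    int cpu;
};

// Hypothetical helper: runs a loop of the same shape as the one in the post
// (n independent iterations) and records where each iteration executed.
std::vector<Placement> record_placement(int n) {
    assert(n >= 0);
    std::vector<Placement> out(static_cast<std::size_t>(n));
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
#ifdef _OPENMP
        out[i].thread = omp_get_thread_num();
#else
        out[i].thread = 0;  // pragma was ignored: OpenMP not enabled at compile time
#endif
        out[i].cpu = sched_getcpu();  // CPU this thread is on right now
    }
    return out;
}

// Number of different CPU ids seen across all iterations.
int count_distinct_cpus(const std::vector<Placement>& p) {
    std::set<int> cpus;
    for (const Placement& x : p) cpus.insert(x.cpu);
    return static_cast<int>(cpus.size());
}

// Prints one line per iteration, then a summary.
void print_placement(const std::vector<Placement>& p) {
    for (std::size_t i = 0; i < p.size(); ++i)
        std::printf("iteration %zu: thread %d on cpu %d\n", i, p[i].thread, p[i].cpu);
    std::printf("distinct CPUs used: %d\n", count_distinct_cpus(p));
}
```

Calling print_placement(record_placement(9)) from main, once in the g++ -fopenmp demo and once in the nvcc-built project, shows whether the threads land on different CPU ids in each build. Because the iterations here are nearly empty, placing the real func call inside the loop body would give a more representative picture.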
Here are my .mk settings; the full file is attached at the top of the post.
# C Compiler: NVCC for NVIDIA Embedded Processors1.0 NVIDIA CUDA C Compiler Driver
CC = nvcc
# Linker: NVCC for NVIDIA Embedded Processors1.0 NVIDIA CUDA C Linker
LD = nvcc
# C++ Compiler: NVCC for NVIDIA Embedded Processors1.0 NVIDIA CUDA C++ Compiler Driver
CPP = nvcc
# C++ Linker: NVCC for NVIDIA Embedded Processors1.0 NVIDIA CUDA C++ Linker
CPP_LD = nvcc
# Archiver: NVCC for NVIDIA Embedded Processors1.0 Archiver
AR = ar
# MEX Tool: MEX Tool
MEX_PATH = $(MATLAB_ARCH_BIN)
MEX = $(MEX_PATH)/mex
# Download: Download
DOWNLOAD =
# Execute: Execute
EXECUTE = $(PRODUCT)
# Builder: Make Tool
MAKE = make
ARFLAGS = -ruvs
CFLAGS = -rdc=true -Xcudafe "--diag_suppress=unsigned_compare_with_zero" \
-c \
-Xcompiler -MMD,-MP,-fopenmp \
-O2
CPPFLAGS = -rdc=true -Xcudafe "--diag_suppress=unsigned_compare_with_zero" \
-c \
-Xcompiler -fopenmp,-MMD,-MP \
-O2
CPP_LDFLAGS = -lm -lrt -ldl \
-Xlinker -lgomp,-rpath,/usr/lib32 -Xnvlink -w -lcudart -lcuda -Wno-deprecated-gpu-targets
CPP_SHAREDLIB_LDFLAGS = -shared \
-lm -lrt -ldl \
-Xlinker -lgomp,-rpath,/usr/lib32 -Xnvlink -w -lcudart -lcuda -Wno-deprecated-gpu-targets
DOWNLOAD_FLAGS =
EXECUTE_FLAGS =
LDFLAGS = -lm -lrt -ldl \
-Xlinker -lgomp,-rpath,/usr/lib32 -Xnvlink -w -lcudart -lcuda -Wno-deprecated-gpu-targets
MEX_CPPFLAGS =
MEX_CPPLDFLAGS =
MEX_CFLAGS =
MEX_LDFLAGS =
MAKE_FLAGS = -f $(MAKEFILE)
SHAREDLIB_LDFLAGS = -shared \
-lm -lrt -ldl \
-Xlinker -lgomp,-rpath,/usr/lib32 -Xnvlink -w -lcudart -lcuda -Wno-deprecated-gpu-targets
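Since the flags above reach the host compiler and linker only indirectly, through -Xcompiler and -Xlinker, one way to check what ended up in the binary is to report it at runtime. This is a sketch under assumptions: it relies on Linux with glibc (sched_getaffinity), the function names are hypothetical, and the environment variables listed are the standard OpenMP ones plus libgomp's GOMP_CPU_AFFINITY. It does not identify the cause by itself; it only narrows down whether OpenMP was enabled at compile time and whether the process is restricted to a subset of CPUs.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for sched_getaffinity() and CPU_* macros on glibc
#endif
#include <sched.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Value of the _OPENMP macro (a yyyymm date), or 0 if OpenMP was not
// enabled when this translation unit was compiled.
long openmp_macro_value() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Number of CPUs the calling process is allowed to run on, or -1 on error.
int allowed_cpu_count() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
    return CPU_COUNT(&mask);
}

// Value of an environment variable, or "(unset)" if it is not defined.
std::string env_or_unset(const char* name) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string("(unset)");
}

// Prints the compile-time and runtime settings that affect thread placement.
void print_runtime_report() {
    std::printf("_OPENMP macro : %ld\n", openmp_macro_value());
    std::printf("allowed CPUs  : %d\n", allowed_cpu_count());
    const char* names[] = {"OMP_NUM_THREADS", "OMP_PROC_BIND",
                           "OMP_PLACES", "GOMP_CPU_AFFINITY"};
    for (const char* name : names)
        std::printf("%-14s: %s\n", name, env_or_unset(name).c_str());
}
```

Calling print_runtime_report() at the start of main in both the nvcc-built executable and the g++ demo allows a side-by-side comparison. A value of 0 for the _OPENMP macro would suggest -fopenmp did not reach the host compiler, while an allowed CPU count of 1 would point to an affinity restriction rather than a build setting.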
Thanks in advance!