nvcc OpenCV error on TX1 board

I installed JetPack 3.1 on a TX1 board following the install guide:
http://docs.nvidia.com/jetpack-l4t/#developertools/mobile/jetpack/l4t/3.2rc/jetpack_l4t_install.htm%3FTocPath%3D_____3
The install completed with no errors.
Then I wrote the following code and saved it as test.cu:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <opencv/cv.h>

int main()
{
    cv::Mat frame;
    exit(0);
}

I compiled it with the command: nvcc -o test test.cu -v
nvcc then reported these errors:
nvidia@tegra-ubuntu:/mnt/sda/test$ nvcc -o test test.cu -v
nvcc warning : The ‘compute_20’, ‘sm_20’, and ‘sm_21’ architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
#$ SPACE=
#$ CUDART=cudart
#$ HERE=/usr/local/cuda-8.0/bin
#$ THERE=/usr/local/cuda-8.0/bin
#$ TARGET_SIZE=
#$ TARGET_DIR=
#$ TARGET_DIR=targets/aarch64-linux
#$ TOP=/usr/local/cuda-8.0/bin/…
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda-8.0/bin/…/nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda-8.0/bin/…/lib:/usr/local/cuda-8.0/lib64:
#$ PATH=/usr/local/cuda-8.0/bin/…/open64/bin:/usr/local/cuda-8.0/bin/…/nvvm/bin:/usr/local/cuda-8.0/bin:/usr/local/cuda-8.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
#$ INCLUDES=“-I/usr/local/cuda-8.0/bin/…/targets/aarch64-linux/include”
#$ LIBRARIES= “-L/usr/local/cuda-8.0/bin/…/targets/aarch64-linux/lib/stubs” “-L/usr/local/cuda-8.0/bin/…/targets/aarch64-linux/lib”
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -D__CUDA_ARCH__=200 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ “-I/usr/local/cuda-8.0/bin/…/targets/aarch64-linux/include” -D"CUDACC_VER=80072" -D"CUDACC_VER_BUILD=72" -D"CUDACC_VER_MINOR=0" -D"CUDACC_VER_MAJOR=8" -include “cuda_runtime.h” “test.cu” > “/tmp/tmpxft_00000958_00000000-9_test.cpp1.ii”
#$ cudafe --allow_managed --m64 --gnu_version=50400 -tused --no_remove_unneeded_entities --gen_c_file_name “/tmp/tmpxft_00000958_00000000-4_test.cudafe1.c” --stub_file_name “/tmp/tmpxft_00000958_00000000-4_test.cudafe1.stub.c” --gen_device_file_name “/tmp/tmpxft_00000958_00000000-4_test.cudafe1.gpu” --nv_arch “compute_20” --gen_module_id_file --module_id_file_name “/tmp/tmpxft_00000958_00000000-3_test.module_id” --unsigned_chars --include_file_name “tmpxft_00000958_00000000-2_test.fatbin.c” “/tmp/tmpxft_00000958_00000000-9_test.cpp1.ii”
/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(38): error: identifier “__Int8x8_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(39): error: identifier “__Int16x4_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(40): error: identifier “__Int32x2_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(41): error: identifier “__Int64x1_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(42): error: identifier “__Float16x4_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(43): error: identifier “__Float32x2_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(44): error: identifier “__Poly8x8_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(45): error: identifier “__Poly16x4_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(46): error: identifier “__Uint8x8_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(47): error: identifier “__Uint16x4_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(48): error: identifier “__Uint32x2_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(49): error: identifier “__Float64x1_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(50): error: identifier “__Uint64x1_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(51): error: identifier “__Int8x16_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(52): error: identifier “__Int16x8_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(53): error: identifier “__Int32x4_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(54): error: identifier “__Int64x2_t” is undefined

/usr/lib/gcc/aarch64-linux-gnu/5/include/arm_neon.h(55): error: identifier “__Float16x8_t” is undefined

So what's the problem?

The problem may be caused by the NEON option being turned on in OpenCV.
The OpenCV on the TX1 board is OpenCV4Tegra, installed with JetPack 3.1. The version is 2.4.13.
What should I do to fix the problem?
Rebuild OpenCV with the NEON option turned off? But we need NEON acceleration.
Our project uses OpenCV and CUDA to implement real-time tracking,
so we need both NEON acceleration and GPU acceleration.

Hi altanres,

Please move to OpenCV 3.x and check whether NEON acceleration can be turned off. OpenCV 2.4.13 is deprecated.

How about separating the CUDA kernel from the OpenCV main code?

Hi WayneWWW,
Thanks for the reply.
If I build the OpenCV 3.1 library from source, it may lose NVIDIA's CPU and multi-core optimizations in OpenCV4Tegra.
I think OpenCV4Tegra has some optimizations. I got this information from the following link:
https://elinux.org/Jetson/Computer_Vision_Performance

OpenCV4Tegra: A free library provided by NVIDIA containing optimizations for NVIDIA's Tegra CPUs (ARM NEON SIMD optimizations, multi-core CPU optimizations and some GLSL GPU optimizations). OpenCV4Tegra is a closed-source binary replacement for the public OpenCV, thus the programmer just writes regular OpenCV code, that will automatically take advantage of OpenCV4Tegra optimizations without the developer or user necessarily knowing about it

So to get real-time tracking in our system, OpenCV4Tegra may be more suitable.
And the OpenCV4Tegra in the latest JetPack is version 2.4.13.

Hi altanres,

If you put the OpenCV code into a separate .cpp file, you can build it with g++ without hitting this error.

Why do you need to use OpenCV-related libraries in a .cu file? Is there any OpenCV API that operates directly in a CUDA kernel?
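
For reference, the split suggested in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the file names gpu_invert.h, gpu_invert.cu and main.cpp, and the functions invert_pixels and process_frame, are hypothetical and not from the original project. The idea is that the .cu file exposes a function taking raw pointers only, so nvcc never sees the OpenCV headers (and therefore never sees arm_neon.h), while the OpenCV code lives in a .cpp file built with g++. To keep the listing buildable with plain g++ and without a GPU, the three files are collapsed into one, a CPU loop stands in for the kernel launch, and a std::vector stands in for the cv::Mat buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ---- gpu_invert.h (hypothetical) ----
// Shared by both source files. Deliberately uses no OpenCV and no CUDA types,
// so it can be included from the .cu file and from the .cpp file alike.
void invert_pixels(std::uint8_t* data, std::size_t count);

// ---- gpu_invert.cu (hypothetical, built with nvcc) ----
// This file must not include any OpenCV header. In a real project the body
// would copy the buffer to the device, launch a __global__ kernel, and copy
// the result back. ASSUMPTION: a CPU loop stands in for that here so the
// sketch runs without CUDA.
void invert_pixels(std::uint8_t* data, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = static_cast<std::uint8_t>(255 - data[i]);
    }
}

// ---- main.cpp (hypothetical, built with g++) ----
// The only file that would include OpenCV. With a continuous cv::Mat the call
// would be along the lines of:
//     invert_pixels(frame.data, frame.total() * frame.elemSize());
// ASSUMPTION: a std::vector stands in for the frame buffer in this sketch.
std::vector<std::uint8_t> process_frame(std::vector<std::uint8_t> pixels)
{
    invert_pixels(pixels.data(), pixels.size());
    return pixels;
}
```

In a real build, gpu_invert.cu would be compiled to an object file with nvcc, main.cpp would be compiled with g++, and the two objects would be linked together against the CUDA runtime and OpenCV. The exact flags depend on the installation, so they are left out here.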