Hi,
I have a program that compiles and runs cleanly, with no compilation warnings or errors. It uses Thrust for all the CUDA work. Things run fast!
I was asked to demonstrate the speed improvement from using the GPU. My thought was to compile the same program with OpenMP as the Thrust backend, so that I could run it on {1,2,4} cores and show that the GPU is significantly faster.
However, I get a ton of errors at compile time that are not present when compiling for the GPU.
I’m using CMake, and the only change I made was to add a few compiler flags. My understanding from the Thrust documentation is that this is all that is needed to move the work from the GPU to the CPU (through OpenMP):
list(APPEND CUDA_NVCC_FLAGS -Xcompiler )
list(APPEND CUDA_NVCC_FLAGS -fopenmp )
list(APPEND CUDA_NVCC_FLAGS -DTHRUST_DEVICE_BACKEND=THRUST_DEVICE_BACKEND_OMP )
list(APPEND CUDA_NVCC_FLAGS -lgomp)
The demo on the Thrust backend page (monte_carlo.cu) does compile and run cleanly using OpenMP, which leads me to believe that my system has the correct libraries in the right place.
The errors I get all seem to repeat some version of the following:
CMakeFiles/clr.dir/./clr_generated_better_clr.cu.o: In function `thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<unsigned int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> > thrust::detail::backend::omp::for_each_n<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<unsigned int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, long, thrust::detail::device_binary_transform_functor<thrust::detail::binary_negate<thrust::equal_to<int> > > >(thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<unsigned int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, long, thrust::detail::device_binary_transform_functor<thrust::detail::binary_negate<thrust::equal_to<int> > >) [clone ._omp_fn.6]':
/usr/local/cuda/include/thrust/detail/backend/omp/for_each.inl:62: undefined reference to `omp_get_num_threads'
/usr/local/cuda/include/thrust/detail/backend/omp/for_each.inl:62: undefined reference to `omp_get_thread_num'
The other error I see is:
/usr/local/cuda/include/thrust/detail/backend/omp/for_each.inl:67: undefined reference to `GOMP_parallel_start'
/usr/local/cuda/include/thrust/detail/backend/omp/for_each.inl:67: undefined reference to `GOMP_parallel_end'
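Since these are all "undefined reference" messages, my working guess is that they come from the link step rather than from compilation. As far as I understand, CUDA_NVCC_FLAGS only affects how nvcc compiles the .cu files, so the -lgomp above may never reach the final link line. Below is a rough sketch of what I think the link step would need. It is untested, and it assumes the target is named clr (taken from the CMakeFiles/clr.dir path in the error output) and that the host compiler is GCC with libgomp:

```cmake
# Sketch only, not verified against my project.
# Assumption: the executable target is called "clr" (inferred from the
# CMakeFiles/clr.dir path in the linker output above).

# Let CMake detect the OpenMP flags for the host compiler
# (this sets OpenMP_CXX_FLAGS, e.g. -fopenmp for GCC).
find_package(OpenMP REQUIRED)

# CUDA_NVCC_FLAGS is only used when nvcc compiles the .cu files.
# The executable is linked by the host compiler, so the OpenMP flag
# has to be added to the linker flags separately.
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}")

# Alternative (assumes GCC's libgomp): link the OpenMP runtime explicitly.
# target_link_libraries(clr gomp)
```

If that reasoning is wrong, or there is a more standard way to do this with the CUDA macros in CMake, please correct me.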
Any advice on how to remedy this?
Thanks!