Compilation errors using OpenMP as Thrust backend

Hi,

I have a program that compiles and runs cleanly (no compilation warnings or errors). It uses Thrust for all the CUDA work. Things run fast!!

I was asked to demonstrate the speed improvements of using the GPU. So my thought was to have it compile using OpenMP as the backend. Then I could run it with {1,2,4} cores and show how the GPU is significantly faster.

However, I get a ton of errors at compile time that are not present when compiling for the GPU.

I’m using CMake and the only change I made was to add a few compiler flags. My understanding, from the Thrust documentation, is that this is the only thing necessary to move things from the GPU to the CPU (through OpenMP):

list(APPEND CUDA_NVCC_FLAGS -Xcompiler )

list(APPEND CUDA_NVCC_FLAGS -fopenmp )

list(APPEND CUDA_NVCC_FLAGS -DTHRUST_DEVICE_BACKEND=THRUST_DEVICE_BACKEND_OMP )

list(APPEND CUDA_NVCC_FLAGS -lgomp)

The demo on the Thrust backend page (monte_carlo.cu) does compile and run cleanly using OpenMP, which leads me to believe that my system actually has the correct libraries in the right place.

The errors I get all seem to repeat some version of the following:

CMakeFiles/clr.dir/./clr_generated_better_clr.cu.o: In function `thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<unsigned int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> > thrust::detail::backend::omp::for_each_n<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<unsigned int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, long, thrust::detail::device_binary_transform_functor<thrust::detail::binary_negate<thrust::equal_to<int> > > >(thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<unsigned int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, long, thrust::detail::device_binary_transform_functor<thrust::detail::binary_negate<thrust::equal_to<int> > >) [clone ._omp_fn.6]':

/usr/local/cuda/include/thrust/detail/backend/omp/for_each.inl:62: undefined reference to `omp_get_num_threads'

/usr/local/cuda/include/thrust/detail/backend/omp/for_each.inl:62: undefined reference to `omp_get_thread_num'

The other error I see is:

/usr/local/cuda/include/thrust/detail/backend/omp/for_each.inl:67: undefined reference to `GOMP_parallel_start'

/usr/local/cuda/include/thrust/detail/backend/omp/for_each.inl:67: undefined reference to `GOMP_parallel_end'

Any advice on how to remedy this?

Thanks!

First off, adding a linker flag such as -lgomp to CUDA_NVCC_FLAGS won’t help during the link phase, since nvcc isn’t used for linking here. NVCC is only used to build the object files, and the host compiler is used for linking. Instead, find the library and link it to your target in CMake:

find_library(GOMP_LIBRARY gomp)

target_link_libraries(mytarget ${GOMP_LIBRARY})

The other flags look OK, though you could collapse them into a single command if you wanted:

list(APPEND CUDA_NVCC_FLAGS -Xcompiler -fopenmp -DTHRUST_DEVICE_BACKEND=THRUST_DEVICE_BACKEND_OMP )

If you are using makefiles, then you could type make VERBOSE=1 to see the command being generated and verify that the command line matches what you expect.
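If you'd rather bake this into the build than remember the make option, something like the following sketch should have a similar effect. CMAKE_VERBOSE_MAKEFILE is a standard CMake variable. CUDA_VERBOSE_BUILD is, as far as I recall, the FindCUDA option that echoes the nvcc commands, so treat that one as an assumption and check it against your CMake version.

```cmake
# Place these after find_package(CUDA) in your CMakeLists.txt.

# Print the full compile and link command lines from the generated Makefiles.
set(CMAKE_VERBOSE_MAKEFILE ON)

# Assumption: FindCUDA option that prints the nvcc commands it runs.
set(CUDA_VERBOSE_BUILD ON)
```

With this in place, a plain make should show whether -fopenmp actually reaches the link command.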

If you could post the command line you use to build monte_carlo.cu, I can help you write the CMake code that generates the same command line.

Also, you should verify that building a copy of monte_carlo.cu within CMake works. That way you can confirm that the build is set up correctly, separately from any problems in your own code.
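A minimal sketch of that sanity check, assuming monte_carlo.cu has been copied next to your CMakeLists.txt and that GCC is the host compiler. The target name monte_carlo_check is just something I made up, and the link flag line is my assumption about how to get -fopenmp onto the link command:

```cmake
find_package(CUDA REQUIRED)

# Same OpenMP backend flags as for the real target.
list(APPEND CUDA_NVCC_FLAGS -Xcompiler -fopenmp
     -DTHRUST_DEVICE_BACKEND=THRUST_DEVICE_BACKEND_OMP)

# Assumption: GCC host compiler, where -fopenmp at link time pulls in libgomp.
# Note that this affects every executable defined in this directory.
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fopenmp")

# Hypothetical target name; builds the Thrust demo through the same CMake setup.
cuda_add_executable(monte_carlo_check monte_carlo.cu)
```

If this target links and runs but your own target does not, the difference is most likely in how your target is defined rather than in the toolchain.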

Following an example I found on Stack Overflow, this works:

find_package(OpenMP)

list(APPEND CUDA_NVCC_FLAGS -Xcompiler )

list(APPEND CUDA_NVCC_FLAGS -fopenmp )

list(APPEND CUDA_NVCC_FLAGS -DTHRUST_DEVICE_BACKEND=THRUST_DEVICE_BACKEND_OMP )

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")

set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")

Trying your suggestion gives me an error:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.

Please set them or make sure they are set and tested correctly in the CMake files:

GOMP_LIBRARY

Right, you get that error when you tell CMake to find a library and use it, but it doesn’t actually find the library. I mistakenly left out the check:

find_library(GOMP_LIBRARY gomp)

if (NOT GOMP_LIBRARY)

  message(SEND_ERROR "gomp library not found")

endif()

target_link_libraries(mytarget ${GOMP_LIBRARY})

Well, I’m glad you got it working!
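For what it's worth, my reading of why the find_package(OpenMP) version works: CMake passes CMAKE_CXX_FLAGS to the host compiler when it links a C++ executable, not just when it compiles, so -fopenmp reaches the link step and GCC pulls in libgomp itself. A slightly more defensive sketch of the same idea, assuming the FindOpenMP module that ships with CMake (OPENMP_FOUND is the result variable I believe it sets):

```cmake
find_package(OpenMP)

if (OPENMP_FOUND)
  # Flags for nvcc: forward -fopenmp to the host compiler and select
  # the OpenMP device backend for Thrust.
  list(APPEND CUDA_NVCC_FLAGS -Xcompiler -fopenmp
       -DTHRUST_DEVICE_BACKEND=THRUST_DEVICE_BACKEND_OMP)

  # Flags for the host compiler; CMake also uses these on the link line.
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
else()
  message(SEND_ERROR "OpenMP support not found for the host compiler")
endif()
```

The guard just turns a missing OpenMP setup into a clear configure-time message instead of a wall of undefined references at link time.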

That makes sense.

Thank You!!