I am trying to use Thrust with OpenMP as the host backend and CUDA as the device backend, as described here and here. Here is a simple test:
[qth20@darwin-fe1 test-cuda]$ cat test4.cu
#include "thrust/device_vector.h"
#include "thrust/device_ptr.h"
#include "thrust/transform.h"
#include "thrust/host_vector.h"
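struct Point {
    double x;
    double y;
};

int main(int argc, char *argv[]) {
    // Allocate 100 points on the host (OpenMP backend) and on the device (CUDA backend).
    thrust::host_vector<Point> h_p(100);
    thrust::device_vector<Point> d_p(100);
    return 0;  // 0 signals success to the shell
}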
Compiling with the following flags produces these errors:
[qth20@darwin-fe1 test-cuda]$ nvcc -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_OMP -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -Xcompiler -fopenmp test4.cu -o test4
/projects/opt/centos7/cuda/9.0/bin/..//include/thrust/system/omp/detail/sort.inl(120): error: name followed by "::" must be a class or namespace name
/projects/opt/centos7/cuda/9.0/bin/..//include/thrust/system/omp/detail/sort.inl(126): error: identifier "decomp" is undefined
/projects/opt/centos7/cuda/9.0/bin/..//include/thrust/system/omp/detail/sort.inl(198): error: name followed by "::" must be a class or namespace name
/projects/opt/centos7/cuda/9.0/bin/..//include/thrust/system/omp/detail/sort.inl(204): error: identifier "decomp" is undefined
4 errors detected in the compilation of "/tmp/tmpxft_0000c38b_00000000-4_test4.cpp4.ii".
Further trial and error with the flags has not helped; it still fails to compile. Any suggestions? For reference, here is the compiler version:
[qth20@darwin-fe1 test-cuda]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176