I am using the NVIDIA HPC SDK 20.9 and trying to use Thrust on the device to accelerate sorting. I can't get even the simplest code that uses Thrust's device functionality to compile. Using nvc++ on the following code
#include <thrust/device_vector.h>

int main() {
    thrust::device_vector<double> v1(10);
    return 0;
}
results in the error below at compile time. Has anyone else run into this problem, or does anyone have ideas on how to get Thrust device code to compile properly? I truncated some of the error message, but about 15 instantiation errors are reported.
"/home/khoidang/.local/nvhpc/Linux_x86_64/20.9/cuda/include/thrust/system/detail/generic/for_each.h", line 66: error: incomplete type is not allowed
THRUST_STATIC_ASSERT_MSG(
^
detected during:
.
.
.
instantiation of "thrust::device_vector<T, Alloc>::device_vector(thrust::device_vector<T, Alloc>::size_type) [with T=double, Alloc=thrust::device_allocator<double>]" at line 4 of "test.cpp"
1 error detected in the compilation of "test.cpp".
What’s your compile instruction?
I’m using nvc++ test.cpp -o test.exe
You can either do
nvcc test.cu -o test.exe
or
nvcc -x cu test.cpp -o test.exe
OK, thanks. This works for compiling the simple test case. Now I'm trying to use Thrust with OpenACC in a similar manner to this example (accelerator_interoperability/Hash at master · olcf/accelerator_interoperability · GitHub), where the Thrust code is inside a wrapper function in a separate file and the call to the wrapper function occurs inside a #pragma acc parallel region, except that I am using a .cpp file instead of a .c file.
I can use nvcc -c sortGPU.cu and nvc++ -c sort.cpp to obtain the object code successfully, but I am unable to link sortGPU.o and sort.o. When using nvc++ to link, I get a series of undefined reference errors:
tmpxft_00003171_00000000-6_gpu.cudafe1.cpp:(.text._ZN3cub11EmptyKernelIvEEvv[_ZN3cub11EmptyKernelIvEEvv]+0x54): undefined reference to `__cudaPopCallConfiguration'
tmpxft_00003171_00000000-6_gpu.cudafe1.cpp:(.text._ZN3cub11EmptyKernelIvEEvv[_ZN3cub11EmptyKernelIvEEvv]+0x99): undefined reference to `cudaLaunchKernel'
.
.
.
/home/khoidang/gpu/main.cpp:50: undefined reference to `__pgi_uacc_dataenterstart2'
/home/khoidang/gpu/main.cpp:63: undefined reference to `__pgi_uacc_dataoffb2'
.
.
.
/home/khoidang/gpu/main.cpp:63: undefined reference to `__pgi_uacc_dataexitdone'
Both the CUDA and the PGI errors occur when linking with nvc++, while only the PGI errors occur when linking with nvcc. Do you know the proper way to link this kind of example?
"wrapper function in a separate file"
You might need to add -rdc for relocatable device code.
Are you trying to call a Thrust function inside a #pragma acc parallel region?
I’m not sure that’s possible in every case, because you’re asking each thread launched on the GPU to run that thrust function.
Sorry, I mistyped. I meant to write a #pragma acc host_data use_device(x,y) region. Setting -rdc=true still leaves me with the same undefined reference errors.
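For reference, the wrapper pattern described in this thread might look roughly like the sketch below. It is only an illustration under assumptions: the names sort_on_device and sorted_copy_via_device are hypothetical, and the second function stands in for the OpenACC caller by moving data with the CUDA runtime, so the file can be tried with nvcc alone. In the OpenACC version, the device pointer would instead come from the host_data use_device region.

```cuda
// sortGPU.cu (file name taken from the thread; function names are hypothetical)
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

// Sorts n doubles in place. d_keys must already be a device address,
// for example the one OpenACC exposes inside host_data use_device(...).
// extern "C" keeps the symbol unmangled for the separately compiled caller.
extern "C" void sort_on_device(double* d_keys, int n) {
    assert(n >= 0);
    // Wrap the raw pointer so Thrust dispatches to its device backend.
    thrust::device_ptr<double> first = thrust::device_pointer_cast(d_keys);
    thrust::sort(first, first + n);
}

// Stand-in for the OpenACC caller (assumption: used here only so the
// wrapper can be exercised without an OpenACC compiler).
std::vector<double> sorted_copy_via_device(std::vector<double> host) {
    if (host.empty()) return host;
    const std::size_t bytes = host.size() * sizeof(double);
    double* d_keys = nullptr;
    // On allocation failure the input is returned unchanged.
    if (cudaMalloc(&d_keys, bytes) != cudaSuccess) return host;
    cudaMemcpy(d_keys, host.data(), bytes, cudaMemcpyHostToDevice);
    sort_on_device(d_keys, static_cast<int>(host.size()));
    cudaMemcpy(host.data(), d_keys, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_keys);
    return host;
}
```

On the nvc++ side, the wrapper would be declared as extern "C" void sort_on_device(double*, int); and called inside the host_data region. As for the link errors, the __pgi_uacc_* symbols belong to the OpenACC runtime and cudaLaunchKernel to the CUDA runtime, so one thing that may be worth trying (an assumption, not confirmed in this thread) is linking with nvc++ and passing both -acc and -cuda on the link line.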