I am trying to copy data from host to device and back using the Thrust library rather than the CUDA API. I allocate memory in a thrust::host_vector and try to copy it to a thrust::device_vector. However, when I use thrust::copy with the thrust::host execution policy for any host <-> device transfer, the program crashes with a segmentation fault. cuda-memcheck reports the following error message:
Error: process didn't terminate successfully
The application may have hit an error when dereferencing Unified Memory from the host.
The documentation on what the thrust::host and thrust::device execution policies actually do and what constraints are to be taken into account when using them is pretty scarce.
What are potential causes for thrust::copy not to work with the thrust::host execution policy? Note that not specifying the parameter explicitly works fine. The machine that I am working on is a POWER9 machine.
Here is a small reproducible example:
Build with nvcc -O3 -std=c++11 -Xcompiler -fopenmp test.cu -o test
#include <vector>
#include <omp.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

#define NUM_GPUS 4

int main(int argc, char *argv[]) {
    size_t num_elements = 10000;
    size_t block_size = num_elements/4;

    thrust::host_vector<int> hvec(num_elements);
    std::vector<thrust::device_vector<int>*> dvecs(NUM_GPUS);

    #pragma omp parallel for
    for (size_t i = 0; i < NUM_GPUS; ++i)
    {
        cudaSetDevice(i);
        dvecs[i] = new thrust::device_vector<int>(block_size);
        thrust::copy( thrust::host,
                      hvec.begin() + (block_size * i),
                      hvec.begin() + (block_size * (i + 1)),
                      dvecs[i]->begin());
    }
    return 0;
}
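For comparison, here is a minimal sketch of the variant without an explicit execution policy, which is the form that does not crash. The helper name round_trip_block is hypothetical and used only for illustration. It assumes a single, already selected device and copies one block to the device and back, so the result can be checked on the host.

```cuda
#include <algorithm>
#include <cstddef>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

// Hypothetical helper, not part of the reproducer above: copies one block
// host -> device -> host with no execution policy argument and reports
// whether the data survived the round trip.
bool round_trip_block(const thrust::host_vector<int>& hvec,
                      std::size_t offset, std::size_t block_size) {
    thrust::device_vector<int> dvec(block_size);

    // No policy argument: Thrust picks the transfer from the iterator types.
    thrust::copy(hvec.begin() + offset,
                 hvec.begin() + offset + block_size,
                 dvec.begin());

    // Copy back to the host, again without a policy argument.
    thrust::host_vector<int> back(block_size);
    thrust::copy(dvec.begin(), dvec.end(), back.begin());

    // Compare on the host against the original block.
    return std::equal(back.begin(), back.end(), hvec.begin() + offset);
}
```

Calling round_trip_block(hvec, block_size * i, block_size) for each block mirrors the loop in the reproducer, minus the thrust::host argument and the per-GPU cudaSetDevice call.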