The following Windows app does nothing: the keys array remains unchanged. Stepping through it in the debugger (Visual Studio 2013) seems to skip the sort call. What am I doing wrong?
Michael
int main()
{
    float values[5] = { 3, 5, 2, 4, 1 };
    float *dV;
    cudaMalloc(&dV, 5 * sizeof(float));
    cudaMemcpy(dV, values, 5 * sizeof(float), cudaMemcpyHostToDevice);
    int keys[5] = { 0, 1, 2, 3, 4 };
    float *dK;
    cudaMalloc(&dK, 5 * sizeof(int));
    cudaMemcpy(dK, keys, 5 * sizeof(int), cudaMemcpyHostToDevice);
    thrust::sort_by_key(keys, keys + 4, values);
    cudaMemcpy(keys, dK, 5 * sizeof(int), cudaMemcpyDeviceToHost);
    return 0;
}
Read the Thrust quick start guide on GitHub, and note what it says about how Thrust behaves when you pass bare pointers to the algorithms.
Some options:
- Use Thrust containers for the data, and Thrust iterators to reference it, instead of bare pointers
- Set up Thrust device pointers (thrust::device_ptr) that point to your raw data, and use those in the algorithm
- Use the thrust::device execution policy to force the algorithm's device path
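The second and third options can be sketched roughly as follows. This assumes the goal is to sort the float array and carry the index array along with it, so the floats are the sort keys and the ints are the payload. Note that the posted call passes the host arrays keys and values rather than dV and dK, and that keys + 4 covers only four of the five elements. The helper name argsort_raw and its wrap_pointers flag are made up for illustration, and error checking is omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/execution_policy.h>
#include <thrust/sort.h>

// Hypothetical helper (not part of Thrust): returns the permutation that
// sorts `values` in ascending order. The sort runs on the GPU on raw
// cudaMalloc'd buffers. CUDA error codes are ignored to keep the sketch short.
std::vector<int> argsort_raw(std::vector<float> values, bool wrap_pointers)
{
    const int n = static_cast<int>(values.size());
    std::vector<int> indices(n);
    for (int i = 0; i < n; ++i) indices[i] = i;

    float *dV = nullptr;
    int *dI = nullptr;  // int*, matching the element type stored in the buffer
    cudaMalloc(&dV, n * sizeof(float));
    cudaMalloc(&dI, n * sizeof(int));
    cudaMemcpy(dV, values.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dI, indices.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    if (wrap_pointers) {
        // Option 2: wrap the raw pointers so the iterator type itself tells
        // Thrust that the data lives in device memory.
        thrust::device_ptr<float> v(dV);
        thrust::device_ptr<int> idx(dI);
        thrust::sort_by_key(v, v + n, idx);
    } else {
        // Option 3: keep the raw pointers and name the device path explicitly.
        thrust::sort_by_key(thrust::device, dV, dV + n, dI);
    }

    cudaMemcpy(indices.data(), dI, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dV);
    cudaFree(dI);
    return indices;
}
```

With the values from the question, argsort_raw({3.0f, 5.0f, 2.0f, 4.0f, 1.0f}, false) should return {4, 2, 0, 3, 1}, the original positions of the values in ascending order.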
To follow up on what txbob said, I’d recommend running your code through cuda-memcheck.
When you use raw pointers, you lose a lot of type information.
The Thrust algorithms are intended to use a form of tag dispatch to select between CPU and GPU versions of their algorithms.
In this case, it’s likely defaulting to the host.
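To illustrate the general idea of tag dispatch, here is a toy sketch in plain C++. It is not Thrust's actual implementation: host_tag, device_tag, fake_device_ptr, system_of, and my_sort are all hypothetical names. It only shows why a raw pointer, which carries no information about where its memory lives, ends up on the default (host) path.

```cpp
#include <string>

// Hypothetical tags naming the two backends.
struct host_tag {};
struct device_tag {};

// Hypothetical wrapper whose *type* records that the memory is on the device.
template <typename T>
struct fake_device_ptr { T *raw; };

// Trait mapping an iterator type to a backend tag.
// Anything unrecognized, including raw pointers, defaults to the host.
template <typename Iterator>
struct system_of { using type = host_tag; };

template <typename T>
struct system_of<fake_device_ptr<T>> { using type = device_tag; };

// One overload per backend; the unnamed tag parameter selects between them.
template <typename Iterator>
std::string sort_impl(Iterator, host_tag) { return "host path"; }

template <typename Iterator>
std::string sort_impl(Iterator, device_tag) { return "device path"; }

// Public entry point: the backend is chosen at compile time from the
// iterator type alone, never from the pointer's runtime value.
template <typename Iterator>
std::string my_sort(Iterator first)
{
    return sort_impl(first, typename system_of<Iterator>::type{});
}
```

In this sketch, my_sort called with a float* returns "host path" even if that pointer came from cudaMalloc, while my_sort called with a fake_device_ptr<float> returns "device path".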
I also recommend looking into thrust::device_vector and thrust::host_vector. Then you won’t need to worry about cumbersome cudaMemcpy calls or deallocating your memory.
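A rough sketch of the container-based version, again assuming the intent is to sort the floats and recover the original positions. The function name argsort_vectors is made up for illustration. The vectors handle allocation, transfers, and cleanup, and their iterators carry the device tag, so the device path is selected without any extra hints.

```cuda
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// Hypothetical helper (not part of Thrust): returns the permutation that
// sorts `values` in ascending order, using Thrust containers throughout.
std::vector<int> argsort_vectors(const std::vector<float> &values)
{
    // Constructing from a host range performs the host-to-device copy.
    thrust::device_vector<float> d_values(values.begin(), values.end());

    // Fill the payload with 0, 1, 2, ... directly on the device.
    thrust::device_vector<int> d_indices(values.size());
    thrust::sequence(d_indices.begin(), d_indices.end());

    // device_vector iterators identify the device backend by their type.
    thrust::sort_by_key(d_values.begin(), d_values.end(), d_indices.begin());

    // Assigning to a host_vector performs the device-to-host copy.
    thrust::host_vector<int> h_indices = d_indices;
    return std::vector<int>(h_indices.begin(), h_indices.end());
    // Device memory is released when the device_vectors go out of scope.
}
```

For the input {3.0f, 5.0f, 2.0f, 4.0f, 1.0f} this should return {4, 2, 0, 3, 1}.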