I am using the Thrust library with CUDA 4.0. The following call reduces a device array and returns the sum in a host variable:
int h_sum = thrust::reduce(dev_ptr_out, dev_ptr_out + array_size, 0, thrust::plus<float>());
I assume the sum is computed in device memory and then copied to the host variable. How can I find out where in device memory it is stored? I want to use the result directly on the device. Otherwise I have to copy h_sum back to constant memory or to another device memory location, which seems wasteful, since the result was produced on the device in the first place.