How do I get a Thrust reduction result into a device variable directly from the thrust call?

I am using the Thrust library in CUDA 4.0. I can use the following call to get the sum into a host variable:

int h_sum = thrust::reduce(dev_ptr_out, dev_ptr_out + array_size, 0, thrust::plus<float>());

I assume the result is first computed in device memory and then copied to the host variable, but how do I find out where in device memory the sum is stored? I want to use the sum directly from device memory. Otherwise, I have to copy h_sum back to a constant memory location or a device memory location, which seems like a waste of time, since the result originated in device memory anyway.

Thank you,

You may pass the reduction result as a parameter to your next kernel…
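
A minimal sketch of that idea follows. The kernel divide_by_sum and the helper normalize_by_sum are hypothetical names made up for illustration, not part of Thrust, and the float element type is an assumption about your data. thrust::reduce returns the sum to the host, and the next kernel then receives it as an ordinary by-value argument, so no explicit copy to constant or global memory is needed:

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>
#include <vector>

// Hypothetical follow-up kernel: divides every element by the sum.
// `sum` is a plain by-value kernel argument, readable by every thread.
__global__ void divide_by_sum(float* data, int n, float sum)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] /= sum;
    }
}

// Hypothetical helper: reduce on the device, then hand the result
// straight to the next kernel launch.
std::vector<float> normalize_by_sum(const std::vector<float>& input)
{
    std::vector<float> result(input.size());
    if (input.empty()) {
        return result;
    }

    // Copy the input to device memory.
    thrust::device_vector<float> d_data(input.begin(), input.end());
    const int n = static_cast<int>(d_data.size());

    // The reduction runs on the device; only the scalar result comes back.
    // A float initial value (0.0f) keeps the accumulation in float.
    float sum = thrust::reduce(d_data.begin(), d_data.end(),
                               0.0f, thrust::plus<float>());

    // Pass the scalar as a kernel argument.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    divide_by_sum<<<blocks, threads>>>(
        thrust::raw_pointer_cast(d_data.data()), n, sum);

    // Copy the processed data back to the host.
    thrust::copy(d_data.begin(), d_data.end(), result.begin());
    return result;
}
```

The sum still makes a round trip through the host here, but it is a single scalar, so the cost is usually small next to the reduction itself.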