Then thrust will dispatch the algorithm to the host backend, i.e. it will run that transform operation on the CPU. Thrust doesn't, under any circumstances, do what you have pictured; that has to be constructed yourself out of thrust primitives. Although that link has useful components, in modern thrust it's no longer necessary to use the experimental pinned allocator.
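To illustrate the dispatch behavior: raw pointers carry no system tag, so thrust treats them as host iterators and runs the algorithm on the CPU. A minimal sketch (the lambda and array contents are just placeholders for illustration):

```cuda
#include <thrust/transform.h>
#include <thrust/execution_policy.h>

int main() {
  float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  float b[4];
  // Raw host pointers: thrust dispatches this transform to the
  // host backend, i.e. it executes on the CPU, not the GPU.
  thrust::transform(a, a + 4, b,
                    [] __host__ __device__ (float x) { return 2.0f * x; });
  return 0;
}
```

Passing these raw pointers to an algorithm does not magically involve the GPU; to get device execution you need device-tagged iterators (e.g. from `thrust::device_vector` or `thrust::device_ptr`).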
You cannot wrap a thrust::device_vector around a pre-existing device allocation (at least not at the level of this discussion here; if you want to customize the thrust::device_vector class yourself, that is a different discussion. In practice, I never assume people are asking that sort of question unless stated explicitly). However, you can take a "raw" pointer, like one returned from cudaMalloc, and wrap a thrust::device_ptr around it. This will likely give you what you need: the ability to use that allocation in thrust algorithms. There are numerous questions on various forums demonstrating the use of thrust::device_ptr; here is one.
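A minimal sketch of that pattern: take the pointer returned by cudaMalloc, wrap it in a thrust::device_ptr, and then use it directly in a thrust algorithm, which will dispatch to the device backend. The sizes and the fill value here are arbitrary.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <cuda_runtime.h>

int main() {
  const int N = 100;
  float *raw = nullptr;
  cudaMalloc(&raw, N * sizeof(float));

  // Wrap the raw device allocation so thrust knows it lives on the device.
  thrust::device_ptr<float> dptr(raw);

  // The device_ptr carries a device system tag, so this fill
  // runs on the GPU.
  thrust::fill(dptr, dptr + N, 1.0f);

  cudaFree(raw);
  return 0;
}
```

Equivalently, `thrust::device_pointer_cast(raw)` produces the same wrapper without naming the type explicitly.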