Thrust pointer cast / conversion

To convert between raw and Thrust pointers on the device, is it something like this? (I get an error when I run it.)

Say I have a struct pointer w:

thrust::device_ptr<float> tempp(w->data_pointer_in_gpu);
thrust::device_ptr<float> temp_res;
*temp_res=thrust::reduce(tempp,tempp+1024); //find sum and store into thrust device ptr?

w->data_pointer2_in_gpu=thrust::raw_pointer_cast(temp_res);

All pointers are on the device.

Not sure about your specific use case, but for sorting with Thrust (which is by a huge margin the fastest primitive sort I have ever tested, even when the memory copies in both directions are included), this is how I implement it:

thrust::device_ptr<float> D_p=thrust::device_pointer_cast(D_Arr);

thrust::sort(D_p,D_p+num_elem);

Here D_Arr is a device pointer to a float array allocated via cudaMalloc().

Not sure if this helps, as I only use thrust for sorting, but the above does work correctly without issue.
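For reference, a minimal sketch of the full round trip around that sort call, assuming the data starts in a host array. The helper name sort_via_device and the sample values are made up for illustration, and error checking on the CUDA calls is left out to keep it short.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: sorts n floats from a host array on the GPU, in place.
void sort_via_device(float* h_data, int n)
{
    // Raw device allocation, as in the post above.
    float* D_Arr = nullptr;
    cudaMalloc(&D_Arr, n * sizeof(float));
    cudaMemcpy(D_Arr, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // Wrap the raw pointer so Thrust knows the memory lives on the device.
    thrust::device_ptr<float> D_p = thrust::device_pointer_cast(D_Arr);
    thrust::sort(D_p, D_p + n);

    // The raw pointer still refers to the same (now sorted) memory.
    cudaMemcpy(h_data, D_Arr, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(D_Arr);
}

int main()
{
    float h[] = {3.0f, 1.0f, 4.0f, 1.5f, 2.0f};
    const int n = 5;

    sort_via_device(h, n);

    for (int i = 0; i < n; ++i) printf("%g ", h[i]);
    printf("\n");
    return 0;
}
```

The wrapper does not copy or own anything: D_p and D_Arr refer to the same allocation, so the raw pointer can keep being used with kernels and cudaMemcpy after the sort.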

The code in my original post builds fine, but I get an error at run time. w->data_pointer_in_gpu is a float pointer inside a struct.

So I try to convert my first raw pointer (which is where the data is stored in GPU memory) to a Thrust device pointer, then convert the result back to a raw pointer after the reduce and assign it to w->data_pointer2_in_gpu.

The error is:

“terminate called after throwing an instance of ‘thrust::system::system_error’
what(): invalid argument”

thrust::device_ptr<float> temp_res;
*temp_res=thrust::reduce(tempp,tempp+1024); //find sum and store into thrust device ptr?

It looks like you haven’t allocated any space at temp_res in this example.
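A minimal sketch of one way to fix it, assuming the goal is to end up with the sum in device memory behind a raw pointer. thrust::reduce returns the result by value to the host, so the result either stays in a plain float or gets written into device storage that has been allocated first. The names reduce_to_device, d_in and d_sum are made up for this example, and error checking is omitted.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/device_malloc.h>
#include <thrust/fill.h>
#include <thrust/reduce.h>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: sums n floats at d_in and returns a raw device
// pointer to one float holding the sum. Free it with thrust::device_free
// (or cudaFree under the default CUDA backend).
float* reduce_to_device(float* d_in, int n)
{
    // Wrap the existing raw device pointer; no allocation or copy happens.
    thrust::device_ptr<float> in = thrust::device_pointer_cast(d_in);

    // reduce hands the sum back by value on the host.
    float sum = thrust::reduce(in, in + n);

    // Allocate device storage BEFORE dereferencing the device_ptr.
    thrust::device_ptr<float> res = thrust::device_malloc<float>(1);
    *res = sum;  // single-element host-to-device write

    return thrust::raw_pointer_cast(res);
}

int main()
{
    const int n = 1024;

    // Stand-in for w->data_pointer_in_gpu.
    float* d_in = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    thrust::device_ptr<float> in(d_in);
    thrust::fill(in, in + n, 1.0f);

    // Stand-in for w->data_pointer2_in_gpu.
    float* d_sum = reduce_to_device(d_in, n);

    float check = 0.0f;
    cudaMemcpy(&check, d_sum, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", check);

    cudaFree(d_sum);
    cudaFree(d_in);
    return 0;
}
```

If the sum is only needed on the host, the device_malloc step can be dropped and the float returned by thrust::reduce used directly.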

duh, thx, stupid mistake.

Yeah, part of the problem with like half of the Thrust stuff is that when you’re drowning in the middle of super long type names, it’s easy to forget, “Oh yeah, this is an actual pointer that I need to malloc and manage.”

I’m thinking of backporting std::unique_ptr into CUDA. I’m hoping it’s not too hard. Thrust also lacks C++11 features like move semantics, so you can’t std::move one Thrust vector into another. Which is lame.

I did some testing for reduction. At least for me, Thrust seems much slower than my own reduction written in CUDA C. Not sure whether this is due to the pointer conversion or something else.

You should post a small, self-contained example exemplifying the behavior you’re seeing.
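Such an example could look roughly like the sketch below, assuming the comparison is about steady-state time per call. It does one warm-up call first, since the first Thrust call can include one-time start-up cost, and then times repeated calls with CUDA events. The helper name time_thrust_reduce_ms, the input size and the repetition count are made up for illustration. The pointer cast itself only wraps the raw pointer, so it is unlikely to be where the time goes.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <thrust/reduce.h>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: average milliseconds per thrust::reduce call over
// `reps` calls on n floats at d_in. The last sum is written to *out_sum.
float time_thrust_reduce_ms(float* d_in, int n, int reps, float* out_sum)
{
    thrust::device_ptr<float> in = thrust::device_pointer_cast(d_in);

    // Warm-up call, excluded from the timing.
    float sum = thrust::reduce(in, in + n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        sum = thrust::reduce(in, in + n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    *out_sum = sum;
    return ms / reps;
}

int main()
{
    const int n = 1 << 20;  // arbitrary size chosen for the example
    float* d_in = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    thrust::device_ptr<float> in(d_in);
    thrust::fill(in, in + n, 1.0f);

    float sum = 0.0f;
    float ms = time_thrust_reduce_ms(d_in, n, 100, &sum);
    printf("sum = %f, avg time per reduce = %f ms\n", sum, ms);

    cudaFree(d_in);
    return 0;
}
```

Timing the hand-written kernel with the same events, the same input and the same number of repetitions would make the two numbers comparable.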