For converting between raw and Thrust pointers on the device, is it something like this? (I got an error when I ran it.)
Say I have a struct pointer w:
thrust::device_ptr<float> tempp(w->data_pointer_in_gpu);
thrust::device_ptr<float> temp_res;
*temp_res = thrust::reduce(tempp, tempp + 1024); // find sum and store into thrust device ptr?
Not sure about your specific use case, but for sorting with Thrust (which is by a huge margin the fastest primitive sort I have ever tested, even when the memory copies in both directions are included), this is how I implement it:
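A minimal sketch of that kind of sort, not the poster's own code: it assumes a plain array of floats, copies it to the device, sorts it through a thrust::device_ptr wrapped around the raw pointer, and copies it back. The names sort_on_device and sort_via_gpu are hypothetical, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Sort n floats in place. d_data is a raw device pointer (e.g. from cudaMalloc).
// sort_on_device is a hypothetical helper name.
void sort_on_device(float* d_data, int n)
{
    thrust::device_ptr<float> first(d_data);  // wraps the pointer, copies nothing
    thrust::sort(first, first + n);           // ascending order
}

// Host -> device copy, sort on the GPU, device -> host copy.
// sort_via_gpu is a hypothetical helper name.
std::vector<float> sort_via_gpu(const std::vector<float>& h_in)
{
    const int n = static_cast<int>(h_in.size());
    std::vector<float> h_out(n);
    if (n == 0) return h_out;

    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    sort_on_device(d_data, n);

    cudaMemcpy(h_out.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return h_out;
}

int main()
{
    std::vector<float> v = {3.0f, 1.0f, 2.0f};
    std::vector<float> s = sort_via_gpu(v);
    for (float x : s) std::printf("%g ", x);
    std::printf("\n");
    return 0;
}
```

If the data already lives on the GPU, only sort_on_device is needed and both copies disappear.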
The code in my original post builds fine, but I get an error at run time. w->data_pointer_in_gpu is a float pointer inside a struct.
So I am trying to convert my first raw pointer (which is where the data is stored in GPU memory) to a Thrust device pointer, then convert back to a raw pointer after the reduce and assign the result to the w->data_pointer2_in_gpu pointer.
The error is:
“terminate called after throwing an instance of ‘thrust::system::system_error’
what(): invalid argument”
Yeah, part of the problem with like half of the Thrust stuff is that when you’re drowning in the middle of super long type names, it’s easy to forget, “Oh yeah, this is an actual pointer that I need to malloc and manage.”
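A sketch of what that means for the reduce in the question, under some assumptions: a default-constructed thrust::device_ptr points at nothing, so writing through it fails at run time, and thrust::reduce returns the sum by value on the host rather than as a pointer. The struct Work below is a hypothetical stand-in for the asker's struct, and data_pointer2_in_gpu is assumed to have room for one float. thrust::raw_pointer_cast converts a device_ptr back to a raw pointer.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <thrust/reduce.h>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical stand-in for the struct in the question.
struct Work {
    float* data_pointer_in_gpu;   // n input floats in device memory
    float* data_pointer2_in_gpu;  // one float in device memory for the result
};

// Sum n device floats, store the sum in device memory, and return it.
float sum_into_device(Work* w, int n)
{
    // Raw -> Thrust: wrap the existing allocation (no copy, no ownership).
    thrust::device_ptr<float> in(w->data_pointer_in_gpu);

    // reduce hands the result back by value on the host.
    float sum = thrust::reduce(in, in + n);

    // The destination must already be allocated before writing through it.
    thrust::device_ptr<float> out(w->data_pointer2_in_gpu);
    *out = sum;  // copies one float from host to device

    // Thrust -> raw: get the plain pointer back if a kernel needs it.
    float* raw_out = thrust::raw_pointer_cast(out);
    (void)raw_out;  // same address as w->data_pointer2_in_gpu
    return sum;
}

int main()
{
    const int n = 1024;
    Work w;
    cudaMalloc(&w.data_pointer_in_gpu, n * sizeof(float));
    cudaMalloc(&w.data_pointer2_in_gpu, sizeof(float));

    // Fill the input with ones so the sum is easy to check.
    thrust::device_ptr<float> in(w.data_pointer_in_gpu);
    thrust::fill(in, in + n, 1.0f);

    float sum = sum_into_device(&w, n);

    float check = 0.0f;
    cudaMemcpy(&check, w.data_pointer2_in_gpu, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("sum = %g, stored on device = %g\n", sum, check);

    cudaFree(w.data_pointer_in_gpu);
    cudaFree(w.data_pointer2_in_gpu);
    return 0;
}
```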
I’m thinking of backporting std::unique_ptr into CUDA. I’m hoping it’s not too hard. Thrust also lacks C++11 features like move semantics, so you can’t std::move one Thrust vector into another, which is lame.
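This is not the backport described above, only a host-side sketch of a related idea: with a C++11 compiler, the standard std::unique_ptr can own device memory if it is given a deleter that calls cudaFree, and the resulting handle can then be moved. CudaFreeDeleter, DeviceFloats and make_device_floats are hypothetical names, and the pointer is only usable from host code.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <utility>

// Deleter that releases device memory (hypothetical name).
struct CudaFreeDeleter {
    void operator()(void* p) const { cudaFree(p); }
};

// Owning handle to device floats (hypothetical alias).
using DeviceFloats = std::unique_ptr<float, CudaFreeDeleter>;

// Allocate n floats on the device; returns an empty handle on failure.
DeviceFloats make_device_floats(std::size_t n)
{
    float* raw = nullptr;
    if (cudaMalloc(&raw, n * sizeof(float)) != cudaSuccess) {
        raw = nullptr;
    }
    return DeviceFloats(raw);
}

int main()
{
    DeviceFloats a = make_device_floats(1024);
    float* before = a.get();

    // Ownership moves; no device memory is copied.
    DeviceFloats b = std::move(a);

    std::printf("a empty: %d, b owns original: %d\n",
                a.get() == nullptr, b.get() == before);
    return 0;  // b's deleter calls cudaFree here
}
```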
I did some testing for reduction. At least for me, Thrust seems much slower than my own reduction written in CUDA C. I am not sure whether this is due to the pointer conversion or something else.
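One way to check a comparison like this is to time thrust::reduce with CUDA events after an untimed warm-up call, since the first CUDA call in a process also pays for start-up work. Wrapping a raw pointer in thrust::device_ptr copies nothing, so the conversion by itself should not be the cost. The sketch below assumes a vector of ones as test data, and time_thrust_reduce is a hypothetical helper name; a hand-written kernel would need to be timed the same way for a fair comparison.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <cuda_runtime.h>
#include <cstdio>

// Average milliseconds per thrust::reduce call over `reps` timed calls,
// after one untimed warm-up call. The last sum is written to *sum_out.
float time_thrust_reduce(const thrust::device_vector<float>& d, int reps, float* sum_out)
{
    // Warm-up: keeps one-time start-up costs out of the measurement.
    float sum = thrust::reduce(d.begin(), d.end());

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i) {
        sum = thrust::reduce(d.begin(), d.end());
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the stop event has been reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    *sum_out = sum;
    return ms / reps;
}

int main()
{
    // Test data: 2^20 floats, all 1.0f (an assumption for illustration).
    thrust::device_vector<float> d(1 << 20, 1.0f);

    float sum = 0.0f;
    float ms = time_thrust_reduce(d, 20, &sum);
    std::printf("sum = %g, average time per reduce = %f ms\n", sum, ms);
    return 0;
}
```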