I have the following code:
    thrust::device_vector<int> unique_idxs(N);
    thrust::device_vector<int> sizes(N);

    thrust::pair<thrust::device_vector<int>::iterator,
                 thrust::device_vector<int>::iterator> new_end =
        reduce_by_key(idxs.begin(), idxs.end(),
                      thrust::make_constant_iterator(1),
                      unique_idxs.begin(), sizes.begin());

    int unique_elems = new_end.first - unique_idxs.begin();
    sizes.erase(new_end.second, sizes.end());
where idxs is a sorted device vector of indices, unique_idxs holds the unique indices, and sizes holds the frequency of each index.
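To make the intended result concrete, here is a minimal host-side sketch in plain C++ (no Thrust) of what the call is meant to compute. The function name count_runs is hypothetical and only for illustration; it is not part of Thrust and is not the GPU code being timed.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical host-side reference (not a Thrust API): for a sorted input,
// returns the unique values and how many times each one occurs. This mirrors
// what reduce_by_key with a constant value of 1 is expected to produce.
std::pair<std::vector<int>, std::vector<int>>
count_runs(const std::vector<int>& idxs)
{
    // The run-length logic below only works on sorted input.
    assert(std::is_sorted(idxs.begin(), idxs.end()));

    std::vector<int> unique_idxs;
    std::vector<int> sizes;
    for (int v : idxs) {
        if (unique_idxs.empty() || unique_idxs.back() != v) {
            // First occurrence of a new index: start a new run.
            unique_idxs.push_back(v);
            sizes.push_back(1);
        } else {
            // Same index as the previous element: extend the current run.
            ++sizes.back();
        }
    }
    return {unique_idxs, sizes};
}
```

For example, for idxs = {0, 0, 2, 2, 2, 5} this gives unique_idxs = {0, 2, 5} and sizes = {2, 3, 1}.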
Timing my program, I found that this operation takes a long time compared to other operations that handle the same amount of data or more, e.g. the sort of the initial array that produces idxs. Is there any way to speed it up?
This part also causes an NVIDIA kernel-mode crash when idxs grows beyond 500k elements.