Parallel find: find multiple items from an array

I have some code that finds some items in an array and saves them into another array. Is it possible to parallelize it using CUDA?


vector a(N);
vector b(0);
for(int i=0;i<N;i++)







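Why not? It is certainly a parallel problem. The difficulty is synchronization among blocks: each thread needs to know where in the output array to write its match, and that depends on how many matches the other threads have found. You need to find a way to solve that (it is certainly possible).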

Also, it will only matter if your array is quite big. Otherwise, I don’t think doing this in CUDA would make a difference.

Thanks for the reply.

The reason I want to do this is that the array is already generated on the GPU, and it would take a lot of time to copy all of it back to the CPU. So I am trying to copy only the elements I need (1% or less).

What you described is known as “compaction” (also called stream compaction). It’s a common and useful technique.
It’s implemented in CUDPP, but it’s useful to understand how it works in general.
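
For reference, here is a minimal sketch of how a scan-based compaction is usually structured: flag each element that passes the test, run an exclusive prefix sum over the flags to get each match's output index, then scatter the matches. It uses Thrust for the scan rather than CUDPP. The `keep` predicate (value greater than a threshold), the kernel names, and the `compact` wrapper are all made up for illustration. The wrapper uploads from a host vector only so the example is self-contained; in your case the data would already be on the device. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <vector>

// Hypothetical predicate (an assumption): keep values above a threshold.
__device__ bool keep(int v, int threshold) { return v > threshold; }

// Step 1: flag[i] = 1 if a[i] passes the predicate, else 0.
__global__ void mark(const int* a, int* flag, int n, int threshold) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) flag[i] = keep(a[i], threshold) ? 1 : 0;
}

// Step 3: every passing element writes itself to its scanned position.
__global__ void scatter(const int* a, const int* flag, const int* pos,
                        int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && flag[i]) out[pos[i]] = a[i];
}

// Illustrative host wrapper; returns the matches in their original order.
std::vector<int> compact(const std::vector<int>& host_a, int threshold) {
    int n = static_cast<int>(host_a.size());
    if (n == 0) return {};

    thrust::device_vector<int> a(host_a.begin(), host_a.end());
    thrust::device_vector<int> flag(n), pos(n);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    mark<<<blocks, threads>>>(thrust::raw_pointer_cast(a.data()),
                              thrust::raw_pointer_cast(flag.data()),
                              n, threshold);

    // Step 2: pos[i] = number of passing elements before index i.
    thrust::exclusive_scan(flag.begin(), flag.end(), pos.begin());

    // Total matches = last scanned position + last flag.
    int last_pos = pos[n - 1];
    int last_flag = flag[n - 1];
    int count = last_pos + last_flag;

    thrust::device_vector<int> out(count);
    scatter<<<blocks, threads>>>(thrust::raw_pointer_cast(a.data()),
                                 thrust::raw_pointer_cast(flag.data()),
                                 thrust::raw_pointer_cast(pos.data()),
                                 thrust::raw_pointer_cast(out.data()), n);

    // Only the compacted elements are copied back to the host.
    std::vector<int> result(count);
    thrust::copy(out.begin(), out.end(), result.begin());
    return result;
}
```

The scan is what removes the need for threads in different blocks to talk to each other: every thread can look up its own output index. If you don't need to see the individual steps, `thrust::copy_if` does the same job in one call and also preserves order.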

Also, if you’re outputting just a small fraction of the results and order doesn’t matter, you could consider using atomic increments to stuff each passing element into a device array. That’s slow in general if you have many output values, but very easy.
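
A rough sketch of the atomic-increment approach, assuming `int` data and a "greater than threshold" test as a stand-in for your real condition. The names `collect` and `collect_matches` are hypothetical, the host wrapper uploads the input only to keep the example self-contained (your array is already on the device), and error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each passing thread reserves a unique output slot with atomicAdd,
// which returns the counter's value before the increment.
__global__ void collect(const int* a, int n, int threshold,
                        int* out, int* count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && a[i] > threshold) {   // placeholder condition (assumption)
        int slot = atomicAdd(count, 1);
        out[slot] = a[i];
    }
}

// Illustrative host wrapper; the order of the returned matches is not defined.
std::vector<int> collect_matches(const std::vector<int>& host_a, int threshold) {
    int n = static_cast<int>(host_a.size());
    if (n == 0) return {};

    int *a = nullptr, *out = nullptr, *count = nullptr;
    cudaMalloc(&a, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));   // worst case: every element passes
    cudaMalloc(&count, sizeof(int));
    cudaMemcpy(a, host_a.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(count, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    collect<<<blocks, threads>>>(a, n, threshold, out, count);

    // Read the number of matches first, then copy back only that many.
    int h_count = 0;
    cudaMemcpy(&h_count, count, sizeof(int), cudaMemcpyDeviceToHost);
    std::vector<int> result(h_count);
    cudaMemcpy(result.data(), out, h_count * sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(a);
    cudaFree(out);
    cudaFree(count);
    return result;
}
```

All passing threads contend on the single counter, which is why this only pays off when matches are rare, as in the 1% case described above.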

Thanks. I know there must be some general algorithm for this, but I don’t know what the keyword is.