I have some code that finds certain items in an array and saves them into another array. Is it possible to parallelize it with CUDA?
[codebox]
std::vector<float> a(N);
std::vector<float> b;
for (int i = 0; i < N; i++)
{
    if (a[i] >= t)
        b.push_back(a[i]);
}
[/codebox]
Thanks
Sure, why not. It is a parallelizable problem. The difficulty comes from synchronization among blocks; you need to find a way to handle that (it is certainly possible).
That said, it will only pay off if your array is quite big. Otherwise I don't think it would be worth doing in CUDA.
Thanks for the reply.
The reason I want to do this is that the array is already generated on the GPU, and it would take a lot of time to copy it all back to the CPU. So I am trying to copy back only the elements I need (1% or less).
What you described is known as “stream compaction”. It’s a common and useful technique.
It’s implemented in CUDPP (http://www.gpgpu.org/developer/cudpp/), but it’s worth understanding how it works in general.
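For intuition, here is a sequential C++ sketch of the scan-based compaction idea (the general approach libraries like CUDPP parallelize); the function name and types are illustrative, not CUDPP’s actual API. Each of the three steps (flag, scan, scatter) is itself data-parallel, which is what makes the technique GPU-friendly:

```cpp
#include <vector>

// Scan-based stream compaction, shown sequentially for clarity.
// Step 1: mark each element with a 0/1 flag.
// Step 2: an exclusive prefix sum of the flags gives each surviving
//         element its destination index (and the total count).
// Step 3: scatter the surviving elements to their destinations.
std::vector<int> compact(const std::vector<int>& a, int t)
{
    std::vector<int> flags(a.size()), indices(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        flags[i] = (a[i] >= t) ? 1 : 0;

    int running = 0;                       // exclusive scan of the flags
    for (std::size_t i = 0; i < a.size(); ++i) {
        indices[i] = running;
        running += flags[i];
    }

    std::vector<int> b(running);           // total output count = final scan value
    for (std::size_t i = 0; i < a.size(); ++i)
        if (flags[i])
            b[indices[i]] = a[i];          // stable: preserves input order
    return b;
}
```

Note that this variant is stable (output keeps the input order), which the atomic approach below does not guarantee.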
Also, if you’re outputting just a small fraction of the results and order doesn’t matter, you could consider using atomic increments to write each passing element into a device array. That’s slow in general if you have many output values, but very easy to implement.
Thanks. I knew there must be a general algorithm for this, but I didn’t know the keyword.