I have some code that finds certain items in an array and saves them into another array. Is it possible to parallelize it using CUDA?

[codebox]
vector<int> a(N);
vector<int> b;

for (int i = 0; i < N; i++)
{
    if (a[i] >= t)
        b.push_back(a[i]);
}
[/codebox]

Thanks


Why not? It is certainly a parallel problem. The difficulty comes from synchronization among blocks. You need to find a way to solve that (it is certainly possible).

And it will only matter if your array is quite big. Otherwise, I don't think it would be worth doing in CUDA.

Thanks for the reply.

The reason I want to do this is that the array is already generated on the GPU, and it would take a lot of time to copy it all back to the CPU. So I am trying to copy back only the needed elements (1% or less).

What you described is known as "stream compaction". It's a common and useful technique.

It’s implemented in CUDPP (http://www.gpgpu.org/developer/cudpp/) but it’s useful to understand how it works in general.

Also if you’re outputting just a small fraction of results, and order doesn’t matter, you could consider using atomic increments to stuff each passing element into a device array. That’s slow in general if you have many output values, but very easy.

Thanks. I knew there must be some general algorithm for this, but I didn't know what the keyword was.