Incrementing a "counter"

Juippi · January 27, 2013, 1:00am

I need my kernel to return an array that’s “lean” (no empty entries between indices), but going about that in a parallel manner is a bit more complicated than I thought.

EDIT: Ugh, I just pasted some code but apparently there’s a low limit for chars per post

Juippi · January 27, 2013, 1:01am

__global__ void getNearbyPoints(Point pt, Point* pts, double dist, Point* ret)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (hypot((double)pt.x - pts[idx].x, (double)pt.y - pts[idx].y) <= dist)
    {
        ret[count] = pts[idx];
        atomicAdd(&count, 1);
    }
}

(count is a __device__ variable.) The addition seems to finish only after all the threads are done executing.

Lev · January 27, 2013, 1:45am

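This is not a critical section: reading count and then calling atomicAdd are two separate steps, so several threads can read the same value of count and write to the same slot. Use index = atomicAdd(&count, 1); ret[index] = pts[idx]; instead. Another way is to use the compact function from cudpp, or maybe thrust.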
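For reference, here is a minimal sketch of that fix as a complete program. The Point layout, the extra n parameter with its bounds guard, and the findNearby host wrapper are assumptions added for illustration; none of them appear in the original post. Error checking is omitted for brevity.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

struct Point { float x, y; };   // assumed layout; the thread does not show it

__device__ int count;           // global counter, as in the original post

__global__ void getNearbyPoints(Point pt, const Point* pts, int n,
                                double dist, Point* ret)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;       // bounds guard (assumption, not in the original)
    if (hypot((double)pt.x - pts[idx].x, (double)pt.y - pts[idx].y) <= dist)
    {
        // atomicAdd returns the value count held before the increment,
        // so every matching thread receives its own slot.
        int index = atomicAdd(&count, 1);
        ret[index] = pts[idx];
    }
}

// Hypothetical host wrapper: copies the matches into h_ret (which must have
// room for n points) and returns how many were found.
int findNearby(const Point* h_pts, int n, Point pt, double dist, Point* h_ret)
{
    Point *d_pts, *d_ret;
    cudaMalloc(&d_pts, n * sizeof(Point));
    cudaMalloc(&d_ret, n * sizeof(Point));
    cudaMemcpy(d_pts, h_pts, n * sizeof(Point), cudaMemcpyHostToDevice);

    int zero = 0;               // reset the counter before every launch
    cudaMemcpyToSymbol(count, &zero, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    getNearbyPoints<<<blocks, threads>>>(pt, d_pts, n, dist, d_ret);

    int found = 0;              // this copy waits for the kernel to finish
    cudaMemcpyFromSymbol(&found, count, sizeof(int));
    cudaMemcpy(h_ret, d_ret, found * sizeof(Point), cudaMemcpyDeviceToHost);

    cudaFree(d_pts);
    cudaFree(d_ret);
    return found;
}

int main()
{
    const int n = 1024;
    static Point pts[n], out[n];
    for (int i = 0; i < n; ++i) { pts[i].x = (float)i; pts[i].y = 0.0f; }

    Point origin = {0.0f, 0.0f};
    int found = findNearby(pts, n, origin, 10.0, out);
    printf("%d points within range\n", found);
    return 0;
}
```

Note that the matches land in ret in whatever order the threads happen to reach the atomicAdd, so the output is compact but not sorted by input index.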

alrikai · January 27, 2013, 8:28am

It sounds like you’re looking for stream compaction. As Lev stated, you can find functionality for this built into cudpp and thrust (thrust is included automatically in CUDA 5). You should consider taking a look at: Google Code Archive - Long-term storage for Google Code Project Hosting.

You would need a different functor for removing elements than used in the example of course, but it should convey the idea pretty well.
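As a rough sketch of what stream compaction with thrust could look like for this problem: thrust::copy_if copies only the elements that satisfy a predicate and returns an iterator to the end of the compacted output. The Point layout, the IsNearby functor, and the compactNearby helper below are assumptions made up for illustration, not code from the linked example.

```cuda
#include <cmath>
#include <cstdio>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

struct Point { float x, y; };   // assumed layout; the thread does not show it

// Hypothetical predicate: true when p lies within dist of pt.
struct IsNearby
{
    Point pt;
    double dist;

    __host__ __device__ bool operator()(const Point& p) const
    {
        return hypot((double)pt.x - p.x, (double)pt.y - p.y) <= dist;
    }
};

// Hypothetical helper: fills h_out with the nearby points and returns
// how many there are.
int compactNearby(const thrust::host_vector<Point>& h_pts, Point pt,
                  double dist, thrust::host_vector<Point>& h_out)
{
    thrust::device_vector<Point> d_pts = h_pts;         // copy input to device
    thrust::device_vector<Point> d_out(d_pts.size());   // worst-case size

    IsNearby pred = {pt, dist};
    thrust::device_vector<Point>::iterator out_end =
        thrust::copy_if(d_pts.begin(), d_pts.end(), d_out.begin(), pred);

    d_out.resize(out_end - d_out.begin());   // trim to the number of matches
    h_out = d_out;                           // copy results back to the host
    return (int)h_out.size();
}

int main()
{
    thrust::host_vector<Point> pts(1024);
    for (int i = 0; i < 1024; ++i) { pts[i].x = (float)i; pts[i].y = 0.0f; }

    Point origin = {0.0f, 0.0f};
    thrust::host_vector<Point> nearby;
    int found = compactNearby(pts, origin, 10.0, nearby);
    printf("%d points within range\n", found);
    return 0;
}
```

Unlike the atomicAdd approach, copy_if is stable, so the surviving points keep their original relative order, and no global counter is needed.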

Juippi · January 27, 2013, 9:49pm

Thanks, right now I’m using the fix Lev posted, although I don’t wholly understand why it works, since atomicAdd is supposed to return the old value according to the documentation. I’ll see how much the atomic operations end up slowing the process down. That stream compaction example seems to do exactly what I wanted…

pasoleatis · January 28, 2013, 9:52am

Hello,

If you use index = atomicAdd(&count, 1), index will be equal to the value of count before the increment, while count itself goes up by 1. Returning the old value is exactly what makes this work: each thread gets a different index, and if the array has size countmax, the indices run from 0 to countmax-1.