any idea on seeking index i where a[i]=certain value?

Hi,

I need to search the smallest index i for certain element in an array a[array_size] where a[i] = value. The way I could think of is using

shared int s_index;
if(thread_id == 0) s_index = array_size;
atomicMin( &s_index, thread_id);

But the bottleneck is the atomic operation where all threads need to access s_index. Is there any better choice? Thanks!

Parallel reduction. Look into the Thrust library.

I suppose it is typical parallel reduction task. it is included in SDK.

I suppose it is typical parallel reduction task. it is included in SDK.

I suppose it is typical parallel reduction task. it is included in SDK.

I get it. Thanks!

Actually, thinking some more, this is probably more of a prefix-sum/scan case. But Thrust is definitely the way to go.

Is it possible to delete my own post? :)

Actually with this problem, you can avoid the complexities of parallel reduction. The overhead is not using atomicMin, it’s just using it so many times.

So have each thread keep its own minimum, and then at the end, just once, do the atomic min. The overhead is negligible then and your code complexity will drop enormously.

int minVal=0x7FFFFFFF; // per thread minimum

for (int i=threadIdx.x; i<maxN; i+=blockDim.x) 

   minVal=min(minVal, a[i]);

atomicMin(&s_index, minVal);

With the tweak above, you’ll use only a few atomicMins (equal to the number of threads, so perhaps 256, which is negligible). With the atomicMin inside the loop, you’d use maxN atomicMins, which could be huge if your array is big.

You won’t see any performance difference between the above trivial code and the parallel reduction unless maxN is smaller than a few thousand.

actually the maxN is only 256 in my case, but this function is used many times so I’d like to find an efficient way to implement this. So as maxN is small here, which one would be faster? reduction or your code?

Then reduction is exactly what you want for efficiency.

Thanks for your help!

actually reduction also process several elements per thread.