any idea on seeking index i where a[i]=certain value?

touchtony · March 25, 2010, 6:19pm

Hi,

I need to find the smallest index i in an array a[array_size] such that a[i] == value. The only way I can think of is:

__shared__ int s_index;
if (thread_id == 0) s_index = array_size;
__syncthreads();
if (a[thread_id] == value) atomicMin(&s_index, thread_id);

But the bottleneck is the atomic operation, since all threads need to access s_index. Is there a better approach? Thanks!

YDD · March 25, 2010, 6:25pm

Parallel reduction. Look into the Thrust library.
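As a rough illustration of the Thrust route, the sketch below uses thrust::find, which returns an iterator to the first element equal to a given value. The helper name first_index_thrust and the choice to return the array size when nothing matches are assumptions made for this example, not something stated in the thread.

```cuda
#include <thrust/device_vector.h>
#include <thrust/find.h>
#include <vector>

// Hypothetical helper: smallest i with a[i] == value, or a.size() if no match.
int first_index_thrust(const std::vector<int>& h_a, int value)
{
    // Copy the input to the GPU.
    thrust::device_vector<int> d_a(h_a.begin(), h_a.end());

    // thrust::find returns an iterator to the first match, or end() if none.
    thrust::device_vector<int>::iterator it =
        thrust::find(d_a.begin(), d_a.end(), value);

    // Distance from the start of the vector is the index.
    return static_cast<int>(it - d_a.begin());
}
```

This copies the data to the device on every call, so for an array that already lives on the GPU you would pass device iterators directly instead.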

You7878 · March 25, 2010, 6:35pm

I suppose this is a typical parallel reduction task. An example is included in the SDK.

touchtony · March 25, 2010, 7:15pm

I get it. Thanks!

YDD · March 25, 2010, 7:57pm

Actually, thinking some more, this is probably more of a prefix-sum/scan case. But Thrust is definitely the way to go.

You7878 · March 25, 2010, 8:46pm

Is it possible to delete my own post? :)

SPWorley · March 25, 2010, 8:59pm

Actually, with this problem you can avoid the complexities of parallel reduction. The overhead does not come from using atomicMin as such, but from using it so many times.

So have each thread keep its own minimum, and then at the end, just once, do the atomic min. The overhead is negligible then and your code complexity will drop enormously.

int minIdx = 0x7FFFFFFF; // per-thread minimum index
for (int i = threadIdx.x; i < maxN; i += blockDim.x)
    if (a[i] == value) minIdx = min(minIdx, i); // only matching elements count
atomicMin(&s_index, minIdx); // one atomic per thread

With the tweak above, you’ll use only a few atomicMins (equal to the number of threads, so perhaps 256, which is negligible). With the atomicMin inside the loop, you’d use maxN atomicMins, which could be huge if your array is big.

You won’t see any performance difference between the above trivial code and the parallel reduction unless maxN is smaller than a few thousand.
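A complete version of the one-atomic-per-thread idea might look like the sketch below. It assumes a single block of 256 threads and a non-empty array. The kernel name, the host wrapper and the result pointer are hypothetical names introduced for this example. It returns n when no element matches.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Single-block kernel: each thread scans a strided slice of a[] and
// issues at most one atomicMin on a shared index.
__global__ void first_index_atomic(const int* a, int n, int value, int* result)
{
    __shared__ int s_index;
    if (threadIdx.x == 0) s_index = n;   // n means "not found"
    __syncthreads();                     // initial value visible to all threads

    int local = n;
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        // Indices only grow, so the first hit is this thread's minimum.
        if (a[i] == value) { local = i; break; }
    }
    if (local < n) atomicMin(&s_index, local);
    __syncthreads();                     // all atomics done before the read

    if (threadIdx.x == 0) *result = s_index;
}

// Hypothetical host wrapper (error checking omitted for brevity).
int first_index_atomic_host(const std::vector<int>& h_a, int value)
{
    int n = static_cast<int>(h_a.size());
    int* d_a = nullptr;
    int* d_result = nullptr;
    int h_result = n;
    cudaMalloc(&d_a, n * sizeof(int));
    cudaMalloc(&d_result, sizeof(int));
    cudaMemcpy(d_a, h_a.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    first_index_atomic<<<1, 256>>>(d_a, n, value, d_result);
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_result);
    return h_result;
}
```

Threads that find no match skip the atomic entirely, so the number of atomics is bounded by the block size regardless of n.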

touchtony · March 25, 2010, 9:15pm

Actually, maxN is only 256 in my case, but this function is called many times, so I'd like an efficient implementation. Since maxN is small here, which would be faster: reduction or your code?

SPWorley · March 25, 2010, 9:22pm

Then reduction is exactly what you want for efficiency.

touchtony · March 25, 2010, 9:34pm

Thanks for your help!

You7878 · March 25, 2010, 9:51pm

Actually, a reduction also processes several elements per thread.
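For comparison, here is a minimal sketch of a shared-memory tree reduction for the same search. Each thread first handles several elements in a strided loop, then the block combines the per-thread results with min. The block size constant kBlock, the kernel name and the host wrapper are assumptions for this example. It expects a non-empty array and a launch with exactly kBlock threads, and returns n when nothing matches.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlock = 256;  // assumed block size; must be a power of two

// Single-block min-index reduction; launch with exactly kBlock threads.
__global__ void first_index_reduce(const int* a, int n, int value, int* result)
{
    __shared__ int s_min[kBlock];

    // Phase 1: each thread finds the smallest matching index in its slice.
    int local = n;                       // n means "not found"
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        if (a[i] == value) { local = i; break; }
    }
    s_min[threadIdx.x] = local;
    __syncthreads();

    // Phase 2: tree reduction in shared memory, halving the active threads.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            s_min[threadIdx.x] = min(s_min[threadIdx.x],
                                     s_min[threadIdx.x + stride]);
        __syncthreads();
    }

    if (threadIdx.x == 0) *result = s_min[0];
}

// Hypothetical host wrapper (error checking omitted for brevity).
int first_index_reduce_host(const std::vector<int>& h_a, int value)
{
    int n = static_cast<int>(h_a.size());
    int* d_a = nullptr;
    int* d_result = nullptr;
    int h_result = n;
    cudaMalloc(&d_a, n * sizeof(int));
    cudaMalloc(&d_result, sizeof(int));
    cudaMemcpy(d_a, h_a.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    first_index_reduce<<<1, kBlock>>>(d_a, n, value, d_result);
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_result);
    return h_result;
}
```

Which variant is faster for a 256-element array depends on the hardware, so it is worth timing both on the target GPU.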
