Finding the min and max index of values in a 1D array greater than particular threshold

Hello everyone,
I am trying to write a CUDA kernel that is used to find min and max index of values in 1D array greater than a particular threshold
Below is the pseudo code in CPU for doing the same

int min_index = 0, max_index = length;
for(int i = 0; i < length; i++)
{
    if(arr[i] >= threshold)
    {
        min_index = i;
        break;
    }
}

for(int i = length - 1 ; i >= min_index; i--)
{
    if(arr[i] >= threshold)
    {
        max_index = i;
        break;
    }
}

One way of doing this is GPU is using atomic operations, but i would not prefer to use them. I wanted to know if there is any better, faster approach for solving this problem

Would greatly appreciate any ideas on this

Is the array sorted?

No the array is not sorted

You can use a parallel reduction to find the positions. There are many resources on this topic. The first that comes up in my search is https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf .

If you are familiar with Thrust, you could use something like thrust::transform_reduce.

I was thinking of using reduction, but unable to figure out the logic with which i need to do reduction. For min reduction a logic like below can be followed

for(int i = 8; i >= 1; i /= 2)
{
    if(threadIdx.x < i)
        sh_mem[threadIdx.x] = min(sh_mem[threadIdx.x], sh_mem[threadIdx.x + i]);
    __syncthreads();
}

But what logic can i use to do for my case, where i need to compare the values in shared memory with a threshold and compute the min index in shared memory satisfying this criteria?

I think a simple approach would be to have an index list where indices are set to infinity if the element does not reach the threshold. then do a min reduction of the indices.

int my_min_index = infinity
if(array[i] >= threshold){
  my_min_index = i
}
//now perform min reduction of my_min_index
1 Like