Hello everyone,
I am trying to write a CUDA kernel that is used to find min and max index of values in 1D array greater than a particular threshold
Below is the pseudo code in CPU for doing the same
int min_index = 0, max_index = length;
for(int i = 0; i < length; i++)
{
if(arr[i] >= threshold)
{
min_index = i;
break;
}
}
for(int i = length - 1 ; i >= min_index; i--)
{
if(arr[i] >= threshold)
{
max_index = i;
break;
}
}
One way of doing this is GPU is using atomic operations, but i would not prefer to use them. I wanted to know if there is any better, faster approach for solving this problem
Would greatly appreciate any ideas on this
No the array is not sorted
You can use a parallel reduction to find the positions. There are many resources on this topic. The first that comes up in my search is https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf .
If you are familiar with Thrust, you could use something like thrust::transform_reduce.
I was thinking of using reduction, but unable to figure out the logic with which i need to do reduction. For min reduction a logic like below can be followed
for(int i = 8; i >= 1; i /= 2)
{
if(threadIdx.x < i)
sh_mem[threadIdx.x] = min(sh_mem[threadIdx.x], sh_mem[threadIdx.x + i]);
__syncthreads();
}
But what logic can i use to do for my case, where i need to compare the values in shared memory with a threshold and compute the min index in shared memory satisfying this criteria?
I think a simple approach would be to have an index list where indices are set to infinity if the element does not reach the threshold. then do a min reduction of the indices.
int my_min_index = infinity
if(array[i] >= threshold){
my_min_index = i
}
//now perform min reduction of my_min_index
1 Like