Hello guys!
I was wondering how to find the global minimum in a kernel when each thread has its own value. Let us look at a rather meaningless example:
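float min = 0;
float MAD = 0;
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int idy = blockIdx.y * blockDim.y + threadIdx.y;
if (idx < n && idy < m)
{
    // Sum N texture samples along the diagonal around (idx, idy).
    for (int g = 0; g < N; g++)
    {
        float fval = tex2D(tex, idx + (g - 2) + 0.5f, idy + (g - 2) + 0.5f);
        MAD += fval;
    }
    min = MAD;
    // Every thread writes here, so pos ends up holding an arbitrary thread's index.
    pos[0] = idx;
    pos[1] = idy;
}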
The kernel is launched like this:
dim3 dimBlock( 16,16 );
dim3 dimGrid;
dimGrid.x = (n + dimBlock.x - 1)/dimBlock.x;
dimGrid.y = (m + dimBlock.y - 1)/dimBlock.y;
kernel <<< dimGrid,dimBlock >>> (...);
The way I see it, every thread now has its own min value. To find the global minimum, the program needs to compare the min values across all threads. I would like to do this by reducing within each block using shared memory, and then using an atomic function to find the minimum among the blocks. I've seen the topic "how to find the min value" (CUDA Programming and Performance, NVIDIA Developer Forums), but I need some more help. Can anyone show me how to save each thread's value to shared memory, so I can use the code proposed in that topic?
Best regards
Sondre
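
For what it's worth, here is a minimal sketch of the approach described in the post: a shared-memory reduction inside each block, followed by one atomic per block. It is not the code from the linked topic. Several things are assumptions made for illustration: the per-thread value is read from a plain array `vals` (standing in for `MAD`), the names `minKernel`, `findGlobalMin`, `orderedBits` and `fromOrderedBits` are hypothetical, the block is assumed to be 16x16 (any power-of-two thread count up to 256 would work), and the value and its position are packed into one 64-bit key so that a single `atomicMin` keeps them consistent. The 64-bit `atomicMin` is not available on older GPUs, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

#define BLOCK_DIM 16  // matches dimBlock(16,16) from the post

// Map a float to an unsigned int whose unsigned ordering matches the float ordering.
__device__ unsigned int orderedBits(float f)
{
    unsigned int b = __float_as_uint(f);
    return (b & 0x80000000u) ? ~b : (b | 0x80000000u);
}

// Host-side inverse of orderedBits.
static float fromOrderedBits(unsigned int e)
{
    unsigned int b = (e & 0x80000000u) ? (e ^ 0x80000000u) : ~e;
    float f;
    std::memcpy(&f, &b, sizeof f);
    return f;
}

// vals is an assumption: it stands in for the MAD value each thread computes.
__global__ void minKernel(const float* vals, int n, int m, unsigned long long* result)
{
    __shared__ unsigned long long s[BLOCK_DIM * BLOCK_DIM];

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int idy = blockIdx.y * blockDim.y + threadIdx.y;
    int tid = threadIdx.y * blockDim.x + threadIdx.x;

    // Out-of-range threads contribute the largest key, so they never win.
    unsigned long long key = 0xFFFFFFFFFFFFFFFFull;
    if (idx < n && idy < m) {
        int lin = idy * n + idx;
        float v = vals[lin];
        // High 32 bits: ordered value. Low 32 bits: linear position.
        key = ((unsigned long long)orderedBits(v) << 32) | (unsigned int)lin;
    }
    s[tid] = key;
    __syncthreads();

    // Tree reduction in shared memory; needs a power-of-two thread count.
    for (int stride = (blockDim.x * blockDim.y) / 2; stride > 0; stride >>= 1) {
        if (tid < stride && s[tid + stride] < s[tid])
            s[tid] = s[tid + stride];
        __syncthreads();
    }

    // One atomic per block combines the block minima.
    if (tid == 0)
        atomicMin(result, s[0]);
}

// Hypothetical host helper: returns the minimum and writes its (x, y) into pos.
float findGlobalMin(const std::vector<float>& h, int n, int m, int pos[2])
{
    float* dVals = nullptr;
    unsigned long long* dRes = nullptr;
    unsigned long long init = 0xFFFFFFFFFFFFFFFFull, out = 0;

    cudaMalloc(&dVals, h.size() * sizeof(float));
    cudaMalloc(&dRes, sizeof(init));
    cudaMemcpy(dVals, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dRes, &init, sizeof(init), cudaMemcpyHostToDevice);

    dim3 dimBlock(BLOCK_DIM, BLOCK_DIM);
    dim3 dimGrid((n + dimBlock.x - 1) / dimBlock.x, (m + dimBlock.y - 1) / dimBlock.y);
    minKernel<<<dimGrid, dimBlock>>>(dVals, n, m, dRes);

    cudaMemcpy(&out, dRes, sizeof(out), cudaMemcpyDeviceToHost);
    cudaFree(dVals);
    cudaFree(dRes);

    unsigned int lin = (unsigned int)(out & 0xFFFFFFFFu);
    pos[0] = (int)(lin % n);
    pos[1] = (int)(lin / n);
    return fromOrderedBits((unsigned int)(out >> 32));
}
```

In the original kernel, the line reading `vals[lin]` would be replaced by the texture loop that computes `MAD`. Packing value and position into one key avoids the race on `pos`; when two threads hold the same value, the one with the smaller linear index wins.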