I have a heavily contended float that I want to protect from being corrupted by concurrent writes.
Long story short, I have a 3D block executing where the z dimension corresponds to a specific row, the x dimension corresponds to the elements of that row, and the threads along the y dimension all share the same element. I want to keep a running max (a float) in each element, but I'm concerned about race conditions.
Is there any smart way to implement this? I'm on compute capability 1.1 (Quadro FX 1700).
uint ri = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;
uint di = __umul24(blockIdx.y, blockDim.y) + threadIdx.y;
uint z = __umul24(blockIdx.z, blockDim.z) + threadIdx.z + z_off;
uint dim = __umul24(gridDim.y, blockDim.y);
/*............*/
RAY_2D *row = (RAY_2D *)((char*)d_output.ptr + z * d_output.pitch);
//make sure these are set, but only once per ray
if (row[ri].r < 0.0f)
{
    row[ri].x = x;
    row[ri].y = y;
    row[ri].r = r;
    //row[ri].d = d;
}
//what to do here, d depends on di (y thread)
row[ri].max_d = help_me_atomic_max(row[ri].max_d, d);
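
For what it's worth, here is a minimal sketch of one common way to do this, assuming max_d lives in global memory: a compare-and-swap loop on the float's bit pattern. Compute capability 1.1 has no float atomics, but it does support 32-bit integer atomicCAS on global memory. The name atomicMaxFloat, the test kernel maxKernel, and the slot/value arrays are all made up for illustration; they are not part of CUDA or of the code above.

```cuda
#include <cstdio>
#include <cfloat>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA built-in): atomically stores
// max(*addr, val) at addr and returns the value seen before the update.
// addr must point to global memory on compute capability 1.1.
__device__ float atomicMaxFloat(float *addr, float val)
{
    int *addr_as_int = (int *)addr;
    int old = *addr_as_int;
    int assumed;
    do {
        assumed = old;
        // Nothing to do if the stored value is already at least val.
        if (__int_as_float(assumed) >= val)
            break;
        // Swap in val only if nobody changed the word since we read it.
        old = atomicCAS(addr_as_int, assumed, __float_as_int(val));
    } while (old != assumed);  // compare as ints so NaN cannot loop forever
    return __int_as_float(old);
}

// Illustrative kernel: many threads contend for a few slots.
__global__ void maxKernel(float *slots, int nSlots, const float *vals, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicMaxFloat(&slots[i % nSlots], vals[i]);
}

int main()
{
    const int n = 4096, nSlots = 4;
    float h_vals[n], h_slots[nSlots];
    for (int i = 0; i < n; ++i) h_vals[i] = (float)(i % 1000);
    // Start each running max below every value that will be written.
    for (int s = 0; s < nSlots; ++s) h_slots[s] = -FLT_MAX;

    float *d_vals, *d_slots;
    cudaMalloc((void **)&d_vals, n * sizeof(float));
    cudaMalloc((void **)&d_slots, nSlots * sizeof(float));
    cudaMemcpy(d_vals, h_vals, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_slots, h_slots, nSlots * sizeof(float), cudaMemcpyHostToDevice);

    maxKernel<<<n / 256, 256>>>(d_slots, nSlots, d_vals, n);

    cudaMemcpy(h_slots, d_slots, nSlots * sizeof(float), cudaMemcpyDeviceToHost);
    for (int s = 0; s < nSlots; ++s)
        printf("slot %d max = %f\n", s, h_slots[s]);

    cudaFree(d_vals);
    cudaFree(d_slots);
    return 0;
}
```

In the kernel above, the last line would then become something like atomicMaxFloat(&row[ri].max_d, d); with max_d initialized to a value below every possible d before the launch. The code has to be built for an architecture with global atomics (e.g. -arch=sm_11 on a toolkit that still targets that hardware). If d is known to be non-negative, a simpler option may be atomicMax((int *)&row[ri].max_d, __float_as_int(d)), since non-negative IEEE floats sort the same way as their integer bit patterns.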