Hi everyone,
I am trying to implement a barrier across all thread blocks in the grid. The code is below. The problem is that I want to remove the atomicCAS(&dummy, 2, 2) lines, but as soon as I do, the compiler gets rid of the while loop and my code stops working. I tried adding some other statements with volatile variables in between, but that does not work either. Any ideas?
I also tried adding shared-memory accesses in between, but the compiler optimizes those out as well. I really want to get rid of the atomicCAS because it is a performance bottleneck.
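A likely cause, stated as my understanding rather than a confirmed diagnosis: if Array_in and Array_out are not declared volatile, the compiler is free to load them once, keep the value in a register, and treat the spin loop as never-ending or dead. The atomicCAS call probably keeps the loop alive only because the compiler will not move memory loads across an atomic. The usual alternative is to read the flag through a volatile pointer so every iteration issues a real load. This is a minimal sketch assuming a single int flag in global memory; spin_until_equal, flag_demo and run_flag_demo are hypothetical names for illustration.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Spin until *flag == target. Reading through a volatile pointer forces a
// fresh load on every iteration, so the compiler cannot hoist the load out
// of the loop and no dummy atomic is needed.
__device__ void spin_until_equal(const int* flag, int target)
{
    const volatile int* vflag = flag;
    while (*vflag != target) {
        // empty body: the volatile read alone keeps the loop alive
    }
}

// Hypothetical demo: block 1 publishes a value, block 0 waits for it.
__global__ void flag_demo(int* flag, int* payload, int* result)
{
    if (threadIdx.x != 0) return;
    if (blockIdx.x == 1) {
        *payload = 42;        // data the waiting block wants to read
        __threadfence();      // make payload visible device-wide before the flag
        atomicExch(flag, 1);  // publish
    } else {
        spin_until_equal(flag, 1);
        __threadfence();
        *result = *(volatile int*)payload;  // volatile: skip any stale cached copy
    }
}

// Host helper: runs the demo with two blocks and returns what block 0 read.
// Assumes both blocks can be resident at the same time.
int run_flag_demo()
{
    int *flag, *payload, *result;
    cudaMalloc(&flag, sizeof(int));
    cudaMalloc(&payload, sizeof(int));
    cudaMalloc(&result, sizeof(int));
    cudaMemset(flag, 0, sizeof(int));
    cudaMemset(result, 0, sizeof(int));

    flag_demo<<<2, 32>>>(flag, payload, result);
    assert(cudaGetLastError() == cudaSuccess);

    int host_result = -1;
    cudaMemcpy(&host_result, result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(flag);
    cudaFree(payload);
    cudaFree(result);
    return host_result;
}
```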
Regards,
Manish
//global barrier start
///////////////////////////////////////////////////////////
__syncthreads();
// thread 0 of each block writes (iteration + 1) * NUMBLOCKS
// to Array_in[blockIdx.x] to announce that the block has arrived
if (threadIdx.x == 0)
{
    __threadfence(); // make the block's earlier global writes visible first
    Array_in[blockIdx.x] = (iteration[threadIdx.x] + 1) * NUMBLOCKS;
}
// block 0 waits for every block to write its entry
if (blockIdx.x == 0)
{
    if (threadIdx.x < NUMBLOCKS)
    {
        // thread i of block 0 spins until block i has arrived
        while (Array_in[threadIdx.x] != (iteration[threadIdx.x] + 1) * NUMBLOCKS)
        {
            atomicCAS(&dummy, 2, 2);
        }
    }
    __syncthreads();
    // and then writes to Array_out to release every block
    if (threadIdx.x < NUMBLOCKS)
        Array_out[threadIdx.x] = (iteration[threadIdx.x] + 1) * NUMBLOCKS;
}
// all the blocks wait on Array_out
if (threadIdx.x == 0)
{
    while (Array_out[blockIdx.x] != (iteration[threadIdx.x] + 1) * NUMBLOCKS)
    {
        atomicCAS(&dummy, 2, 2);
    }
}
__syncthreads();
///////////////////////////////////////////////////////////
//global barrier end
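A sketch of the same barrier without the atomicCAS(&dummy, 2, 2) calls, keeping the Array_in / Array_out idea but reading both arrays through volatile pointers and adding __threadfence() so writes made before the barrier are visible after it. global_barrier, barrier_demo, run_barrier_demo and the goal counter are hypothetical. The sketch assumes blockDim.x >= gridDim.x and that every block is resident at once: a spin barrier like this one (or the original) deadlocks if some blocks cannot be scheduled until others finish.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical two-phase grid barrier modeled on Array_in / Array_out.
// Assumptions: blockDim.x >= gridDim.x; all blocks resident at once;
// `goal` is the same in all threads and larger on every call; both arrays
// start zeroed, so a stale value can never satisfy a wait.
__device__ void global_barrier(int* arrive, int* release, int goal)
{
    volatile int* v_arrive = arrive;
    volatile int* v_release = release;

    __syncthreads();                        // this block has finished its phase
    if (threadIdx.x == 0) {
        __threadfence();                    // publish the block's global writes
        v_arrive[blockIdx.x] = goal;        // announce arrival
    }

    if (blockIdx.x == 0) {
        if (threadIdx.x < gridDim.x)
            while (v_arrive[threadIdx.x] != goal) { }  // fresh load each pass
        __syncthreads();                    // every block has arrived
        if (threadIdx.x < gridDim.x) {
            __threadfence();
            v_release[threadIdx.x] = goal;  // release block threadIdx.x
        }
    }

    if (threadIdx.x == 0) {
        while (v_release[blockIdx.x] != goal) { }
        __threadfence();                    // order the release before later reads
    }
    __syncthreads();                        // the rest of the block may proceed
}

// Hypothetical demo: each round, every block writes a value, waits, reads its
// neighbour's value, and waits again before the next round overwrites it.
__global__ void barrier_demo(int* arrive, int* release, int* data, int* out)
{
    const int nb = gridDim.x;
    const int neighbour = (blockIdx.x + 1) % nb;
    int goal = 0;
    for (int round = 0; round < 3; ++round) {
        if (threadIdx.x == 0)
            data[blockIdx.x] = round * 100 + blockIdx.x;
        global_barrier(arrive, release, ++goal);
        if (threadIdx.x == 0)  // volatile read so a stale cached copy is never used
            out[blockIdx.x] += ((volatile int*)data)[neighbour];
        global_barrier(arrive, release, ++goal);
    }
}

// Returns how many blocks saw the expected sum (nblocks on success).
int run_barrier_demo(int nblocks)
{
    assert(nblocks >= 1 && nblocks <= 32);  // 32 threads per block >= gridDim.x
    int *arrive, *release, *data, *out;
    const size_t bytes = nblocks * sizeof(int);
    cudaMalloc(&arrive, bytes);
    cudaMalloc(&release, bytes);
    cudaMalloc(&data, bytes);
    cudaMalloc(&out, bytes);
    cudaMemset(arrive, 0, bytes);
    cudaMemset(release, 0, bytes);
    cudaMemset(out, 0, bytes);

    barrier_demo<<<nblocks, 32>>>(arrive, release, data, out);
    assert(cudaGetLastError() == cudaSuccess);

    int host_out[32];
    cudaMemcpy(host_out, out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(arrive);
    cudaFree(release);
    cudaFree(data);
    cudaFree(out);

    int correct = 0;
    for (int b = 0; b < nblocks; ++b)  // rounds 0..2: 0 + 100 + 200 + 3 * neighbour
        if (host_out[b] == 300 + 3 * ((b + 1) % nblocks))
            ++correct;
    return correct;
}
```

For example, run_barrier_demo(8) should return 8 on a GPU that can keep eight 32-thread blocks resident. On CUDA 9 and later, cooperative groups (cooperative_groups::this_grid().sync() in a kernel launched with cudaLaunchCooperativeKernel) provide a supported grid-wide barrier that may be worth comparing against.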