Inter-thread communication

I am having a problem with inter-thread communication.

I launch a grid containing only one block.

int main()
{
    dim3 block(32, 1);  // 32 threads in the block
    dim3 grid(1, 1);    // one block in the grid

    Test<<<grid, block>>>(x);

    return 0;
}

So I thought the code below would work as I wanted.

__global__ void Test(int *x)
{
    __shared__ int temp;
    temp = *x;

    __syncthreads();

    temp++;

    __syncthreads();

    *x = temp;
}

I expected *x to be 32.

It only worked in EmuDebug.

In Release, the code above gave me 1.

I have no idea what the problem is.

Can you help me?

Thanks.

That increment is a giant race condition. __syncthreads() is not a critical section; it is a barrier synchronization. Have you done parallel programming before?
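To spell out the difference: a barrier only makes every thread in the block wait until all of them have reached it. It does not serialize temp++, so all 32 threads read, increment, and write the same shared variable and overwrite each other's updates, which is consistent with the 1 seen in Release. Here is a minimal sketch of what a barrier is good for, assuming a single block of at most 32 threads and *x starting at 0. The names CountWithBarrier and RunCountWithBarrier are made up for illustration, and error checking is left out.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every thread contributes 1 through its own shared slot,
// so no two threads ever write to the same location.
__global__ void CountWithBarrier(int *x)
{
    __shared__ int slots[32];           // assumes blockDim.x <= 32

    slots[threadIdx.x] = 1;             // each thread writes only its own element
    __syncthreads();                    // wait until every slot has been written

    if (threadIdx.x == 0) {             // a single thread performs the summation
        int sum = *x;
        for (int i = 0; i < blockDim.x; ++i)
            sum += slots[i];
        *x = sum;
    }
}

// Hypothetical host helper: launches one block and returns the final count.
int RunCountWithBarrier(int threads)
{
    int h_x = 0;
    int *d_x = nullptr;
    cudaMalloc(&d_x, sizeof(int));
    cudaMemcpy(d_x, &h_x, sizeof(int), cudaMemcpyHostToDevice);
    CountWithBarrier<<<1, threads>>>(d_x);
    cudaMemcpy(&h_x, d_x, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_x);
    return h_x;
}
```

With 32 threads, RunCountWithBarrier(32) should return 32, because the barrier guarantees that thread 0 sees every slot before it sums them.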

No... I thought __syncthreads() acted like a critical section, but it does not.

Is there anything like a critical section that can be used in CUDA?

Basically no. The closest thing is a set of atomic memory operations which can operate on global memory (and shared memory if you have a compute capability 1.3 card). So you could implement your kernel using an atomicAdd(), for example.
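A minimal sketch of the atomicAdd() approach, assuming a card that supports atomics on global memory and that *x starts at 0. The names TestAtomic and RunTestAtomic are made up for illustration, and error checking is left out.

```cuda
#include <cuda_runtime.h>

// Hypothetical replacement for Test: each thread adds 1 to *x atomically,
// so the read-modify-write of one thread cannot be interleaved with another's.
__global__ void TestAtomic(int *x)
{
    atomicAdd(x, 1);
}

// Hypothetical host helper: launches one block and returns the final value of *x.
int RunTestAtomic(int threads)
{
    int h_x = 0;
    int *d_x = nullptr;
    cudaMalloc(&d_x, sizeof(int));
    cudaMemcpy(d_x, &h_x, sizeof(int), cudaMemcpyHostToDevice);
    TestAtomic<<<1, threads>>>(d_x);
    cudaMemcpy(&h_x, d_x, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_x);
    return h_x;
}
```

With the 32-thread launch from the question, RunTestAtomic(32) should return 32. No __syncthreads() is needed here, because the atomic operation itself makes each increment indivisible.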