Inter-thread communication

I am having a problem with inter-thread communication.

I launch only one block in a one-block grid.

void main()
{
    dim3 block(32,1);
    dim3 grid(1,1);

    Test<<<grid, block>>>(x);
}


So I thought the code below would work as I wanted.

__global__ void Test(int * x)
__shared__ int temp = *x;




            *x = temp;


I expected *x to be 32.

It only worked in EmuDebug.

In Release, the code above gave me 1.

I have no idea what the problem is.

Can you help me??


That increment is a giant race condition. __syncthreads() is not a critical section; it’s a barrier synchronization. Have you done parallel programming before?
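To make the barrier point concrete, here is a minimal sketch of what __syncthreads() is good for. It is not the original poster's kernel: the names CountWithBarrier and runCountWithBarrier, the flags array, and the starting value of 0 are all assumptions made for illustration. Each thread writes only its own shared-memory slot, the barrier guarantees those writes are finished, and then a single thread adds them up, so no two threads ever update the same location.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 32  // matches the 32x1 block in the question

// Hypothetical kernel (not the poster's code): every thread owns one slot,
// so nothing is shared between threads until after the barrier.
__global__ void CountWithBarrier(int *x)
{
    __shared__ int flags[THREADS];

    flags[threadIdx.x] = 1;   // each thread writes only its own slot
    __syncthreads();          // barrier: every thread's write is done and visible

    if (threadIdx.x == 0) {   // one thread performs the serial sum
        int sum = *x;
        for (int i = 0; i < THREADS; ++i)
            sum += flags[i];
        *x = sum;
    }
}

// Hypothetical host helper: starts the counter at 0 (an assumption),
// launches one block, and returns the final value.
int runCountWithBarrier()
{
    int h = 0;
    int *d = nullptr;
    cudaMalloc((void **)&d, sizeof(int));
    cudaMemcpy(d, &h, sizeof(int), cudaMemcpyHostToDevice);
    CountWithBarrier<<<1, THREADS>>>(d);
    cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d);
    return h;
}

int main()
{
    printf("%d\n", runCountWithBarrier());
    return 0;
}
```

The barrier only orders the phases ("everyone writes, then someone reads"). It does not serialize threads, which is why having all 32 threads increment the same variable between two barriers still races.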

No… I thought __syncthreads() acted like a critical section. But it’s not…

Is there anything like a critical section that can be used in CUDA?

Basically no. The closest thing is the set of atomic memory operations, which can operate on global memory (and on shared memory if you have a card with compute capability 1.2 or higher). So you could implement your kernel using atomicAdd(), for example.
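A rough sketch of the atomicAdd() approach, assuming the goal is for each of the 32 threads to add 1 to *x and that the counter starts at 0 (the original post does not show how x is set up). The names CountWithAtomic and runCountWithAtomic are made up for this example; atomicAdd(int *, int) on global memory is the real CUDA intrinsic.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread adds 1 to the same global counter.
// atomicAdd makes each read-modify-write indivisible, so no update is lost.
__global__ void CountWithAtomic(int *x)
{
    atomicAdd(x, 1);
}

// Hypothetical host helper: starts the counter at 0 (an assumption),
// launches a single block of `threads` threads, and returns the result.
int runCountWithAtomic(int threads)
{
    int h = 0;
    int *d = nullptr;
    cudaMalloc((void **)&d, sizeof(int));
    cudaMemcpy(d, &h, sizeof(int), cudaMemcpyHostToDevice);

    dim3 block(threads, 1);
    dim3 grid(1, 1);
    CountWithAtomic<<<grid, block>>>(d);

    cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d);
    return h;
}

int main()
{
    printf("%d\n", runCountWithAtomic(32));
    return 0;
}
```

With a 32-thread block this should leave 32 in the counter. Atomics on one address are serialized by the hardware, so this is fine for a small counter but slows down if many threads hammer the same location.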