Increment Global __device__ Issue

I am trying to accumulate results from every thread in a device variable. When I run the program, the variable only holds the result from the last thread to touch it.

For example:

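__device__ float result=0;

__global__ void gpuCode()
{
 // Body was lost from the post; assumed to be an unsynchronized accumulation like this.
 result += 1.0f;
}

void host()
{
 float result_h;
 gpuCode<<<1, 64>>>();  // launch configuration is an assumption
 cudaMemcpyFromSymbol(&result_h, result, sizeof(float));  // assumed copy-back step
 printf("result: %f",result_h);
}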


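You can’t do that. Every thread reads, modifies, and writes the same variable without synchronization, so the updates overwrite each other. To accumulate across threads you need a reduction, or atomic instructions if the values are integers.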

It works the same way as on a multithreaded CPU, where you would need a critical section or InterlockedIncrement to avoid race conditions and serialize the writes. Since CUDA does not have critical sections, use atomicAdd on compute capability 1.1 or later hardware, or write a reduction kernel.
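As a rough illustration of the atomicAdd route mentioned above, here is a minimal sketch that counts threads with an integer accumulator. The names counter, countThreads, and runCount are made up for this example, and the accumulator is an int rather than the float used in the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical integer accumulator (the original post used a float).
__device__ int counter = 0;

__global__ void countThreads()
{
    // atomicAdd does the read-modify-write as one indivisible operation,
    // so updates from concurrently running threads are not lost.
    atomicAdd(&counter, 1);
}

// Hypothetical host helper: launches the kernel and returns the final count.
int runCount(int blocks, int threadsPerBlock)
{
    int zero = 0;
    cudaMemcpyToSymbol(counter, &zero, sizeof(int));  // reset before each run

    countThreads<<<blocks, threadsPerBlock>>>();

    int counter_h = 0;
    // This copy is blocking, so it also waits for the kernel to finish.
    cudaMemcpyFromSymbol(&counter_h, counter, sizeof(int));
    return counter_h;
}
```

Calling runCount(4, 64) from the host should return one increment per launched thread. Every thread contends for the same address, so this is simple but can be slow for large launches.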

And atomic operations won’t work with floats on this hardware; a float version of atomicAdd is only available on compute capability 2.0 and later devices.
Look at the SDK example project “Reduction”. It’s exactly what you want.
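For a sense of what a reduction looks like with floats, here is a minimal sketch. It is not the SDK sample's code, just a basic shared-memory tree reduction, and the names blockSum, sumOnDevice, and THREADS are invented for this example. Each block produces one partial sum and the host adds the partials together.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define THREADS 256  // threads per block; must be a power of two for the loop below

// Each block sums its slice of `in` in shared memory and writes one partial sum.
__global__ void blockSum(const float *in, float *partial, int n)
{
    __shared__ float cache[THREADS];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    cache[tid] = (i < n) ? in[i] : 0.0f;  // pad the last block with zeros
    __syncthreads();

    // Halve the number of active threads each step; each adds its partner's value.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        partial[blockIdx.x] = cache[0];
}

// Hypothetical host helper: sums `data` on the GPU, then adds the per-block partials on the CPU.
float sumOnDevice(const std::vector<float> &data)
{
    int n = (int)data.size();
    if (n == 0) return 0.0f;
    int blocks = (n + THREADS - 1) / THREADS;

    float *in_d = 0, *partial_d = 0;
    cudaMalloc(&in_d, n * sizeof(float));
    cudaMalloc(&partial_d, blocks * sizeof(float));
    cudaMemcpy(in_d, data.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, THREADS>>>(in_d, partial_d, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), partial_d, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(in_d);
    cudaFree(partial_d);

    float total = 0.0f;
    for (int b = 0; b < blocks; ++b)
        total += partial[b];
    return total;
}
```

No two threads ever write the same location at the same time here, which is why it works for floats without atomics. Error checking on the CUDA calls is omitted to keep the sketch short.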