Is it possible to increment a variable from different threads at the same time?

Hello everyone !

My question is quite simple: I have an integer and I want every thread to increment it at the same time! I want to do something like this:

// Kernel that executes on the CUDA device
__global__ void compute(int *result, int H, int W)
{
  int idx = threadIdx.x + blockIdx.x * blockDim.x;
  int idy = threadIdx.y + blockIdx.y * blockDim.y;

  // Unsynchronized read-modify-write: this is the race being asked about
  if ( idx < W && idy < H ) result[0] += 1;
}

I want result[0] to become W*H (because we increment it W*H times)!

Of course I’ve run some tests, and I know that the main idea of parallel programming is that all threads execute in parallel, so every thread probably does something like: read result, add 1, write the new value back.

So when they read result they all read 0, and they all write back 1 to result!

So it seems normal to me to obtain 1, but I was wondering whether there is another option I have not thought of that could solve my problem!

So far, I do the calculation (a long one, not just +1) on the device: each thread writes its result into one element of an array, and then on the CPU I go through the array and add every element to another variable to obtain the final sum.

If I’m not clear enough, do not hesitate to ask for more details !

A reduction (the best option) or an atomic function (atomicAdd or atomicInc) will do.
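A minimal sketch of the atomic approach, assuming a device that supports atomicAdd on int in global memory. The kernel name countAtomic, the host helper countInBounds, and the 16x16 block size are illustrative choices, not something from the original post.

```cuda
#include <cuda_runtime.h>

// Each in-bounds thread adds 1 to a single counter in global memory.
// atomicAdd performs the read-modify-write as one indivisible operation,
// so no increment is lost.
__global__ void countAtomic(int *result, int H, int W)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    int idy = threadIdx.y + blockIdx.y * blockDim.y;
    if (idx < W && idy < H)
        atomicAdd(result, 1);
}

// Hypothetical host helper: launches the kernel and returns the counter,
// or -1 if a CUDA allocation or copy fails.
int countInBounds(int W, int H)
{
    int *d_result = nullptr;
    int h_result = -1;
    if (cudaMalloc(&d_result, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_result, 0, sizeof(int));   // the counter must start at 0

    dim3 block(16, 16);                     // assumed block size
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    countAtomic<<<grid, block>>>(d_result, H, W);

    // cudaMemcpy waits for the kernel to finish before copying.
    if (cudaMemcpy(&h_result, d_result, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        h_result = -1;
    cudaFree(d_result);
    return h_result;
}
```

With this sketch, countInBounds(640, 480) should return 640*480 on a working device, since each in-bounds thread contributes exactly one increment.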

Thank you for your answer !

I’ve tried a few things with the atomic functions, but first of all I’m getting a strange result when I do this:

// Kernel that executes on the CUDA device
__global__ void compute(int *result, int H, int W)
{
  int test = 5;
  atomicAdd(&test,10);

  int idx = threadIdx.x + blockIdx.x * blockDim.x;
  int idy = threadIdx.y + blockIdx.y * blockDim.y;

  if ( idx < W && idy < H ) result[0] = test;
}

Isn’t that supposed to set result[0] to 15? It is still 0…

As for how atomic functions work in general, I thought the other threads end up “waiting” for permission to do their calculation, so the benefit of parallel programming is lost…

I will try to find some posts on the forum about reduction to see how it can fit my configuration!

The SDK has samples on both reduction and atomic functions. (You should apply the atomic function to an element in global memory, not to a local variable: atomicAdd(&global_memory_element, 10);)
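For the reduction route, here is a rough sketch rather than the SDK sample itself. It assumes each thread has already written its value into an int array in global memory, as described earlier in the thread. Each block sums its values in shared memory and then issues a single atomicAdd. Atomic operations on the same address are serialized, so doing one per block instead of one per thread keeps most of the work parallel. The names sumReduce, sumOnDevice and BLOCK_SIZE are hypothetical.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed; must be a power of two for the halving loop

// Hypothetical kernel: sums n values. Each block reduces its slice in
// shared memory, then thread 0 adds the block's partial sum to *result.
__global__ void sumReduce(const int *values, int n, int *result)
{
    __shared__ int partial[BLOCK_SIZE];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    partial[tid] = (i < n) ? values[i] : 0;  // out-of-range threads add 0
    __syncthreads();

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        atomicAdd(result, partial[0]);       // one atomic per block
}

// Hypothetical host helper: returns the sum of h_values[0..n-1] for n > 0.
// Returns -1 if an allocation or the final copy fails (ambiguous if the
// true sum is -1; acceptable for a sketch).
int sumOnDevice(const int *h_values, int n)
{
    int *d_values = nullptr, *d_result = nullptr;
    int h_result = -1;
    if (cudaMalloc(&d_values, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_result, sizeof(int)) != cudaSuccess) {
        cudaFree(d_values);
        return -1;
    }
    cudaMemcpy(d_values, h_values, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_result, 0, sizeof(int));

    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    sumReduce<<<blocks, BLOCK_SIZE>>>(d_values, n, d_result);

    // cudaMemcpy waits for the kernel to finish before copying.
    if (cudaMemcpy(&h_result, d_result, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        h_result = -1;
    cudaFree(d_values);
    cudaFree(d_result);
    return h_result;
}
```

In a real program the per-thread values would already live on the device, so the host-to-device copy in sumOnDevice would not be needed; it is there only to keep the sketch self-contained.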