Is it possible for different threads to increment a variable at the same time?

Thibaud · November 10, 2009, 1:20pm

Hello everyone!

My question is quite simple: I have an integer, and I want every thread to increment it at the same time! I want to do something like this:

// Kernel that executes on the CUDA device
__global__ void compute(int *result, int H, int W)
{
  int idx = threadIdx.x + blockIdx.x * blockDim.x;
  int idy = threadIdx.y + blockIdx.y * blockDim.y;

  if (idx < W && idy < H) result[0] += 1;
}

I want result[0] to end up as W*H (because it is incremented W*H times)!

Of course I've done some tests, and I know that the main idea of parallel programming is that the threads execute in parallel, so every thread probably does something like: read result[0], add 1 to it, and write the new value back.

So when they read result, they all read 0, and they all write back 1!

So it seems normal to me to get 1, but I was wondering whether there is another option I haven't thought of that could solve my problem!
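The lost-update effect described above can be reproduced with a small sketch. It assumes a CUDA-capable GPU with global-memory atomics; the helper `run_counter`, the kernel names and the 16x16 block shape are illustrative choices, not from the original post. The racy kernel keeps `result[0] += 1`, while the atomic one replaces it with `atomicAdd`, which performs the read-add-write as one indivisible operation.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Racy: "+= 1" is a separate load, add and store, so concurrent
// threads can overwrite each other's updates.
__global__ void count_racy(int *result, int H, int W)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    int idy = threadIdx.y + blockIdx.y * blockDim.y;
    if (idx < W && idy < H) result[0] += 1;
}

// Atomic: the load-add-store on result[0] is performed indivisibly.
__global__ void count_atomic(int *result, int H, int W)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    int idy = threadIdx.y + blockIdx.y * blockDim.y;
    if (idx < W && idy < H) atomicAdd(&result[0], 1);
}

// Hypothetical host helper: runs one of the kernels over an H x W domain
// and returns the final value of the counter.
int run_counter(bool use_atomic, int H, int W)
{
    assert(H > 0 && W > 0);
    int *d_result = nullptr;
    CUDA_CHECK(cudaMalloc(&d_result, sizeof(int)));
    CUDA_CHECK(cudaMemset(d_result, 0, sizeof(int)));

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    if (use_atomic)
        count_atomic<<<grid, block>>>(d_result, H, W);
    else
        count_racy<<<grid, block>>>(d_result, H, W);
    CUDA_CHECK(cudaGetLastError());

    int h_result = 0;  // cudaMemcpy waits for the kernel to finish
    CUDA_CHECK(cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_result));
    return h_result;
}
```

The racy count is nondeterministic and can land far below H*W on real hardware; all that can be said for certain is that it lies between 1 and H*W. The atomic count should always be exactly H*W.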

Until now, I've been doing the calculation (which is a long one, not just +1) on the device: each thread writes its result into one element of an array, and then on the CPU I go through the array and add every element to another variable to obtain the final sum.
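The current workaround (one output element per thread, summed on the CPU) might look like the sketch below. `long_calculation` is a hypothetical stand-in for the real per-thread computation (here just `idx + idy`), and `sum_on_host` is an illustrative helper name.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Hypothetical stand-in for the real (long) per-thread calculation.
__device__ int long_calculation(int idx, int idy) { return idx + idy; }

// Each thread writes into its own element, so no two threads share a location.
__global__ void compute_partials(int *partial, int H, int W)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    int idy = threadIdx.y + blockIdx.y * blockDim.y;
    if (idx < W && idy < H) partial[idy * W + idx] = long_calculation(idx, idy);
}

// Copies all H*W partial results back and adds them up on the CPU.
long long sum_on_host(int H, int W)
{
    assert(H > 0 && W > 0);
    size_t n = static_cast<size_t>(H) * W;
    int *d_partial = nullptr;
    CUDA_CHECK(cudaMalloc(&d_partial, n * sizeof(int)));

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    compute_partials<<<grid, block>>>(d_partial, H, W);
    CUDA_CHECK(cudaGetLastError());

    std::vector<int> h_partial(n);
    CUDA_CHECK(cudaMemcpy(h_partial.data(), d_partial, n * sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_partial));

    long long total = 0;  // serial sum on the host
    for (int v : h_partial) total += v;
    return total;
}
```

This is correct because no two threads write the same element; the cost is copying all H*W values to the host and summing them serially, which a reduction or atomics keep on the GPU.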

If I'm not clear enough, don't hesitate to ask for more details!

E.D_Riedijk · November 10, 2009, 1:46pm

A reduction (the best option) or an atomic increment (atomicAdd or atomicInc) will do.
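A minimal sketch of the reduction approach, assuming a power-of-two block size of 256: each block sums its slice of the input in shared memory with a tree reduction, then thread 0 adds the block's partial sum to the global total with a single atomicAdd. That final atomic step is one common variant; a second kernel pass over the per-block sums also works. `reduce_on_gpu` and `block_reduce_sum` are illustrative names.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

constexpr int BLOCK = 256;  // assumed block size; must be a power of two

// Each block sums up to BLOCK inputs in shared memory, then thread 0
// adds the block's partial sum to *total with one atomicAdd.
__global__ void block_reduce_sum(const int *in, int n, int *total)
{
    __shared__ int sdata[BLOCK];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = (i < static_cast<unsigned int>(n)) ? in[i] : 0;  // pad with 0
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(total, sdata[0]);
}

int reduce_on_gpu(const std::vector<int> &h_in)
{
    assert(!h_in.empty());
    int n = static_cast<int>(h_in.size());
    int *d_in = nullptr, *d_total = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&d_total, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemset(d_total, 0, sizeof(int)));

    int blocks = (n + BLOCK - 1) / BLOCK;
    block_reduce_sum<<<blocks, BLOCK>>>(d_in, n, d_total);
    CUDA_CHECK(cudaGetLastError());

    int h_total = 0;
    CUDA_CHECK(cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_total));
    return h_total;
}
```

Compared with one atomicAdd per thread, this issues only one global atomic per block of 256 threads; the rest of the summation happens in fast shared memory.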

Thibaud · November 10, 2009, 3:20pm

Thank you for your answer!

I've tried the atomic functions, but first of all I'm getting a strange result when I do this:

// Kernel that executes on the CUDA device
__global__ void compute(int *result, int H, int W)
{
  int test = 5;
  atomicAdd(&test, 10);

  int idx = threadIdx.x + blockIdx.x * blockDim.x;
  int idy = threadIdx.y + blockIdx.y * blockDim.y;

  if (idx < W && idy < H) result[0] = test;
}

Isn't that supposed to set result[0] to 15? It is still 0…
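A likely cause is that `test` is a local variable: every thread has its own private copy, and CUDA's atomic functions are meant to operate on a word in global or shared memory. A result stuck at 0 can also mean the kernel never ran, so checking `cudaGetLastError()` after the launch is worthwhile. The sketch below (with hypothetical names `add_ten` and `run_add_ten`) instead initializes a counter in global memory to 5 and has every thread atomically add 10 to it.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Every in-range thread atomically adds 10 to the same word in global memory.
__global__ void add_ten(int *result, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&result[0], 10);
}

// Starts the global counter at 5, runs n threads, returns 5 + 10 * n.
int run_add_ten(int n)
{
    assert(n > 0);
    int h_result = 5;
    int *d_result = nullptr;
    CUDA_CHECK(cudaMalloc(&d_result, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d_result, &h_result, sizeof(int), cudaMemcpyHostToDevice));

    const int threads = 256;
    add_ten<<<(n + threads - 1) / threads, threads>>>(d_result, n);
    CUDA_CHECK(cudaGetLastError());  // a failed launch would leave the value unchanged

    CUDA_CHECK(cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_result));
    return h_result;
}
```

With a single thread the counter goes from 5 to 15, which is the behaviour the kernel above was expected to show.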

As for how atomic functions work in general, I thought the other threads had to “wait” for permission to do their update, which would defeat the purpose of parallel programming…
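That concern is partly right: atomic updates that hit the same address are performed one after another, so a single global counter touched by every thread can become a bottleneck, although only the conflicting updates are serialized, not the whole kernel. One common way to reduce the contention, sketched here under the assumption of a GPU that supports shared-memory atomics, is to count per block in shared memory and issue a single global atomicAdd per block. `count_per_block` and `run_count_per_block` are hypothetical names.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

__global__ void count_per_block(int *result, int H, int W)
{
    __shared__ int block_count;
    bool leader = (threadIdx.x == 0 && threadIdx.y == 0);
    if (leader) block_count = 0;
    __syncthreads();

    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    int idy = threadIdx.y + blockIdx.y * blockDim.y;
    if (idx < W && idy < H) atomicAdd(&block_count, 1);  // contention stays inside this block
    __syncthreads();

    if (leader) atomicAdd(result, block_count);  // one global atomic per block
}

int run_count_per_block(int H, int W)
{
    assert(H > 0 && W > 0);
    int *d_result = nullptr;
    CUDA_CHECK(cudaMalloc(&d_result, sizeof(int)));
    CUDA_CHECK(cudaMemset(d_result, 0, sizeof(int)));

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    count_per_block<<<grid, block>>>(d_result, H, W);
    CUDA_CHECK(cudaGetLastError());

    int h_result = 0;
    CUDA_CHECK(cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_result));
    return h_result;
}
```

Every thread reaches both `__syncthreads()` calls because the bounds check only guards the atomic, not the barrier.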

I will look on the forum for material about reduction to see how it could fit my configuration!

E.D_Riedijk · November 10, 2009, 9:09pm

The SDK has samples for both reduction and atomic functions. (You should do atomicAdd(&global_memory_element, 10); the atomic has to target global memory, not a local variable.)
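Note that `atomicInc` and `atomicAdd` behave differently: `atomicInc(address, val)` stores `(old >= val) ? 0 : old + 1`, so it counts up by one and wraps back to 0, and its second argument is a wrap limit rather than an amount to add. The sketch below (hypothetical names `inc_demo`, `IncResults` and `run_inc_demo`) applies both to global memory so the difference is visible.

```cuda
#include <cassert>
#include <climits>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// counters[0]: atomicInc with a small wrap limit
// counters[1]: atomicInc with limit UINT_MAX (acts as plain +1 for small n)
// counters[2]: atomicAdd of 10 per thread
__global__ void inc_demo(unsigned int *counters, int n, unsigned int limit)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicInc(&counters[0], limit);     // old >= limit ? 0 : old + 1
        atomicInc(&counters[1], UINT_MAX);  // wraps only after UINT_MAX
        atomicAdd(&counters[2], 10u);       // old + 10
    }
}

struct IncResults { unsigned int wrapped, counted, added; };

IncResults run_inc_demo(int n, unsigned int limit)
{
    assert(n > 0);
    unsigned int *d_counters = nullptr;
    CUDA_CHECK(cudaMalloc(&d_counters, 3 * sizeof(unsigned int)));
    CUDA_CHECK(cudaMemset(d_counters, 0, 3 * sizeof(unsigned int)));

    const int threads = 256;
    inc_demo<<<(n + threads - 1) / threads, threads>>>(d_counters, n, limit);
    CUDA_CHECK(cudaGetLastError());

    unsigned int h[3];
    CUDA_CHECK(cudaMemcpy(h, d_counters, sizeof(h), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_counters));
    return {h[0], h[1], h[2]};
}
```

Because every thread applies the same wrap rule starting from 0, the wrapped counter ends at n mod (limit + 1) regardless of thread order: with n = 10 and limit = 3 it cycles 0, 1, 2, 3, 0, … and finishes at 2, while the atomicAdd counter reaches 100.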
