I’m new to CUDA, and just wanted to write a simple program that allocates memory for an integer on the device, then runs a kernel with a grid of 256 blocks and 256 threads per block.
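For reference, here’s a minimal sketch of the kind of program I mean (the kernel and variable names are just illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread reads *value, increments its private copy, and writes it
// back. These three steps are not atomic, so threads overwrite each
// other's updates and the final result is far less than 256 * 256.
__global__ void increment(int *value)
{
    *value = *value + 1;  // racy read-modify-write
}

int main()
{
    int *d_value;
    int h_value = 0;

    cudaMalloc(&d_value, sizeof(int));
    cudaMemcpy(d_value, &h_value, sizeof(int), cudaMemcpyHostToDevice);

    increment<<<256, 256>>>(d_value);  // 65,536 threads racing on one int
    cudaDeviceSynchronize();

    cudaMemcpy(&h_value, d_value, sizeof(int), cudaMemcpyDeviceToHost);
    printf("value = %d\n", h_value);   // small, run-dependent number

    cudaFree(d_value);
    return 0;
}
```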
There’s no guarantee that all threads will read/write to value sequentially. In fact, it’s almost guaranteed that they won’t, as you’ve experienced in your example. Furthermore, it’s only possible to synchronize threads within the same block. For inter-block synchronization, you need to relaunch kernels.
Cool, thanks. That makes perfect sense. Would the correct way to do it be to use a mutex to lock the variable, then increment, then unlock? Or does CUDA have a different way of doing this?
You have to imagine that all threads are running simultaneously. If they all read value at the same time, increment their own copy, and then write the temporary back, *value should read 1 :)
It is 22 in your case because not all threads actually run simultaneously, so some will run after others and increment the value there. If you run your app multiple times, you should randomly get different values because threads aren’t scheduled the same way every time.
There are no mutexes or such in CUDA. With ~20,000 threads running concurrently on the device, anything that serializes the execution would absolutely destroy the performance anyway. CUDA is best for data-parallel algorithms where each thread operates independently on its own piece of data (i.e. one array element).
For tasks like summing an array, you can perform a reduction (see the SDK example and whitepaper).
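To sketch the idea (heavily simplified from the SDK example; a real version loops over grid-sized chunks and then reduces the per-block partial sums):

```cuda
#define BLOCK_SIZE 256

// Each block sums BLOCK_SIZE elements of `in` into one partial sum.
__global__ void block_sum(const int *in, int *out, int n)
{
    __shared__ int sdata[BLOCK_SIZE];

    unsigned tid = threadIdx.x;
    unsigned i   = blockIdx.x * blockDim.x + threadIdx.x;

    // Load one element per thread into shared memory (0 past the end).
    sdata[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = sdata[0];  // one partial sum per block
}
```

Note that __syncthreads() only synchronizes within a block, which is exactly why the partial sums have to be combined afterwards (either on the host or with another kernel launch).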
If you absolutely must have multiple threads access the same variable (knowing the performance implications if too many threads access the same variable at once…) you can use atomic operations. See the programming guide for all the various atomic operations. Using them requires compiling for the sm_11 or later architecture and a compute capability 1.1 or newer card (anything but the 8800 GTX/Tesla 800 series).
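For example, the racy increment from the original post becomes correct (though serialized at that one address) with atomicAdd:

```cuda
// atomicAdd makes the read-modify-write indivisible, so no updates are
// lost: after increment_atomic<<<256, 256>>>(d_value) with *d_value
// initially 0, it ends up exactly 256 * 256 = 65536.
// Requires compiling with -arch=sm_11 or later.
__global__ void increment_atomic(int *value)
{
    atomicAdd(value, 1);
}
```

Just keep in mind that all 65,536 updates hit the same address, so they execute one after another — fine for an occasional counter, terrible as the core of an algorithm.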