Is each thread working concurrently?

motto_mt · March 2, 2010, 3:45pm

How can I know whether the threads in a thread block are running concurrently?

I defined a grid of 4 x 4 thread blocks, and each thread block has 3 x 10 threads. I expected the output to appear concurrently (say, 100 lines at a time), but my output appears one line at a time.

How can I ensure that the threads run concurrently, and how can I verify it?

jjp · March 2, 2010, 3:59pm

You can’t. The only assumption you can make is that threads belonging to the same warp will be executing concurrently, e.g. threads 0-31, 32-63,…, 480-511 will execute concurrently if the size of a warp is 32.

YDD · March 2, 2010, 4:01pm

Can you be a bit more specific? Your question has a number of possible answers, depending on the level of detail/understanding you require.

In the absolute sense, the threads are obviously not executing concurrently. A threadblock is basically a virtual multiprocessor, but the real multiprocessors only have 8 streaming processors. Various pipelining considerations mean that groups of 32 threads (called a ‘warp’) will always appear to the programmer to run concurrently - if you have a race condition within a warp, it’s impossible to predict which thread will ‘win’ the race. However, the hardware scheduler makes no guarantees about the order in which warps within the same block are executed, unless a __syncthreads() (or similar) command is present. So unless your code uses those calls, your program should treat all threads within a block as running concurrently. Even those commands only guarantee consistency at a single point in the code - which warp leaves the __syncthreads() first is not defined.
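
To illustrate the point about __syncthreads(), here is a minimal sketch, assuming a single block of 64 threads (two warps of 32). The kernel name reverseInBlock, the helper mirrorIndex and the BLOCK_SIZE constant are made up for this example. Each thread writes one element to shared memory and then reads an element written by a thread in the other warp; the barrier is what makes that read safe, because nothing else orders the two warps.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 64  // assumption: one block made of two 32-thread warps

// Hypothetical helper: position that element i moves to when n elements are reversed.
__host__ __device__ inline int mirrorIndex(int i, int n) { return n - 1 - i; }

__global__ void reverseInBlock(const int *in, int *out)
{
    __shared__ int tile[BLOCK_SIZE];
    int t = threadIdx.x;

    tile[t] = in[t];

    // Without this barrier, a thread in one warp could read tile[] before the
    // thread in the other warp has written it: warp order is not guaranteed.
    __syncthreads();

    // Threads 0..31 read slots written by threads 32..63, and vice versa.
    out[t] = tile[mirrorIndex(t, BLOCK_SIZE)];
}

int main()
{
    int hIn[BLOCK_SIZE], hOut[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; ++i) hIn[i] = i;

    int *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, sizeof(hIn));
    cudaMalloc(&dOut, sizeof(hOut));
    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);

    reverseInBlock<<<1, BLOCK_SIZE>>>(dIn, dOut);

    // The blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Check on the host that every element landed in its mirrored position.
    bool ok = true;
    for (int i = 0; i < BLOCK_SIZE; ++i)
        if (hOut[i] != hIn[mirrorIndex(i, BLOCK_SIZE)]) ok = false;

    printf("%s\n", ok ? "reversed correctly" : "mismatch");

    cudaFree(dIn);
    cudaFree(dOut);
    return ok ? 0 : 1;
}
```

Note that the barrier only orders the write phase before the read phase; it says nothing about which warp runs first on either side of it.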

motto_mt · March 2, 2010, 4:21pm

Thanks for your answer.

This is my first time programming in CUDA. My English is quite poor, sorry about that.

I’m not sure whether my code is managed in warps. I defined:

dim3 threadsPerBlock(16,4);
dim3 threadsPerGrid(4,4);

I don’t understand whether I have to organize my program into warps myself, or whether the threads are automatically divided into warps.

avidday · March 2, 2010, 4:37pm

A warp of threads is the basic scheduling and execution unit inside the GPU. It isn’t something the programmer has any control over, other than being the only scale at which execution coherence is implicitly guaranteed by the execution model. You cannot know what order blocks are executed in, and you cannot know what order warps within blocks are executed in, but you can assume that the threads within a warp of 32 threads are executed coherently, so threads (0…31) are executed together, (32…63), etc. Nothing else is guaranteed or predictable.
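
For the 16 x 4 block from the question, the automatic division into warps can be made visible with a small sketch like the one below. It assumes the usual linearization of thread indices inside a block (x varies fastest, then y, then z), with consecutive linear IDs grouped into warps. The names linearThreadId, warpIndexInBlock and recordWarps are invented for this example, and the second dim3 is called blocksPerGrid here because it counts blocks (it is threadsPerGrid in the question).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Linear thread ID inside a block: x varies fastest, then y, then z.
__host__ __device__ inline unsigned linearThreadId(unsigned tx, unsigned ty, unsigned tz,
                                                   unsigned dimX, unsigned dimY)
{
    return tx + ty * dimX + tz * dimX * dimY;
}

// Consecutive linear IDs are grouped into warps of warpSz threads.
__host__ __device__ inline unsigned warpIndexInBlock(unsigned linearId, unsigned warpSz)
{
    return linearId / warpSz;
}

// Each thread stores the index of the warp it belongs to within its block.
__global__ void recordWarps(unsigned *warpOfThread)
{
    unsigned tid = linearThreadId(threadIdx.x, threadIdx.y, threadIdx.z,
                                  blockDim.x, blockDim.y);
    unsigned blockId = blockIdx.x + blockIdx.y * gridDim.x;
    unsigned threadsInBlock = blockDim.x * blockDim.y * blockDim.z;
    warpOfThread[blockId * threadsInBlock + tid] = warpIndexInBlock(tid, warpSize);
}

int main()
{
    dim3 threadsPerBlock(16, 4);  // 64 threads per block, as in the question
    dim3 blocksPerGrid(4, 4);     // 16 blocks (named threadsPerGrid in the question)
    const unsigned total = 16 * 4 * 4 * 4;

    unsigned *dWarps = nullptr;
    cudaMalloc(&dWarps, total * sizeof(unsigned));
    recordWarps<<<blocksPerGrid, threadsPerBlock>>>(dWarps);

    std::vector<unsigned> hWarps(total);
    cudaError_t err = cudaMemcpy(hWarps.data(), dWarps, total * sizeof(unsigned),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print the warp index of every thread in block 0, one row per threadIdx.y.
    for (unsigned y = 0; y < 4; ++y) {
        for (unsigned x = 0; x < 16; ++x)
            printf("%u ", hWarps[linearThreadId(x, y, 0, 16, 4)]);
        printf("\n");
    }

    cudaFree(dWarps);
    return 0;
}
```

With a warp size of 32, rows y = 0 and y = 1 of each block form warp 0 and rows y = 2 and y = 3 form warp 1. The program does nothing to arrange this; the split follows from the block dimensions alone.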

motto_mt · March 2, 2010, 4:46pm

Thanks for your answer.
