A Question from Programming Massively Parallel Processors: A Hands-on Approach

mahs1999 · September 23, 2021, 3:52pm

For a vector addition, assume that the vector length is 2000, each thread calculates one output element, the thread block size is 512 threads and the entire computation is carried out by one grid. How many threads will be in the grid?
Answer: 2048 = 4 * 512, since ceil(2000 / 512) = 4 blocks are needed.
How many warps do you expect to have divergence due to the boundary check on the vector length?

My issue is that I don’t understand the concept of warp divergence.
I would appreciate help understanding the concept and solving the problem.

Robert_Crovella · September 23, 2021, 4:01pm

Warp divergence results from conditional code in which some threads in a warp follow one path through the code and other threads in the same warp follow a different path.

The boundary check referred to in your prompt is an example:

int idx = threadIdx.x + blockDim.x*blockIdx.x;
if (idx < N)
    z[idx] = x[idx] + y[idx];

Threads whose idx value (the globally-unique thread index) is less than N will perform the vector addition. Threads whose idx value is equal to or greater than N will not. Typically there would be one warp in the grid where some threads satisfy that boolean condition and some do not. Every other warp in the grid would consist either entirely of threads that satisfy the condition or entirely of threads that do not. The warp that has some of both is the candidate for warp divergence.

The concept of warp divergence has a number of possible interpretations and nuances that I am not covering here. This is a basic definition that is suitable for initial understanding and solution of the question asked.
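
To make the counting concrete, here is a minimal host-side sketch (plain C++, no GPU needed) that walks every warp in the grid and counts those in which only some threads pass the idx < N check. The function names are hypothetical, a warp size of 32 is assumed, and the code applies only the basic definition above; it does not measure what the hardware actually does.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: blocks needed so that every element gets a thread.
int blocks_in_grid(int n, int block_size) {
    assert(n > 0 && block_size > 0);
    return (n + block_size - 1) / block_size;  // ceiling division
}

// Hypothetical helper: total threads launched, including idle ones.
int threads_in_grid(int n, int block_size) {
    return blocks_in_grid(n, block_size) * block_size;
}

// Counts warps where some, but not all, threads satisfy idx < n.
// Assumption: warps are formed from consecutive threads within a block.
int count_divergent_warps(int n, int block_size, int warp_size = 32) {
    assert(warp_size > 0);
    int divergent = 0;
    int blocks = blocks_in_grid(n, block_size);
    for (int b = 0; b < blocks; ++b) {
        for (int first = 0; first < block_size; first += warp_size) {
            // Threads present in this warp (the last warp of a block may be partial).
            int lanes = std::min(warp_size, block_size - first);
            // Global index of the warp's first thread.
            int base = b * block_size + first;
            // Threads in this warp with idx < n, limited to the range [0, lanes].
            int active = std::max(0, std::min(lanes, n - base));
            if (active > 0 && active < lanes) {
                ++divergent;
            }
        }
    }
    return divergent;
}
```

With the numbers from the question, threads_in_grid(2000, 512) gives 2048 and count_divergent_warps(2000, 512) gives 1: only the warp covering idx 1984 through 2015 has both kinds of threads.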

mahs1999 · September 23, 2021, 4:28pm

Thank you. A question now arises: NVIDIA GPUs execute threads in warps of 32. In this case the grid has 2048 - 2000 = 48 idle threads, so there will be one warp that is completely idle (32 threads) and another with 48 - 32 = 16 idle threads. So, isn’t the answer 2?

Robert_Crovella · September 23, 2021, 4:51pm

Did you review my definition of a candidate for warp divergence? You need threads, in a single warp, some of which (one or more) are following one path (let’s call it the “if” path), and some (one or more) are following the other path (let’s call it the “else” path). If all threads in a warp are idle, it means they all followed the “else” path, and therefore that warp is not a candidate for warp divergence.
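
The distinction between a fully idle warp and a divergent one can also be checked with simple arithmetic. The sketch below is an illustration only: WarpTally and tally_warps are hypothetical names, a warp size of 32 is assumed, and the block size is required to be a multiple of the warp size so that each warp covers one contiguous run of 32 idx values.

```cpp
#include <cassert>

// Hypothetical tally of the three kinds of warp under the idx < n check.
struct WarpTally {
    int fully_active;  // every thread takes the "if" path
    int divergent;     // some threads take "if", some take "else"
    int fully_idle;    // every thread takes the "else" path
};

// Closed-form count; assumes block_size is a multiple of warp_size.
WarpTally tally_warps(int n, int block_size, int warp_size = 32) {
    assert(n > 0 && block_size > 0 && warp_size > 0);
    assert(block_size % warp_size == 0);
    int blocks = (n + block_size - 1) / block_size;      // ceiling division
    int total_warps = blocks * (block_size / warp_size);
    int fully_active = n / warp_size;                    // warps entirely below n
    int divergent = (n % warp_size != 0) ? 1 : 0;        // the warp that straddles n, if any
    int fully_idle = total_warps - fully_active - divergent;
    return {fully_active, divergent, fully_idle};
}
```

For n = 2000 and a block size of 512 this gives 62 fully active warps, 1 divergent warp (16 threads on each path), and 1 fully idle warp. The fully idle warp is counted separately because all of its threads agree on the “else” path.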

mahs1999 · September 24, 2021, 9:02pm

Thank you, now I understand clearly.
