For a vector addition, assume that the vector length is 2000, each thread calculates one output element, the thread block size is 512 threads, and the entire computation is carried out by one grid. How many threads will be in the grid?
Answer: 2048 threads. Covering 2000 elements takes ceil(2000 / 512) = 4 blocks, and 4 * 512 = 2048.
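The grid-size arithmetic above can be sketched in host-side C++ (no GPU needed). The function names blocks_needed and threads_in_grid are made up for illustration and are not part of any CUDA API.

```cpp
#include <cassert>

// Hypothetical helper: number of blocks needed so that every element
// gets its own thread. This is integer ceiling division.
constexpr int blocks_needed(int n, int block_size) {
    return (n + block_size - 1) / block_size;
}

// Hypothetical helper: total threads launched. Blocks are always whole,
// so the grid may contain more threads than there are elements.
constexpr int threads_in_grid(int n, int block_size) {
    return blocks_needed(n, block_size) * block_size;
}

// Compile-time check for the numbers in the question.
static_assert(blocks_needed(2000, 512) == 4, "2000 elements need 4 blocks of 512");
static_assert(threads_in_grid(2000, 512) == 2048, "4 blocks of 512 give 2048 threads");
```

The 48 threads beyond element 1999 are the reason the kernel needs a boundary check.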
How many warps do you expect to have divergence due to the boundary check on the vector length?
My issue is that I don’t understand the concept of warp divergence.
I would appreciate help understanding the concept and solving the problem.
Warp divergence results from a situation where you have conditional code, and some threads in the warp follow one path through the conditional code while other threads in the same warp follow a different path.
The boundary check referred to in your prompt is an example:
int idx = threadIdx.x + blockDim.x*blockIdx.x;  // globally unique thread index
if (idx < N)                                    // boundary check on the vector length
    z[idx] = x[idx] + y[idx];
Threads whose idx value (the globally-unique thread index) is less than N will perform the vector addition. Threads whose idx value is equal to or greater than N will not. Typically (whenever N is not a multiple of the warp size) there would be one warp in the grid where some threads satisfy that boolean condition and some threads do not. All other warps in the grid would consist either of all threads satisfying the condition or of all threads not satisfying the condition. The warp that has some of both is the candidate for warp divergence.
The concept of warp divergence has a number of possible interpretations and nuances that I am not covering here. This is a basic definition that is suitable for initial understanding and solution of the question asked.
Thank you. A question now comes to mind: NVIDIA GPUs run warps of 32 threads. In this case, of the 48 idle threads, one warp will be completely idle (32 threads) and another will have 48 - 32 = 16 idle threads. So, isn’t the answer 2?
Did you review my definition of a candidate for warp divergence? You need threads, in a single warp, some of which (one or more) are following one path (let’s call it the “if” path), and some (one or more) are following the other path (let’s call it the “else” path). If all threads in a warp are idle, it means they all followed the “else” path, and therefore that warp is not a candidate for warp divergence. Only the warp with 16 active and 16 idle threads qualifies, so the answer is 1.
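To make the counting concrete, here is a minimal host-side C++ sketch that sorts the warps of a 1D launch into three groups. It assumes a warp size of 32 and a block size that is a multiple of 32, so that warp w covers global indices 32*w through 32*w + 31. The names WarpCounts and classify_warps are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>

const int kWarpSize = 32;  // assumption: 32 threads per warp

// Hypothetical result type for this sketch.
struct WarpCounts {
    int all_active;  // every thread in the warp has idx < n ("if" path only)
    int divergent;   // some threads have idx < n and some have idx >= n
    int all_idle;    // every thread in the warp has idx >= n ("else" path only)
};

// Classify the warps of a 1D grid that runs the boundary check idx < n.
// Assumes n >= 0 and block_size is a positive multiple of kWarpSize.
inline WarpCounts classify_warps(int n, int block_size) {
    const int blocks = (n + block_size - 1) / block_size;     // ceiling division
    const int total_warps = blocks * block_size / kWarpSize;  // warps in the whole grid
    const int all_active = n / kWarpSize;                     // warps entirely below n
    // At most one warp can straddle the boundary, and only if n is not
    // a multiple of the warp size.
    const int divergent = (n % kWarpSize != 0) ? 1 : 0;
    const int all_idle = total_warps - all_active - divergent;
    return WarpCounts{all_active, divergent, all_idle};
}
```

For n = 2000 and block_size = 512 this gives 62 fully active warps, 1 divergent warp (16 active and 16 idle threads), and 1 fully idle warp. The fully idle warp takes only the “else” path, so it does not count toward divergence.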