An exercise from the book Programming Massively Parallel Processors:

We want to use each thread to calculate two elements of a vector addition.
Each thread block processes 2*blockDim.x consecutive elements that form
two sections. All threads in each block will first process a section, each
processing one element. They will then all move to the next section, each
processing one element. Assume that variable i should be the index for the
first element to be processed by a thread. What would be the expression for
mapping the thread/block indices to data index of the first element?

A. i=blockIdx.x*blockDim.x + threadIdx.x +2;
B. i=blockIdx.x*threadIdx.x*2;
C. i=(blockIdx.x*blockDim.x + threadIdx.x)*2;
D. i=blockIdx.x*blockDim.x*2 + threadIdx.x;

Can someone clarify why the answer is D?

|            all elements of block x            |           all elements of block x+1           |
|  blockDim.x elements  |  blockDim.x elements  |  blockDim.x elements  |  blockDim.x elements  |
|     first section     |    second section     |     first section     |    second section     |
| x x x x x x x x x x x | x x x x x x x x x x x | x x x x x x x x x x x | x x x x x x x x x x x |

Think of the left side of the above diagram as the set of elements that will be processed by a single block. There are many sets of these elements, adjacent to each other in memory, comprising our entire data set. We will use multiple thread blocks to handle the entire data set.

We can see that the total number of elements processed per block is twice the block dimension. Therefore, if I have a block index that is numbered consecutively across blocks (0, 1, 2, etc.) I will need to multiply that block index by twice the number of threads per block (= the number of elements per block) in order to get to the starting point of the elements to be processed for that block.

Once I have multiplied the block index by twice the number of threads per block, I can then have each thread choose a separate element by adding the thread index to the previously computed starting point. Each thread thereby selects one element from the “first section” area of the diagram above, which is exactly what the question asks for: the index of the first element processed by each thread. The expression that matches this description is D.
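
To make the mapping concrete, here is a minimal sketch of how a kernel could use expression D. It is not taken from the book. The names vecAddTwoPerThread and vecAdd, the block size of 256, and the bounds checks are my own assumptions for illustration, and error checking of the CUDA calls is left out to keep it short.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Each thread adds two elements: one in its block's first section and one
// in the second section, which starts blockDim.x elements later.
__global__ void vecAddTwoPerThread(const float *a, const float *b, float *c, int n)
{
    // Option D: skip the 2*blockDim.x elements owned by each earlier block,
    // then offset by this thread's position within the first section.
    int i = blockIdx.x * blockDim.x * 2 + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];        // first section

    int j = i + blockDim.x;        // same thread, second section
    if (j < n)
        c[j] = a[j] + b[j];
}

// Hypothetical host wrapper: copies the inputs to the device, launches the
// kernel, and copies the result back. CUDA error checking is omitted.
void vecAdd(const float *hA, const float *hB, float *hC, int n)
{
    assert(n > 0);                                     // a grid needs at least one block
    const int threads  = 256;                          // assumed block size
    const int perBlock = 2 * threads;                  // elements handled per block
    const int blocks   = (n + perBlock - 1) / perBlock; // round up

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    vecAddTwoPerThread<<<blocks, threads>>>(dA, dB, dC, n);

    // This copy runs after the kernel on the default stream, so no
    // explicit synchronization is needed here.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}
```

With 256 threads per block, block 1 starts at element 1*256*2 = 512. Its thread 0 handles elements 512 and 768, and its thread 255 handles elements 767 and 1023, so block 2 picks up at element 1024 with no gap or overlap.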
