When we pad shared memory to resolve bank conflicts, do we have to use the padding, or do we just add the extra space and leave it unused?
For instance, the following kernel has an extra column. Do we take into account that column when we are operating on matrix elements? Or do we just consider it wasted space?
You add an extra column, and typically that extra column is "unused", so yes, you consider it wasted space. Here is an example. If you study the transposeCoalesced kernel listing there and substitute the shared memory declaration given in the final example, the code never indexes into the last column. That should be trivial to verify, because neither threadIdx.x nor threadIdx.y for a 32x32 threadblock would ever exceed 31.
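To make the point concrete, here is a minimal sketch of a padded transpose. It is not the original listing: the names `transposePadded` and `runPaddedTransposeCheck` are made up for illustration, and I am assuming a 32x32 tile handled by a 32x8 thread block on a square matrix whose size is a multiple of 32. The tile is declared with 33 columns, but every column index used is either `threadIdx.x` or `threadIdx.y + j`, both of which stay in 0..31, so column 32 is never touched.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE_DIM = 32;   // assumed tile size
constexpr int BLOCK_ROWS = 8;  // assumed block height; each thread handles 4 rows

// Hypothetical kernel name. Assumes a square matrix with size a multiple of TILE_DIM.
__global__ void transposePadded(float *odata, const float *idata)
{
    // The +1 column only shifts each row by one bank. It is never read or written.
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;
    int width = gridDim.x * TILE_DIM;

    // Column index is threadIdx.x, which is at most 31.
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        tile[threadIdx.y + j][threadIdx.x] = idata[(y + j) * width + x];

    __syncthreads();

    x = blockIdx.y * TILE_DIM + threadIdx.x;  // swap block offsets for the output
    y = blockIdx.x * TILE_DIM + threadIdx.y;

    // Column index is threadIdx.y + j, which is at most 7 + 24 = 31.
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        odata[(y + j) * width + x] = tile[threadIdx.x][threadIdx.y + j];
}

// Hypothetical host helper: transposes a 64x64 matrix and checks the result.
bool runPaddedTransposeCheck()
{
    const int n = 2 * TILE_DIM;
    std::vector<float> in(n * n), out(n * n, -1.0f);
    for (int i = 0; i < n * n; ++i) in[i] = static_cast<float>(i);

    const size_t bytes = in.size() * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 grid(n / TILE_DIM, n / TILE_DIM), block(TILE_DIM, BLOCK_ROWS);
    transposePadded<<<grid, block>>>(dOut, dIn);
    cudaError_t launchErr = cudaGetLastError();
    cudaError_t copyErr = cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (launchErr != cudaSuccess || copyErr != cudaSuccess) return false;

    // Element (r, c) of the input must land at (c, r) of the output.
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            if (out[c * n + r] != in[r * n + c]) return false;
    return true;
}

int main()
{
    std::printf("padded transpose %s\n", runPaddedTransposeCheck() ? "OK" : "FAILED");
    return 0;
}
```

The padding costs 32 extra floats (128 bytes) per tile and changes nothing in the indexing logic; the only difference from the unpadded version is the row stride in shared memory.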