I have to transpose a 3D volume in a specified direction.
You can think of it as a cube that has to be rotated to the left, or to the front.
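To make "transpose in a specified direction" concrete, here is a minimal CPU reference in C++. It is a host-side check of my own, not code from the SDK sample. It assumes the volume is stored x-fastest and that "rotating to the left" means swapping the x and z axes. `transposeXZ` is a hypothetical name; other directions are just other axis permutations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference transpose: swaps the x and z axes of a volume
// stored x-fastest, i.e. index = (z * ny + y) * nx + x.
// The output has nz elements along its fastest axis and nx along its slowest.
std::vector<float> transposeXZ(const std::vector<float>& in,
                               std::size_t nx, std::size_t ny, std::size_t nz) {
    std::vector<float> out(in.size());
    for (std::size_t z = 0; z < nz; ++z)
        for (std::size_t y = 0; y < ny; ++y)
            for (std::size_t x = 0; x < nx; ++x) {
                // Input element at (x, y, z) goes to output position (z, y, x).
                std::size_t src = (z * ny + y) * nx + x;
                std::size_t dst = (x * ny + y) * nz + z;
                out[dst] = in[src];
            }
    return out;
}
```

Applying it twice with the dimensions swapped (`transposeXZ(out, nz, ny, nx)`) gives back the original volume. That makes it a handy element-by-element check for a GPU kernel's output.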
I’m using the same implementation as the 2D transpose in the SDK, with a block of 8x8x8 threads (= 512, which is the maximum).
Since each block dimension is only 8, I was wondering whether the reads and writes are coalesced, and if not, whether there is a way to coalesce them.
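One way to reason about this is to list the addresses that one half-warp touches. A half-warp is 16 consecutive threads, the unit that compute-capability 1.x hardware coalesces. The sketch below is plain host C++ and only models the access pattern, not the hardware rules. It assumes the usual CUDA thread ordering (threadIdx.x fastest, then y, then z) and a volume stored x-fastest. `halfWarpOffsets` and `isContiguous` are hypothetical helpers.

```cpp
#include <cstddef>
#include <vector>

// Element offsets, relative to the block's first element, accessed by
// half-warp number `halfWarp` of a block that is bx threads wide and
// by threads tall. The volume has rows of `width` elements and `height`
// rows per slice. Thread order follows CUDA: x fastest, then y, then z.
std::vector<std::size_t> halfWarpOffsets(unsigned bx, unsigned by,
                                         std::size_t width, std::size_t height,
                                         unsigned halfWarp) {
    std::vector<std::size_t> offsets;
    for (unsigned lane = 0; lane < 16; ++lane) {
        unsigned t = halfWarp * 16 + lane;  // linear thread index in the block
        unsigned tx = t % bx;
        unsigned ty = (t / bx) % by;
        unsigned tz = t / (bx * by);
        offsets.push_back((tz * height + ty) * width + tx);
    }
    return offsets;
}

// True when thread k of the half-warp touches element offsets[0] + k,
// i.e. the 16 accesses form one contiguous run in thread order.
bool isContiguous(const std::vector<std::size_t>& offsets) {
    for (std::size_t k = 1; k < offsets.size(); ++k)
        if (offsets[k] != offsets[0] + k) return false;
    return true;
}
```

With an 8x8x8 block and width 64, lanes 8 to 15 of the first half-warp already sit on the next row (offset 64 instead of 8), so the 16 accesses split into two runs of 8. A 16-wide block such as 16x8x4 (also 512 threads) keeps each half-warp on a single row. Whether a split pattern still coalesces depends on the compute capability, so the Programming Guide's rules for the target device decide.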
Another general question:
If the width of a 2D image is not a multiple of 16, the memory a block accesses won’t begin at (begin + n*16) but at (begin + n*16 + m*width), so HalfWarpBaseAddress - BaseAddress won’t be a multiple of 16. Is that right? If so, will the reads and writes still be coalesced?
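The arithmetic in this question can be checked on the host. The sketch makes three assumptions:

- Elements are 4 bytes, so alignment is measured in elements (16 elements = 64 bytes).
- A half-warp starts at a column that is a multiple of 16.
- The image is stored row-major with `pitch` elements per row.

`halfWarpAligned` and `paddedPitch` are hypothetical helpers. The padding helper only approximates what cudaMallocPitch does. That function returns a pitch in bytes chosen by the driver, which may differ from this rounding.

```cpp
#include <cstddef>

// True when a half-warp that starts at column `col` of row `row` begins on
// a 16-element boundary. Element offset = row * pitch + col.
bool halfWarpAligned(std::size_t row, std::size_t col, std::size_t pitch) {
    return (row * pitch + col) % 16 == 0;
}

// Hypothetical padding: rounds the row length up to a multiple of 16
// elements, so every row start lands on a 16-element boundary.
std::size_t paddedPitch(std::size_t width) {
    return (width + 15) / 16 * 16;
}
```

For a width of 100, row 1 starts at element 100, which is 4 elements past a 16-element boundary. A padded pitch of 112 puts every row start back on a boundary.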