I’m writing my Ph.D. thesis and I’d like to cover some key points about CUDA programming. I’d like to give the exact definition of the alignment mentioned in cudaMallocPitch.
But I’m not sure about this definition.
Can you help me?
The pitch is the width in bytes of each “row” of the allocated 2D memory, padding included.
I know what a pitch is :)
My question is “what is the alignment requirement?”
I think I’ve found the answer. If the array’s base address is A, the address B accessed by the first thread of a half-warp should be such that B - A is a multiple of 16*sizeof(float).
aha, I misunderstood your question.
Yes: the rules for coalescing basically boil down to thread i accessing bytes starting at base+i*16*sizeof(element), where element is a 32-, 64-, or 128-bit type (I guess it could be 8 or 16 bits on compute 1.3 hardware, too).
The other requirement to satisfy for coalescing is the alignment of the base address for the half-warp access, hence odd-sized images need their row width bumped up to a multiple of 16 elements.