Could anybody explain what the “pitch” is and how to use it in cudaMallocPitch()? I understand that we need the memory pointer, width, and height, but I only vaguely get the “pitch” part. Why don’t they simply use nrows and ncols? Any comments will be appreciated.
Pitch is the padded size of each “row” in the array.
If you have an array whose rows are 12 floats wide, CUDA can run faster if you pad each row to 16 floats:
[ X X X X X X X X X X X X _ _ _ _ ]
The data is 12 floats wide, the padding is 4 floats, and the pitch is 16 floats (or 64 bytes, since cudaMallocPitch takes the width and reports the pitch in bytes).
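To make the call itself concrete, here is a minimal sketch, assuming a small float array like the one in the example. `queryPitch` is a hypothetical helper name, not part of CUDA. The width goes in as bytes and the pitch comes back in bytes; the value the runtime picks depends on the device and may well be larger than 64.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: allocate a cols x rows float array with cudaMallocPitch
// and return the pitch (in bytes) chosen by the runtime. Returns 0 on failure.
size_t queryPitch(size_t cols, size_t rows)
{
    float* devPtr = nullptr;
    size_t pitch = 0;

    // Width is given in BYTES; height is given in rows.
    cudaError_t err = cudaMallocPitch((void**)&devPtr, &pitch,
                                      cols * sizeof(float), rows);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMallocPitch failed: %s\n",
                     cudaGetErrorString(err));
        return 0;
    }

    cudaFree(devPtr);  // this sketch only inspects the pitch
    return pitch;
}
```

Calling `queryPitch(12, 8)` on a working device should return a value of at least `12 * sizeof(float)`, i.e. 48 bytes or more.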
Thank you, kristleifur!
How do they calculate the padding? Should we pass the padding value as a kernel parameter? Does it also have something to do with the following passage in the 2.0 Programming Guide (p. 61)?
"Second, global memory bandwidth is used most efficiently when the simultaneous memory accesses by threads in a half-warp (during the execution of a single read or write instruction) can be coalesced into a single memory transaction. The size of a memory transaction can be either 32 bytes (for compute capability 1.2 and higher only), 64 bytes, or 128 bytes."
I think it has everything to do with the Programming Guide section you found! The pitch is calculated so that each row is aligned to 32, 64, or 128 bytes. Usually 64 bytes; almost always 64 bytes in my code.
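The rounding can be sketched as plain integer arithmetic. This is only an illustration: `roundUpToAlignment` is a hypothetical helper, and the alignment you pass in is an assumption. The pitch that actually matters is whatever cudaMallocPitch returns on your device.

```cuda
#include <cstddef>

// Round widthBytes up to the next multiple of alignment (alignment must be > 0).
// The alignment is an assumed parameter for illustration; the real pitch is
// decided by the CUDA runtime, not by this function.
__host__ __device__ inline size_t roundUpToAlignment(size_t widthBytes,
                                                     size_t alignment)
{
    return ((widthBytes + alignment - 1) / alignment) * alignment;
}
```

With the 12-float row from earlier in the thread, `roundUpToAlignment(12 * sizeof(float), 64)` takes 48 bytes up to 64 bytes, which matches the 16-float picture.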
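And yes, the kernel must know the pitch, so pass it in as a kernel parameter. Row r starts r * pitch bytes from the base pointer, not r * width. It’s usually easier to work with the “aligned” pitch numbers anyway, and indexing into the array is faster.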
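Here is a rough sketch of what passing the pitch to a kernel can look like. The names `scaleKernel` and `scaleOnDevice` are made up for this example, and the 16x16 block size is an arbitrary choice. The key line is the row pointer computation: step by `pitch` in bytes, then index the column as a float.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Scale every element of a pitched 2D float array in place.
// pitch is the row stride in BYTES, as returned by cudaMallocPitch.
__global__ void scaleKernel(float* data, size_t pitch,
                            int cols, int rows, float factor)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (c < cols && r < rows) {
        // Move to the start of row r in bytes, then index the column.
        float* row = (float*)((char*)data + r * pitch);
        row[c] *= factor;
    }
}

// Hypothetical host wrapper: copies a tightly packed host array to pitched
// device memory, runs the kernel, and copies the result back.
// Returns true if allocation and copies succeeded.
bool scaleOnDevice(std::vector<float>& host, int cols, int rows, float factor)
{
    float* dev = nullptr;
    size_t pitch = 0;
    size_t widthBytes = cols * sizeof(float);

    if (cudaMallocPitch((void**)&dev, &pitch, widthBytes, rows) != cudaSuccess)
        return false;

    // Host rows are packed (stride = widthBytes); device rows use pitch.
    cudaError_t err = cudaMemcpy2D(dev, pitch, host.data(), widthBytes,
                                   widthBytes, rows, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((cols + block.x - 1) / block.x,
                  (rows + block.y - 1) / block.y);
        scaleKernel<<<grid, block>>>(dev, pitch, cols, rows, factor);

        err = cudaMemcpy2D(host.data(), widthBytes, dev, pitch,
                           widthBytes, rows, cudaMemcpyDeviceToHost);
    }

    cudaFree(dev);
    return err == cudaSuccess;
}
```

Note that the padding itself is never passed anywhere. The kernel only needs the pitch, and cudaMemcpy2D takes care of skipping the padded bytes when copying between packed host rows and pitched device rows.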