If I have linear device memory arranged as a T[H][W] array (for some type T), when is it necessary or advantageous to add row padding so that it is actually a T[H][W+pad] arrangement?
One case I know of is cuMemcpy2D. It requires that the pitch, i.e. sizeof(T) * (W + pad), be a multiple of 512. This is true on my Kepler GPU and could be different on other architectures. cuMemcpy2DUnaligned lacks this restriction, but may run slower. cuMemsetD2D32, etc., will work but may also run slower. So if I want to use any of these functions, I would want to allocate the memory with cuMemAllocPitch.
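To make the allocation/copy pairing concrete, here is a minimal sketch of the pattern described above. It assumes a CUDA context is already current; the function name `copy_rows` and the abbreviated error handling are my own, not from any particular codebase.

```c
#include <cuda.h>
#include <stdio.h>

/* Sketch: allocate a pitched H x W array of floats on the device and
   copy an unpadded host buffer into it with cuMemcpy2D. */
int copy_rows(const float *host, int H, int W)
{
    CUdeviceptr dptr;
    size_t pitch; /* bytes per padded row, chosen by the driver */

    /* ElementSizeBytes = 4 tells the driver to pick a pitch suitable
       for coalesced 4-byte accesses and for the 2D copy functions. */
    if (cuMemAllocPitch(&dptr, &pitch, (size_t)W * sizeof(float),
                        (size_t)H, 4) != CUDA_SUCCESS)
        return -1;

    CUDA_MEMCPY2D cpy = {0};
    cpy.srcMemoryType = CU_MEMORYTYPE_HOST;
    cpy.srcHost       = host;
    cpy.srcPitch      = (size_t)W * sizeof(float); /* host rows unpadded */
    cpy.dstMemoryType = CU_MEMORYTYPE_DEVICE;
    cpy.dstDevice     = dptr;
    cpy.dstPitch      = pitch;                     /* device rows padded */
    cpy.WidthInBytes  = (size_t)W * sizeof(float);
    cpy.Height        = (size_t)H;

    if (cuMemcpy2D(&cpy) != CUDA_SUCCESS) {
        cuMemFree(dptr);
        return -1;
    }
    printf("driver-chosen pitch = %zu bytes\n", pitch);
    cuMemFree(dptr);
    return 0;
}
```

Because the pitch comes from cuMemAllocPitch rather than being computed by hand, it automatically satisfies whatever alignment cuMemcpy2D wants on the current device.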
Another case is cuTexRefSetAddress, where both the memory address and the pitch have to be appropriately aligned.
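For the pitched case specifically, the constraint shows up in the 2D variant, cuTexRefSetAddress2D. A minimal sketch, assuming `texref` was obtained from a loaded module and `dptr`/`pitch` came from cuMemAllocPitch (so both the base address and the pitch already satisfy the device's alignment requirements):

```c
/* Sketch: bind pitched linear device memory to a texture reference.
   texref, dptr, pitch, W, and H are assumed to exist already. */
CUDA_ARRAY_DESCRIPTOR desc = {0};
desc.Format      = CU_AD_FORMAT_FLOAT;
desc.NumChannels = 1;
desc.Width       = W;
desc.Height      = H;

CUresult rc = cuTexRefSetAddress2D(texref, &desc, dptr, pitch);
/* rc will be CUDA_ERROR_INVALID_VALUE if dptr or pitch is misaligned. */
```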
Other than these, are there any other cases where padding is either required or is faster?
If a warp is accessing a row of the memory and the row is not aligned, then the accesses may span extra cache lines, but I don’t expect this to be a significant performance hit. Please correct me if I’m wrong.
If I am using unified memory, I don’t know whether these requirements or performance advantages still apply. Since the driver can access the device memory through host virtual addresses, it might not be an issue, and in fact having no gap between rows might perform better. I’ll experiment with this on my Kepler and update this post with the results. Update: I found that memcpy2D and memset2D don’t require pitch alignment for unified memory.
BTW, can I safely use CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT to determine the alignment of the pitch in all cases?
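Relatedly, here is a sketch of querying that attribute with cuDeviceGetAttribute. Note that the driver API also exposes CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT for the base address, which is a separate value, so the pitch attribute alone may not cover every case. Assumes device 0 and omits error checking.

```c
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUdevice dev;
    int pitch_align = 0, addr_align = 0;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuDeviceGetAttribute(&pitch_align,
        CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT, dev);
    cuDeviceGetAttribute(&addr_align,
        CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT, dev);

    printf("texture pitch alignment: %d bytes\n", pitch_align);
    printf("texture base alignment:  %d bytes\n", addr_align);
    return 0;
}
```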