NPP image pitch padding

Hey,

Although I haven’t resolved my previous post (http://forums.nvidia.com/index.php?showtopic=159782), I’m making progress, and this question is certainly related. Hopefully someone can help…

Basically, when I allocate an image on the GPU using this call:

[codebox]/**
 * 8-bit unsigned, single-channel 2D (image) memory allocator.
 * \param nWidthPixels The width of the 2D array (image) to be allocated.
 * \param nHeightPixels The height of the 2D array (image) to be allocated.
 * \param pStepBytes The number of bytes between successive rows of pixels is returned via this pointer to int.
 * \return A pointer to the new 2D array (image). 0 (null-pointer) indicates that an error occurred
 *         during allocation.
 */
Npp8u * nppiMalloc_8u_C1(int nWidthPixels, int nHeightPixels, int * pStepBytes);[/codebox]

With a width of 1024 and height of 768, pStepBytes comes back as 1088. Why is each line of my image being padded with 64 bytes (which breaks the pixel addressing in my algorithms)? Under what circumstances is the pitch not (width * channels)?
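For what it’s worth, the usual way to address pixels in a pitched image is to advance by the returned step (in bytes) per row rather than by the width. A minimal host-side sketch, where `pixel_at` is a hypothetical helper name and `nStep` is the value returned via `pStepBytes`:

```c
#include <stddef.h>

/* Address of pixel (x, y) in a pitched 8-bit single-channel image.
 * nStep is the byte distance between the starts of successive rows,
 * as returned via pStepBytes by nppiMalloc_8u_C1. */
static unsigned char *pixel_at(unsigned char *pBase, int nStep, int x, int y)
{
    return pBase + (size_t)y * nStep + x; /* advance y rows, then x bytes */
}
```

With a step of 1088, pixel (0, 1) starts 1088 bytes after the base pointer even though the visible row is only 1024 bytes wide.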

Thanks in advance,

James

Padding with 64 bytes from 1024 to 1088 is somewhat strange, since your data is already aligned to 64 bytes.

In general your data should always be padded to a multiple of 64 bytes to achieve efficient memory transfers. This means every new pixel line should start at a multiple of 64 bytes. In that case each half-warp that accesses contiguous memory locations needs only one memory transaction.

If your pixel lines are not 64-byte aligned, accesses starting from the second line of pixels would need at least two memory transactions each.

I don’t really know why it is padded in the case of 1024 pixels.

Your algorithms should be designed to work with padded data. It’s somewhat ugly, but you have no other option.

I express the padding as a count of elements of my data type rather than as a length in bytes. This makes the indexing less ugly, though less portable than pointer arithmetic. (Look into the NVIDIA programming guide.)
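That suggestion can be sketched as converting the byte step into an element step once, then indexing with plain array arithmetic. `step_in_elements` is a hypothetical helper name; this only works when the byte step is a multiple of the element size, which holds for NPP’s own allocators:

```c
#include <assert.h>
#include <stddef.h>

/* Convert a byte step (as returned via pStepBytes) into a step counted
 * in elements, so pixels can be indexed as img[y * stepElems + x].
 * Only valid when the byte step is a multiple of the element size. */
static int step_in_elements(int nStepBytes, size_t elemSize)
{
    assert(nStepBytes % elemSize == 0); /* holds for NPP's allocators */
    return nStepBytes / (int)elemSize;
}
```

For an 8-bit single-channel image the element step equals the byte step; for a 32-bit float image a 4352-byte step becomes 1088 elements.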

Yeah, re-writing my algorithms to cope with the pad is one way around it, though I guess that’s not really what I was going for. Thanks for the help; I didn’t realise that about 64-byte alignment and will consider it in future.

James

CapJo is right. It is generally a good idea to pad your lines to multiples of 64 bytes so that your algorithms achieve coalescing. Without that padding it is almost impossible to write code that perfectly coalesces its memory accesses.

NPP’s primitives all work with arbitrary line strides (as long as the strides are multiples of the size of a single pixel). That means you’re not restricted to NPP’s 2D memory allocators for your image data. If you have kernels that cannot deal with arbitrary line strides, you could use a plain cudaMalloc to allocate your image data with a padding of your own choosing. NPP will handle this, as long as your image data pointers are aligned to multiples of the pixel size and the line strides are also multiples of the pixel size.
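A minimal sketch of choosing your own padding, assuming you round each row up to an alignment you pick (the helper name `padded_pitch` is mine): compute the pitch on the host, allocate pitch * height with cudaMalloc, and pass the pitch as the nStep argument to the NPP primitives.

```c
#include <stddef.h>

/* Round a row width (in bytes) up to the next multiple of `align`,
 * e.g. align = 64 for coalescing-friendly rows. The result can be
 * passed as the line step to NPP primitives, provided it is a
 * multiple of the pixel size. */
static size_t padded_pitch(size_t widthBytes, size_t align)
{
    return (widthBytes + align - 1) / align * align;
}
```

Note that for a 1024-byte row a 64-byte alignment adds no padding at all, which is why the 1088-byte step NPP chose here must come from some further optimization rather than plain 64-byte alignment.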

The additional 64-byte padding is a somewhat obscure optimization that helps some GPUs achieve even better memory performance than plain 64-byte-padded lines.

–Frank

Thanks Frank, very helpful. I’m still slightly confused, because the line width should already be a multiple of 64 bytes (1024 % 64 = 0). But I’ve come to believe there’s a good case for just trusting NPP and re-writing my kernels to cope with the extra padding, which is what I’ve done! Still, a very interesting read, thanks.

James