Does cudaMallocPitch zero-pad arrays on the device, or should I be doing that?

Hello,

I’ve been studying CUDA for the last few days, and I have a question about how most people work with 2D arrays.
I am currently just attempting to implement some simple kernels for practice.

Let’s say I would like to multiply two non-square matrices in a style similar to the example on page 27 of the CUDA_C_Programming_Guide. That example assumes that the matrices’ width and height are both multiples of the block size (using a square block).

I would like to generalize this and multiply matrices that have width and height that are not multiples of the block size. My first attempt was to expand the matrices and pad them with zeros in order to make the dimensions multiples of the block size.
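
For reference, this is roughly how I’m rounding the dimensions up (BLOCK_SIZE is my own constant, matching the square block from the guide’s example):

```cpp
#define BLOCK_SIZE 16  // my own choice, matching the guide's square block

// Round a dimension up to the next multiple of BLOCK_SIZE; the extra
// rows/columns are then filled with zeros on the host before copying.
int roundUpToBlock(int n)
{
    return ((n + BLOCK_SIZE - 1) / BLOCK_SIZE) * BLOCK_SIZE;
}
```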

I also noticed in the CUDA_C_Programming_Guide that there is a cudaMallocPitch function that returns a pitch chosen to meet the alignment requirements for coalesced memory reads.
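
For concreteness, here is the allocation and indexing pattern as I currently understand it (the width argument and the returned pitch are both in bytes, which is why rows are addressed through a char* cast; allocPitched and elementAt are just my own helper names):

```cpp
#include <cuda_runtime.h>

// Allocate a pitched 2D array of floats. The returned pitch (in
// bytes) may be larger than width * sizeof(float) so that each row
// starts at a suitably aligned address.
float* allocPitched(int width, int height, size_t* pitch)
{
    float* d = NULL;
    cudaMallocPitch((void**)&d, pitch, width * sizeof(float), height);
    return d;
}

// In a kernel, a row is then found by stepping 'pitch' bytes at a time.
__device__ float elementAt(const float* base, size_t pitch,
                           int row, int col)
{
    const float* rowPtr = (const float*)((const char*)base + row * pitch);
    return rowPtr[col];
}
```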

I suppose my questions are:

  1. For 2D arrays, I just “flatten” them and use a 1D array. Is this good practice?

  2. Is it good practice to zero-pad my arrays so that the height and width are multiples of the block size, and then use cudaMallocPitch? Or can I just use cudaMallocPitch and have it take care of this for me?

  3. In my simple kernels (matrix add, matrix multiply), often the first thing I do is use the block and thread variables to calculate an (x, y) coordinate. Then I check whether this (x, y) is within the bounds of my output matrix (see the sketch after this list). This check can cause the blocks on the edges of the matrix to have divergent flow paths; is it standard practice to skip it and just assume the data dimensions are multiples of the block size?
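
To make question 3 concrete, here is roughly what one of my practice kernels looks like (matAdd is just a toy element-wise add; it also shows the flattened 1D indexing from question 1):

```cpp
#include <cuda_runtime.h>

#define BLOCK_SIZE 16  // my own choice of (square) block size

// Toy element-wise add over a flattened 2D array. w and h are the
// real (unpadded) matrix dimensions.
__global__ void matAdd(const float* A, const float* B, float* C,
                       int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row

    // This is the branch I'd like to avoid: threads in edge blocks
    // that fall outside the matrix do nothing.
    if (x < w && y < h)
        C[y * w + x] = A[y * w + x] + B[y * w + x];
}

// Launch with enough blocks to cover dimensions that are not
// multiples of BLOCK_SIZE (dA, dB, dC are device pointers).
void launchMatAdd(const float* dA, const float* dB, float* dC,
                  int w, int h)
{
    dim3 block(BLOCK_SIZE, BLOCK_SIZE);
    dim3 grid((w + block.x - 1) / block.x,
              (h + block.y - 1) / block.y);
    matAdd<<<grid, block>>>(dA, dB, dC, w, h);
}
```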

Hopefully you can see what I am trying to get at. Thanks in advance for your responses!

EDIT:

I am fairly certain I understand how pitch works, but I am still padding my matrices so that their dimensions are multiples of the block size in order to avoid conditional logic in my kernels. Is this better than having the conditional logic in the kernel code?
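
For comparison, this is the padded variant I ended up with: since the rounded-up dimensions (paddedW is my own name for the rounded-up width) are exact multiples of the block size, every launched thread maps to a valid element and the check disappears:

```cpp
// Padded variant of the toy add: the dimensions are exact multiples
// of the block size, so no bounds check is needed; the padding
// elements are zeros, and computing on them is harmless.
__global__ void matAddPadded(const float* A, const float* B, float* C,
                             int paddedW)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    C[y * paddedW + x] = A[y * paddedW + x] + B[y * paddedW + x];
}
```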