cudaMallocPitch is giving inconsistent result

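//reference code
#include <stdio.h>
#include <cuda_runtime.h>
#define ROW_PIXEL    720  // width of one row in pixels (1 byte per pixel)
#define COLUMN_PIXEL 480  // number of rows; example value, not given in the original post
int main(void)
{
    char *I_frame_ptr = NULL, *P_frame_ptr = NULL;
    size_t I_frame_pitch = 0, P_frame_pitch = 0;
    cudaError_t a, b;
    // Width is requested in bytes, height in rows; the real row stride comes back in *_pitch
    a = cudaMallocPitch((void**) &I_frame_ptr, &I_frame_pitch, ROW_PIXEL * sizeof(char), COLUMN_PIXEL);
    b = cudaMallocPitch((void**) &P_frame_ptr, &P_frame_pitch, ROW_PIXEL * sizeof(char), COLUMN_PIXEL);
    printf("%zu\t%zu\n", I_frame_pitch, P_frame_pitch);  // size_t needs %zu, not %d
    printf("%s\n", cudaGetErrorString(a));  // never pass a runtime string as the format
    printf("%s\n", cudaGetErrorString(b));
    cudaFree(I_frame_ptr);  // cudaFree(NULL) is a no-op if an allocation failed
    cudaFree(P_frame_ptr);
    return 0;
}
//CODE ends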

When I define ROW_PIXEL as 640, it prints both I_frame_pitch and P_frame_pitch as 640. The moment I increase ROW_PIXEL to 656, it prints 704; when I change ROW_PIXEL to 720, it prints 768. And when I set ROW_PIXEL to 1024, it prints 1024. I cannot understand what is happening here. I want it to work correctly for 720!

There is no problem with having a different pitch. You can safely use the pitch returned by cudaMallocPitch to access elements in your array.

You are most likely getting a different pitch because of the implicit padding done by cudaMallocPitch. The Programming Guide describes this in section 4.5.2.3.

Note: also check the memPitch field (the maximum pitch, in bytes, allowed by memory copies) of the cudaDeviceProp structure filled in by cudaGetDeviceProperties during CUDA initialization.

Everything looks OK except 656 -> 704. Check again; maybe you made a typo.

Pitch represents the new width needed to satisfy alignment requirements. Your image, 720 bytes wide (I say bytes because you use sizeof(char)), must be extended to 768 bytes because the width must be a multiple of 128. In other words:

width % 128 == 0
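The rounding rule described above can be sketched in C as a small helper. This is a host-side illustration under the assumption that the alignment is a power of two (such as 128); `round_up_pitch` is a hypothetical name, not part of the CUDA API, and the padding actually chosen by cudaMallocPitch is up to the driver.

```c
#include <stddef.h>

/* Round a row width (in bytes) up to the next multiple of `alignment`.
 * Assumes `alignment` is a power of two (e.g. 64 or 128), so the low
 * bits can be cleared with a mask. Hypothetical helper, not a CUDA API. */
static size_t round_up_pitch(size_t width_bytes, size_t alignment)
{
    return (width_bytes + alignment - 1) & ~(alignment - 1);
}
```

With an alignment of 128, `round_up_pitch(720, 128)` gives 768 and `round_up_pitch(640, 128)` gives 640, matching the question; `round_up_pitch(656, 128)` gives 768 rather than the reported 704.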

Imagine your 2D image stored in a 1D memory buffer as a sequence of rows: the first row, followed by the second row, and so on.

To achieve maximal performance, the beginning of each row must be at an aligned address in that buffer. That is possible only if width % 128 == 0, which is not the case with 720 bytes, so an extra 48 bytes are inserted after each row. This new, padded width is returned in the Pitch variable.
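To make the row layout concrete, here is a host-side sketch that mimics a pitched allocation with C11 `aligned_alloc` and checks that every row starts at an aligned address. It is only an illustration, not what cudaMallocPitch does internally; `alloc_pitched_host` and `rows_are_aligned` are hypothetical helpers, and 128 is the alignment assumed in this post.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Host-side illustration only (NOT cudaMallocPitch): allocate `height` rows
 * of `pitch` bytes each, so row y starts at buf + y * pitch. C11 aligned_alloc
 * wants the size to be a multiple of the alignment, which holds when `pitch`
 * is itself a multiple of `alignment`. Release with free(). */
static unsigned char *alloc_pitched_host(size_t pitch, size_t height, size_t alignment)
{
    return aligned_alloc(alignment, pitch * height);
}

/* Returns 1 if every row start (buf + y * pitch) is a multiple of
 * `alignment`, 0 as soon as one row start is misaligned. */
static int rows_are_aligned(const unsigned char *buf, size_t pitch,
                            size_t height, size_t alignment)
{
    for (size_t y = 0; y < height; ++y) {
        if ((uintptr_t)(buf + y * pitch) % alignment != 0)
            return 0;
    }
    return 1;
}
```

With a pitch of 768 and an alignment of 128, `rows_are_aligned` returns 1; with a pitch of 720 it returns 0, because the second row would start at offset 720, which is not a multiple of 128. The 48 padding bytes per row are exactly what closes that gap.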

Accessing the byte at coordinate (x, y) is easy:

if(x<720) YourByte = buffer[y * Pitch + x]

The condition restricts processing to bytes that belong to the image and skips the inserted padding bytes.

PixelSize is 1 in your case; otherwise the line would be:

if(x<ImageWidth) YourByte = buffer[y * Pitch + x * PixelSize]
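The same indexing can be wrapped in a small function. This is a host-side C sketch assuming a buffer laid out the way cudaMallocPitch lays it out (row y starts at byte y * pitch); inside a kernel the address arithmetic would be the same. `read_pixel` is a hypothetical helper, not a CUDA API.

```c
#include <stddef.h>
#include <string.h>

/* Copy the pixel at (x, y) from a pitched buffer into `out`.
 * pitch       : row stride in bytes (e.g. the value cudaMallocPitch returned)
 * image_width : width of the image in pixels (not bytes)
 * pixel_size  : bytes per pixel (1 for the char image in the question)
 * Returns 1 on success, 0 if x falls into the padding at the end of the row. */
static int read_pixel(const unsigned char *buffer, size_t pitch,
                      size_t image_width, size_t pixel_size,
                      size_t x, size_t y, unsigned char *out)
{
    if (x >= image_width)
        return 0;  /* padding bytes are not part of the image */
    memcpy(out, buffer + y * pitch + x * pixel_size, pixel_size);
    return 1;
}
```

For the 720-pixel, 1-byte image with a pitch of 768, any x from 720 to 767 lands in the padding, so the function reports it instead of returning data.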

I think even 704 is an OK pitch:

704 / 16 = 44

704 / (16 * 4) = 11

so it's aligned to 64 bytes (16 * 4), though not to 128.

That is true if the CUDA alignment requirement is 64 bytes rather than 128 (I'm not sure about that; can someone confirm?).

Then it means that every width where

width % 64 != 0

must be enlarged to the next larger multiple of 64, and that value is returned in the Pitch variable.
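The 64-byte hypothesis can be checked against the numbers from the question. The sketch below assumes a plain round-up-to-a-multiple rule (an assumption, not documented driver behavior) and tests which alignment reproduces all four reported (ROW_PIXEL, pitch) pairs; `alignment_explains_observations` is a hypothetical helper.

```c
#include <stddef.h>

/* The (ROW_PIXEL, pitch) pairs reported in the question. */
static const size_t observed[][2] = {
    {640, 640}, {656, 704}, {720, 768}, {1024, 1024}
};

/* Returns 1 if rounding every observed width up to a multiple of `alignment`
 * reproduces the observed pitch, 0 otherwise. Uses division, so `alignment`
 * does not have to be a power of two. The round-up rule is an assumption. */
static int alignment_explains_observations(size_t alignment)
{
    for (size_t i = 0; i < sizeof observed / sizeof observed[0]; ++i) {
        size_t width = observed[i][0];
        size_t rounded = (width + alignment - 1) / alignment * alignment;
        if (rounded != observed[i][1])
            return 0;
    }
    return 1;
}
```

A 64-byte alignment reproduces all four pairs (640 -> 640, 656 -> 704, 720 -> 768, 1024 -> 1024), while 128 and 32 both fail on 656. That fits the observations, but four data points do not prove what the driver does on every device.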

Either way, Punit now has a picture of why the pitch cannot be 720 as he expected.

I'm not sure, but I think it can depend on the compute capability. The cards I've worked with coalesce accesses in patterns of 16 four-byte words = 64 bytes, AFAIK. 128 may be safer for future cards, or something like that.