cudaMallocPitch() does not seem to work properly. Here is the code segment:

 float *A;
 size_t pitch;
 A_BD.numBlock = 156800;  // 156800 is a multiple of 64
 cudaMallocPitch((void**)&A, &pitch, A_BD.numBlock*sizeof(float), 9);  // width in bytes, 9 rows

However, it does not work correctly: after the call to cudaMallocPitch(), pitch is 627200, and memory clashes occur.
But the following works properly:

 cudaMalloc((void**)&A, pitch*9*sizeof(float));

It seems that I am not calling cudaMallocPitch() correctly, but what is wrong? Thank you.

I see nothing wrong at all.

Do you want to allocate A_BD.numBlock*sizeof(float) x 9 bytes in global memory, i.e. 9 rows of A_BD.numBlock floats each?

After calling cudaMallocPitch(), pitch = 156800 * sizeof(float) = 627200 bytes. Since 627200 is already a multiple of 64, no padding is needed, so 627200 is the correct value.

What is wrong?