Trouble with cudaMemcpy2D: I can't get a matrix to copy into 2D pitched memory

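UPDATE: I fixed it. The problem was that the pitch returned from cudaMallocPitch is in bytes, not elements. To index elements through a typed pointer, divide the pitch by the size of the data type; when doing memory copies, pass the pitch in bytes (multiply back by the size of the data type if you stored it in elements).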
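For anyone hitting the same thing, here is a minimal sketch of the two ways to address an element once you have a byte pitch. It is plain C with host memory standing in for the device buffer, and the helper names pitched_elem and pitched_elem_by_count are made up for illustration; they are not part of the CUDA API.

```c
#include <assert.h>
#include <stddef.h>

/* Address of element (row, col) in a pitched buffer.
   pitch_bytes is the row stride in BYTES, the unit cudaMallocPitch reports. */
double *pitched_elem(double *base, size_t pitch_bytes, size_t row, size_t col)
{
    /* Step over rows in bytes through a char pointer, then index the column as double. */
    char *row_start = (char *)base + row * pitch_bytes;
    return (double *)row_start + col;
}

/* Same address computed with the pitch converted to elements.
   Assumption: pitch_bytes is a multiple of sizeof(double). */
double *pitched_elem_by_count(double *base, size_t pitch_bytes, size_t row, size_t col)
{
    assert(pitch_bytes % sizeof(double) == 0);
    size_t pitch_elems = pitch_bytes / sizeof(double);
    return base + row * pitch_elems + col;
}
```

Using the byte pitch directly as an element stride (base + row * pitch_bytes + col) is the mistake: it skips sizeof(double) times too far per row.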

I am trying to copy an array from the host into 2D device memory. Currently the data is copied, but the padding is wrong. I read the reference manual and I think I passed the correct parameters. Both are square matrices (dimension x == dimension y). Here is what I have:



checkCUDAError("Memcpy 2D");

d_mat2 is the matrix on the device. Here is its allocation:

cudaMallocPitch((void **)&d_mat2,&pitch2,memWidth,dim);

pitch2 is the pitch returned by cudaMallocPitch

mat2 is the matrix to be copied (a dynamically allocated one-dimensional array of double)

memWidth is sizeof(double) times dim, i.e. the width of one row in bytes

dim is the dimension
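To make the roles of the two pitches concrete, here is a rough host-only sketch of how a cudaMemcpy2D-style copy walks the buffers. The function copy_rows_2d is a hypothetical stand-in written for this post, not the CUDA implementation; it only mirrors the parameter order of cudaMemcpy2D (dst, dpitch, src, spitch, width, height), where both pitches and the width are in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only stand-in for the cudaMemcpy2D parameter layout.
   Copies `height` rows of `width_bytes` bytes each. dpitch and spitch are
   the row strides of the destination and source, both in BYTES.
   Returns 0 on success, -1 if a row would not fit inside a pitch. */
int copy_rows_2d(void *dst, size_t dpitch, const void *src, size_t spitch,
                 size_t width_bytes, size_t height)
{
    assert(dst != NULL && src != NULL);
    if (width_bytes > dpitch || width_bytes > spitch)
        return -1;
    for (size_t r = 0; r < height; ++r) {
        /* Each row starts at a byte offset of r * pitch in its own buffer. */
        memcpy((char *)dst + r * dpitch,
               (const char *)src + r * spitch,
               width_bytes);
    }
    return 0;
}
```

With the names from my post, the host side is tightly packed, so its pitch is memWidth, while the device side uses pitch2; the bytes between memWidth and pitch2 in each device row are padding and are never written.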

I’m not seeing anything wrong with that code.


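#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(){
	const int dim = 16;
	double* d_mat2;
	size_t pitch2;
	unsigned int memWidth = dim*sizeof(double);	// row width in bytes

	double* mat2 = (double*)malloc(dim*dim*sizeof(double));
	for (unsigned int i=0;i<dim*dim;++i)
		mat2[i] = i;

	cudaMallocPitch((void **)&d_mat2,&pitch2,memWidth,dim);

	// Host to device: host rows are tightly packed, so the source pitch is memWidth
	cudaMemcpy2D(d_mat2,pitch2,mat2,memWidth,memWidth,dim,cudaMemcpyHostToDevice);

	// Clear the host copy so the check only passes if the data really comes back
	for (unsigned int i=0;i<dim*dim;++i)
		mat2[i] = 0;

	// Device to host: now the device pitch is the source pitch
	cudaMemcpy2D(mat2,memWidth,d_mat2,pitch2,memWidth,dim,cudaMemcpyDeviceToHost);

	bool passed = true;
	for (unsigned int i=0;i<dim*dim && passed;++i)
		passed = (mat2[i] == i);
	fprintf(stderr,"TEST = %s\n",passed? "PASSED" : "FAILED");

	cudaFree(d_mat2);
	free(mat2);
	return 0;
}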





This test passes on my machine.