Help with cudaMemcpy2D: I can't get a matrix/array to copy correctly from host to device

I am trying to copy an array from the host into 2D device memory. Currently the data is copied, but the padding is wrong. I read the reference manual and I think I passed the correct parameters. Both are square matrices (dimension x == dimension y). Here is what I have:

cudaMemcpy2D(d_mat2, pitch2, mat2, memWidth, memWidth, dim, cudaMemcpyHostToDevice);

checkCUDAError("Memcpy 2D");

d_mat2 is the matrix on the device. Here is its allocation:

cudaMallocPitch((void **)&d_mat2,&pitch2,memWidth,dim);

pitch2 is the pitch returned by cudaMallocPitch

mat2 is the matrix to be copied (allocated as a dynamic one-dimensional array of type double)

memWidth is the size of double times dim (the dimension)

dim is the dimension

I quickly read your code and see nothing wrong with it, but you may want to check dim, memWidth and the size of the matrix allocated on the host.
Does your card support double precision?

int memWidth = sizeof(double) * dim;
cudaMallocPitch((void **)&d_mat2, &pitch2, memWidth, dim);

cudaMemcpy2D(d_mat2, pitch2, mat2, memWidth, memWidth, dim, cudaMemcpyHostToDevice);

UPDATE: I fixed it. The pitch returned by cudaMallocPitch is in bytes, not elements. So when addressing a row you should either compute the address through a byte pointer, or divide the pitch by the size of the data type and index in elements. When doing memory copies, pass the pitch in bytes (if you stored the divided pitch, multiply it back by the size of the data type) and everything will align correctly.
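
For anyone landing here later, the addressing rule from the update can be sketched without a GPU. What follows is a host-only C emulation, not CUDA code: alloc_pitched and copy_2d are hypothetical stand-ins for cudaMallocPitch and cudaMemcpy2D, and the alignment passed to alloc_pitched is an assumed value (the real pitch is whatever cudaMallocPitch returns). Only elem_at is meant to carry over; the same pointer arithmetic should apply to d_mat2 and pitch2 inside a kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical host-only stand-in for cudaMallocPitch: pads every row up to
   a multiple of `align` bytes and reports that padded row size as the pitch.
   The pitch is in BYTES, which is the point of the update above. */
void *alloc_pitched(size_t *pitch, size_t width_bytes, size_t height, size_t align)
{
    assert(align > 0);
    *pitch = (width_bytes + align - 1) / align * align;
    return calloc(height, *pitch);
}

/* Hypothetical stand-in for cudaMemcpy2D, same argument order:
   (dst, dpitch, src, spitch, width, height). Both pitches and the width
   are byte counts; only `height` is a row count. */
void copy_2d(void *dst, size_t dpitch, const void *src, size_t spitch,
             size_t width_bytes, size_t height)
{
    for (size_t r = 0; r < height; ++r)
        memcpy((char *)dst + r * dpitch,
               (const char *)src + r * spitch,
               width_bytes);
}

/* Correct addressing: step to the row through a byte pointer, then index
   the column as a double. Writing ((double *)base)[row * pitch + col]
   instead would treat the byte pitch as an element count and overshoot. */
double *elem_at(void *base, size_t pitch, size_t row, size_t col)
{
    double *row_ptr = (double *)((char *)base + row * pitch);
    return &row_ptr[col];
}
```

With these helpers, element (r, c) of a pitched buffer is read as *elem_at(buf, pitch, r, c). The alternative from the update, indexing a double pointer with pitch / sizeof(double) as the row stride, gives the same address as long as the pitch is a multiple of sizeof(double).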

My card does support double precision, so I don't know why I still get the error.

To make nvcc generate double-precision arithmetic, you must pass "-arch sm_13" on the compile command line. By default, nvcc compiles with single-precision arithmetic.