UPDATE: I fixed it. The problem was that the pitch returned by cudaMallocPitch is in bytes, not elements. To fix it, divide the pitch by the size of the data type when indexing through a typed pointer, and multiply element counts by the size of the data type when passing widths to the memory copies.
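For anyone hitting the same thing, here is a minimal host-only C sketch of the pitch arithmetic described above. The helper names `pitched_row` and `pitch_in_elements` are made up for illustration and are not part of the CUDA API. It also assumes the pitch is a whole multiple of `sizeof(double)`, which should hold for the aligned pitches cudaMallocPitch returns but is checked anyway.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pointer to the start of row `row` in a pitched
   allocation. `pitch_bytes` is in bytes (the unit cudaMallocPitch
   reports), so the offset is applied to a char pointer before casting
   back to double. */
static double *pitched_row(void *base, size_t pitch_bytes, size_t row)
{
    return (double *)((char *)base + row * pitch_bytes);
}

/* Hypothetical helper: convert a byte pitch into a pitch in elements,
   for code that indexes a double* as base[row * pitch_elems + col].
   Assumes the byte pitch is a multiple of sizeof(double). */
static size_t pitch_in_elements(size_t pitch_bytes)
{
    assert(pitch_bytes % sizeof(double) == 0);
    return pitch_bytes / sizeof(double);
}
```

With a 64-byte pitch, for example, each row holds 8 doubles, so element (row, col) sits at `base[row * 8 + col]`, or equivalently at `pitched_row(base, 64, row)[col]`.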

I am trying to copy an array from the host into 2D device memory. Currently the data is copied, but the padding is wrong. I read the reference manual and I think I passed the correct parameters. They’re both square matrices (dimension x == dimension y). Here is what I have:

```
cudaMemcpy2D(d_mat2, pitch2, mat2, memWidth, memWidth, dim, cudaMemcpyHostToDevice);
checkCUDAError("Memcpy 2D");
```

d_mat2 is the matrix on the device. Here is its allocation:

```
cudaMallocPitch((void **)&d_mat2,&pitch2,memWidth,dim);
```

pitch2 is the pitch (in bytes) that I got back from cudaMallocPitch

mat2 is the matrix to be copied (allocated as a dynamic one-dimensional array of type double)

memWidth is sizeof(double) times dim, i.e. the width of one row in bytes

dim is the dimension (the number of rows, which equals the number of columns)
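To make the roles of these parameters concrete, here is a host-only C sketch of what a pitched 2D copy does with them. `copy2d_host` is a made-up stand-in, not a CUDA function. It takes its arguments in the same order as cudaMemcpy2D (destination, destination pitch, source, source pitch, width in bytes, height in rows) but leaves out the copy-kind argument and only works on host memory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side stand-in for a pitched 2D copy.
   Copies `height` rows of `width_bytes` bytes each. Consecutive rows
   start `spitch` bytes apart in src and `dpitch` bytes apart in dst,
   so any padding at the end of a destination row is left untouched. */
static void copy2d_host(void *dst, size_t dpitch,
                        const void *src, size_t spitch,
                        size_t width_bytes, size_t height)
{
    for (size_t r = 0; r < height; ++r) {
        memcpy((char *)dst + r * dpitch,
               (const char *)src + r * spitch,
               width_bytes);
    }
}
```

In the call from the question, the source is tightly packed, so its pitch and the copy width are both memWidth, the destination pitch is pitch2, and the height is dim. All three of the byte quantities are in bytes; only the height is a count of rows.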