I am trying to copy an array from the host into 2D device memory. Currently the data is copied, but the padding is wrong. I tried reading the reference manual and I think I passed the correct parameters. They’re both square matrices (dimension x == dimension y). Here is what I have.
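For reference, here is a minimal sketch of what a host-to-pitched copy is expected to do. It is plain C++ that only emulates the layout, so it runs without a GPU. `copy2D` mirrors the argument order of cudaMemcpy2D(dst, dpitch, src, spitch, width, height) with the copy-kind argument left out, and `paddedPitch` and `copySquareMatrix` are hypothetical helpers. The 512-byte row alignment is an assumption; the real cudaMallocPitch chooses the pitch itself. Note that every pitch and width below is a byte count, and the host pitch equals the row width because host rows are packed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-only stand-in for cudaMemcpy2D(dst, dpitch, src, spitch, width, height).
// As in the real call, dpitch, spitch and widthBytes are BYTE counts; height is rows.
void copy2D(void* dst, std::size_t dpitch, const void* src, std::size_t spitch,
            std::size_t widthBytes, std::size_t height) {
    // A row's payload can never be wider than either pitch.
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        // Copy only the payload; the padding at the end of each dst row is untouched.
        std::memcpy(d + row * dpitch, s + row * spitch, widthBytes);
    }
}

// Hypothetical pitch: round the row size up to an assumed 512-byte boundary.
std::size_t paddedPitch(std::size_t rowBytes, std::size_t alignment = 512) {
    return (rowBytes + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: copy a dim x dim row-major host matrix into a simulated
// pitched buffer. The pitch (in bytes) is written to pitchOut.
std::vector<double> copySquareMatrix(const std::vector<double>& host, std::size_t dim,
                                     std::size_t& pitchOut) {
    assert(host.size() == dim * dim);                   // host matrix really is dim x dim
    const std::size_t rowBytes = dim * sizeof(double);  // width in bytes, not elements
    pitchOut = paddedPitch(rowBytes);
    std::vector<double> dev(pitchOut / sizeof(double) * dim, 0.0);
    // Host rows are packed, so spitch == rowBytes; the device pitch is in bytes.
    copy2D(dev.data(), pitchOut, host.data(), rowBytes, rowBytes, dim);
    return dev;
}
```

Passing the row length in elements (e.g. `dim`) where a byte count is expected, for the width or either pitch, is one way to end up with rows landing in the wrong place in the padded buffer.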
I quickly read your code and see nothing wrong, but you may want to double-check dim, memWidth, and the size of the matrix allocated on the host.
Does your card support double precision?
UPDATE: I fixed it. The pitch returned by cudaMallocPitch is in bytes. So when addressing elements, either work with byte addresses, or divide the pitch by the size of the data type to get a pitch in elements; when doing memory copies, multiply that element pitch back by the size of the data type, and everything will line up correctly.
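The two addressing styles described above can be sketched as follows. This is plain C++ operating on an ordinary host buffer that stands in for cudaMallocPitch memory; the helper names (`atBytePitch`, `elementPitch`, `atElementPitch`, `bytePitch`) are hypothetical. The byte-pointer form follows the `(T*)((char*)base + row * pitch)` idiom commonly used with pitched allocations in CUDA code.

```cpp
#include <cassert>
#include <cstddef>

// Style 1: keep the pitch in bytes and apply the row offset on a byte pointer,
// then index the row as an array of T.
template <typename T>
T& atBytePitch(T* base, std::size_t pitchBytes, std::size_t row, std::size_t col) {
    auto* rowStart = reinterpret_cast<unsigned char*>(base) + row * pitchBytes;
    return reinterpret_cast<T*>(rowStart)[col];
}

// Style 2: convert the pitch to elements once, then index like a normal 2D array.
// This only works if the pitch is a whole number of elements.
template <typename T>
std::size_t elementPitch(std::size_t pitchBytes) {
    assert(pitchBytes % sizeof(T) == 0);
    return pitchBytes / sizeof(T);
}

template <typename T>
T& atElementPitch(T* base, std::size_t pitchElems, std::size_t row, std::size_t col) {
    return base[row * pitchElems + col];
}

// Calls that expect bytes (e.g. the dpitch/spitch of cudaMemcpy2D) need the
// element pitch multiplied by sizeof(T) again.
template <typename T>
std::size_t bytePitch(std::size_t pitchElems) {
    return pitchElems * sizeof(T);
}
```

Using the byte pitch directly as an element stride (`base[row * pitch + col]`) overshoots every row by a factor of `sizeof(T)`, which matches the symptom of data that is copied but appears misaligned against the padding.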
My card does support double precision, so I don’t know why I still get the error.
To get double-precision arithmetic from nvcc, you must pass “-arch sm_13” on the compile command line. By default, nvcc compiles with single-precision arithmetic.