Hi,
I am having trouble with the 65536 pitch limit of cudaMemcpy2D(). I allocate a matrix with cudaMallocPitch() that is very wide (width > 1000000) but short (height < 10), and the allocation succeeds without errors. After the kernels finish, I would like to copy only the first row of the matrix back to host memory. Is there a trick to do this without cudaMemcpy2D()?
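Here is a minimal sketch of the setup, assuming a float matrix. The element type, the dimensions, the CHECK macro and the helper names fillPitched, copyRow and fillAndCopyFirstRow are illustrative and hypothetical, not taken from my real code. The idea I am considering is this: the elements within a single row of a cudaMallocPitch() allocation are contiguous, and row r starts at (char*)devPtr + r * pitch. A plain cudaMemcpy() of width * sizeof(float) bytes from that address should therefore copy exactly one row, and cudaMemcpy() takes no pitch argument at all. I have not confirmed that this sidesteps every limit involved.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                   cudaGetErrorString(err_));                         \
      std::exit(1);                                                   \
    }                                                                 \
  } while (0)

// Hypothetical stand-in for the real kernels: element (r, c) = r * width + c.
__global__ void fillPitched(float* m, size_t pitch, int width, int height) {
  int c = blockIdx.x * blockDim.x + threadIdx.x;
  int r = blockIdx.y;
  if (c < width && r < height) {
    // Rows start every `pitch` bytes, so offset through a char pointer.
    float* row = (float*)((char*)m + (size_t)r * pitch);
    row[c] = (float)(r * width + c);
  }
}

// Copy row r of a pitched device matrix to host memory with a plain 1D copy.
// Only `width` elements are copied; the row's padding bytes are skipped.
void copyRow(float* host, const float* dev, size_t pitch, int width,
             int height, int r) {
  assert(r >= 0 && r < height);
  const char* src = (const char*)dev + (size_t)r * pitch;
  CHECK(cudaMemcpy(host, src, (size_t)width * sizeof(float),
                   cudaMemcpyDeviceToHost));
}

// Allocate a wide, short pitched matrix, fill it, and return its first row.
std::vector<float> fillAndCopyFirstRow(int width, int height) {
  float* dev = nullptr;
  size_t pitch = 0;
  CHECK(cudaMallocPitch((void**)&dev, &pitch, (size_t)width * sizeof(float),
                        height));
  dim3 block(256);
  dim3 grid((width + block.x - 1) / block.x, height);
  fillPitched<<<grid, block>>>(dev, pitch, width, height);
  CHECK(cudaGetLastError());
  CHECK(cudaDeviceSynchronize());  // wait for the kernel before copying

  std::vector<float> row0(width);
  copyRow(row0.data(), dev, pitch, width, height, 0);
  CHECK(cudaFree(dev));
  return row0;
}
```

Usage from host code would be something like `std::vector<float> row0 = fillAndCopyFirstRow(1 << 20, 8);`. Any other single row could be fetched the same way by passing a different `r` to copyRow.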
Kind regards,
Daniel Dekkers