I am trying to copy one device buffer into another. I am using cudaMemcpy2D because it supports different source and destination pitches: in my case the destination has dpitch = width, while the source has spitch > width. However, cudaMemcpy2D seems to refuse to copy into a destination with dpitch = width. I am new to CUDA; can someone explain why this is not possible? Passing width - 1 bytes works, but the rightmost pixels are then incomplete (as expected). I am using CUDA 11.3 on Ubuntu 20.04.
In my scenario, the src is allocated by cv::cuda::GpuMat() and the dst is from cudaMalloc().
The src buffer is an RGB image.
cudaMemcpy2D(dst_ptr, dst_step, src_ptr, src_step, width * CV_ELEM_SIZE(cv_type), height, cudaMemcpyDeviceToDevice);
When dst_step = width = 1800 and src_step = 2048, I get the error "invalid pitch argument".
If I use width = dst_step = src_step = 2048, the error goes away and the image is copied without problems.
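To make the numbers concrete, here is a host-only sketch of the byte arithmetic behind the call above. It assumes, as far as I understand the API, that dpitch, spitch and width are all byte counts and that width must not exceed either pitch. The helper names row_bytes and pitches_cover_row are hypothetical (they are not part of CUDA or OpenCV), and the element sizes 1 and 3 are illustrative assumptions about what CV_ELEM_SIZE(cv_type) might return.

```cpp
#include <cstddef>

// Hypothetical helper: bytes in one row of pixels, excluding padding.
// This is the value passed as the `width` argument in the call above.
constexpr std::size_t row_bytes(std::size_t width_px, std::size_t elem_size) {
    return width_px * elem_size;
}

// Hypothetical helper: true when the copied row (in bytes) fits inside
// both the destination pitch and the source pitch (also in bytes).
constexpr bool pitches_cover_row(std::size_t dpitch, std::size_t spitch,
                                 std::size_t width_bytes) {
    return width_bytes <= dpitch && width_bytes <= spitch;
}

// Assumption: 1 byte per pixel. The row is 1800 bytes and fits both pitches.
static_assert(pitches_cover_row(1800, 2048, row_bytes(1800, 1)),
              "1800-byte row fits dpitch = 1800 and spitch = 2048");

// Assumption: 3 bytes per pixel (e.g. CV_8UC3). The row is 5400 bytes,
// which is larger than dpitch = 1800, so the check fails.
static_assert(!pitches_cover_row(1800, 2048, row_bytes(1800, 3)),
              "5400-byte row does not fit dpitch = 1800");
```

If this reading is right, the result depends on whether my width and dst_step are counted in pixels or in bytes, which is the part I am unsure about.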
Does cudaMemcpy2D need the pitch values to be a multiple of 32 or 64?
Either way, where can I find such limitations? I did not find them in the CUDA reference documentation.