Use cudaMemcpy3DAsync for 2D cudaArray_t

Hi,

I want to use an asynchronous version of cudaMemcpy2DArrayToArray for a device-to-device memory copy. As far as I can tell, cudaMemcpy3DAsync should also work on 2D CUDA arrays.

My code looks like the following:

cudaMemcpy3DParms params = {0};
params.srcArray = input_array_;
params.dstArray = output_array_;
params.extent = make_cudaExtent(width_, height_, 0);
params.kind = cudaMemcpyDeviceToDevice;
cudaMemcpy3DAsync(&params, cuda_stream_);

Input and output CUDA arrays have 2 channels and are created using cudaMalloc3DArray with the same extent.

cudaChannelFormatDesc channel_descriptor = cudaCreateChannelDesc(
    8, 8, 0, 0, cudaChannelFormatKindUnsigned);

cudaExtent extent = {0};
extent.width = width;
extent.height = height;
extent.depth = 0;
cudaMalloc3DArray(&cuda_array_, &channel_descriptor, extent);

I tried a depth extent of 0, which doesn’t seem to copy anything; a depth extent of 1 fails with an “invalid argument” error at cudaMemcpy3DAsync.

How should I set the depth of the extent so it copies the 2D CUDA arrays correctly? According to the cudaMalloc3DArray description, a 2D array is allocated if only the depth extent is zero.

Thanks.

cudaMemcpy3DAsync requires a transfer depth extent of at least 1, even when the arrays are 2D. From its documentation:

cudaMemcpy3DAsync() copies data between two 3D objects.

I don’t seem to have any trouble with what you have shown with depth of 1:

$ cat t1989.cu

int main(){


cudaArray_t cuda_array1, cuda_array2;
cudaChannelFormatDesc channel_descriptor = cudaCreateChannelDesc(
    8, 8, 0, 0, cudaChannelFormatKindUnsigned);
int width,height;
width = height = 1024;
cudaExtent extent = {0};
extent.width = width;
extent.height = height;
extent.depth = 1;
cudaMalloc3DArray(&cuda_array1, &channel_descriptor, extent);
cudaMalloc3DArray(&cuda_array2, &channel_descriptor, extent);

cudaMemcpy3DParms params = {0};
params.srcArray = cuda_array1;
params.dstArray = cuda_array2;
params.extent = make_cudaExtent(width, height, 1);
params.kind = cudaMemcpyDeviceToDevice;
cudaMemcpy3DAsync(&params);

}
$ nvcc -o t1989 t1989.cu
$ cuda-memcheck ./t1989
========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors
$

If you are still having issues, please provide a short complete example, just as I have done, rather than snippets. If you add an appropriately created stream argument to the above call, it also runs without error.

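That worked, thanks. I had wrongly set the width of the cudaMemcpy3DAsync extent to twice the width of the allocated CUDA array, trying to account for the 2 channels in the array. When a CUDA array participates in the copy, the extent is measured in array elements, so the width must not be scaled by the channel count.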