Usage of cudaMalloc3DArray and cudaMemcpy3D for 2D Arrays

Environment: CUDA 10.2

I am currently allocating a 2D array using the following code:

  cudaArray_t array;
  cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();

  cudaExtent extent;
  extent.width = array_dim;
  extent.height = array_dim;
  extent.depth = 0;
  unsigned int flags = 0;
  cudaMalloc3DArray(&array, &channelDesc, extent, flags);

Based on the following documentation, I expect this to produce a 2D CUDA array:

cudaMalloc3DArray() can allocate the following:

  • A 1D array is allocated if the height and depth extents are both zero.
  • A 2D array is allocated if only the depth extent is zero.

I would expect that calling cudaMemcpy3D on this array (with cudaExtent.depth = 0) would copy the data correctly, but the data does not seem to be copied.

When I replace the cudaMemcpy3D calls with cudaMemcpy2D calls, the data is copied correctly. Is this expected behavior for cudaMemcpy3D, i.e. does it fail to copy anything when extent.depth = 0 and the source array was created with extent.depth = 0?
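For comparison, the 2D path that the question reports as working most likely uses the array variants of the 2D copy, cudaMemcpy2DToArray and cudaMemcpy2DFromArray (the question does not say exactly which 2D functions were used, so this is an assumption). The following is a minimal sketch of a host to array to host round trip on a float array allocated as in the question; the helper name `roundtrip_2d` and the test pattern are hypothetical, chosen only for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: uploads a width x height float pattern into a 2D
// cudaArray with the 2D array copy functions, reads it back, and reports
// whether the round trip preserved the data.
bool roundtrip_2d(size_t width, size_t height) {
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t array = nullptr;
    // Allocation extent with depth 0 asks cudaMalloc3DArray for a 2D array.
    if (cudaMalloc3DArray(&array, &desc, make_cudaExtent(width, height, 0), 0) != cudaSuccess)
        return false;

    std::vector<float> src(width * height), dst(width * height, -1.0f);
    for (size_t i = 0; i < src.size(); ++i) src[i] = static_cast<float>(i);

    // For these calls the pitch, the transfer width and wOffset are in bytes;
    // the transfer height is in rows.
    const size_t rowBytes = width * sizeof(float);
    bool ok = cudaMemcpy2DToArray(array, 0, 0, src.data(), rowBytes, rowBytes, height,
                                  cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy2DFromArray(dst.data(), rowBytes, array, 0, 0, rowBytes, height,
                                    cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFreeArray(array);
    return ok && src == dst;
}
```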

You cannot use a depth of 0 with cudaMemcpy3D. For a 2D copy, the depth of the copy extent must be 1.

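A minimal sketch of the same round trip through cudaMemcpy3D, assuming a float array allocated exactly as in the question (cudaMalloc3DArray with extent.depth = 0). The point it illustrates is that the allocation extent keeps depth 0 while the copy extent uses depth 1, i.e. one slice. The helper name `roundtrip_3d_depth1` and the test pattern are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: host -> 2D cudaArray -> host through cudaMemcpy3D.
bool roundtrip_3d_depth1(size_t width, size_t height) {
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t array = nullptr;
    // Allocation: depth 0 means "2D array" for cudaMalloc3DArray.
    if (cudaMalloc3DArray(&array, &desc, make_cudaExtent(width, height, 0), 0) != cudaSuccess)
        return false;

    std::vector<float> src(width * height), dst(width * height, -1.0f);
    for (size_t i = 0; i < src.size(); ++i) src[i] = static_cast<float>(i);

    // Copy: depth must be 1 (one slice), not 0. Because a CUDA array takes
    // part in the copy, the extent is counted in array elements, not bytes.
    const cudaExtent copyExtent = make_cudaExtent(width, height, 1);

    cudaMemcpy3DParms up = {};  // zero every field we do not set explicitly
    up.srcPtr = make_cudaPitchedPtr(src.data(), width * sizeof(float), width, height);
    up.dstArray = array;
    up.extent = copyExtent;
    up.kind = cudaMemcpyHostToDevice;

    cudaMemcpy3DParms down = {};
    down.srcArray = array;
    down.dstPtr = make_cudaPitchedPtr(dst.data(), width * sizeof(float), width, height);
    down.extent = copyExtent;
    down.kind = cudaMemcpyDeviceToHost;

    bool ok = cudaMemcpy3D(&up) == cudaSuccess && cudaMemcpy3D(&down) == cudaSuccess;
    cudaFreeArray(array);
    return ok && src == dst;
}
```

On a working CUDA setup, `roundtrip_3d_depth1(4, 4)` should return true.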

What is the correct way to use CUDA Graphs with 2D CUDA arrays? I am adding a memcpy node as follows:

  int height = 4, width = 4;
  auto desc = cudaCreateChannelDesc<float1>();
  cudaArray_t array;
  cudaMallocArray(&array, &desc, width, height, 0);

  cudaMemcpy3DParms copyParams = {0};  // zero-initialize every field
  copyParams.extent.height = height;
  copyParams.extent.width = width;
  copyParams.extent.depth = 1;
  copyParams.kind = cudaMemcpyHostToDevice;
  copyParams.dstArray = array;
  // h_data (hypothetical): host buffer holding width * height floats
  copyParams.srcPtr = make_cudaPitchedPtr(h_data, width * sizeof(float), width, height);

  cudaGraphNode_t phGraphNode;
  // the last argument is a pointer to the parameters
  cudaGraphAddMemcpyNode(&phGraphNode, hGraph, dependencies, numDependencies, &copyParams);

This will fail because array is not a 3D array. However, the only other relevant function for creating graph memcpy nodes is cudaGraphAddMemcpyNode1D, which does not seem to work with pitched 2D arrays.

How should I create CUDA Graph memcpy nodes that support 2D CUDA arrays?

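As you have already pointed out, cudaMalloc3DArray allocates a 2D array if only the depth extent is zero.

So allocate the array with cudaMalloc3DArray and extent.depth = 0 instead of with cudaMallocArray.

That is what you had in your original posting.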

Then you should be able to do your 3D memcpy with depth = 1, and it should work.
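Putting both pieces together for CUDA Graphs, here is a minimal sketch that allocates the 2D array with cudaMalloc3DArray (depth 0), adds an upload and a download memcpy node whose copy extent has depth 1, and launches the graph once. It assumes pinned host buffers from cudaMallocHost, since the question's "some host pointer" is unspecified; the helper names `make_copy` and `roundtrip_graph` are hypothetical. The `#if CUDART_VERSION` branch exists because cudaGraphInstantiate takes different arguments in CUDA 12 than in CUDA 10.2/11.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: parameters for a copy between a host buffer and a 2D array.
static cudaMemcpy3DParms make_copy(float* host, cudaArray_t array, size_t width,
                                   size_t height, cudaMemcpyKind kind) {
    cudaMemcpy3DParms p = {};  // zero all unused fields
    cudaPitchedPtr hostPtr = make_cudaPitchedPtr(host, width * sizeof(float), width, height);
    if (kind == cudaMemcpyHostToDevice) { p.srcPtr = hostPtr; p.dstArray = array; }
    else                                { p.srcArray = array; p.dstPtr = hostPtr; }
    p.extent = make_cudaExtent(width, height, 1);  // copy depth 1, never 0
    p.kind = kind;
    return p;
}

// Hypothetical helper: upload -> download graph around a 2D cudaArray.
bool roundtrip_graph(size_t width, size_t height) {
    const size_t n = width * height;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t array = nullptr;
    float *src = nullptr, *dst = nullptr;
    cudaGraph_t graph = nullptr;
    cudaGraphExec_t exec = nullptr;
    cudaGraphNode_t upNode, downNode;

    // 2D array via cudaMalloc3DArray with allocation depth 0.
    bool ok = cudaMalloc3DArray(&array, &desc, make_cudaExtent(width, height, 0), 0) == cudaSuccess &&
              cudaMallocHost(&src, n * sizeof(float)) == cudaSuccess &&
              cudaMallocHost(&dst, n * sizeof(float)) == cudaSuccess;
    if (ok) {
        for (size_t i = 0; i < n; ++i) { src[i] = static_cast<float>(i); dst[i] = -1.0f; }
        cudaMemcpy3DParms up = make_copy(src, array, width, height, cudaMemcpyHostToDevice);
        cudaMemcpy3DParms down = make_copy(dst, array, width, height, cudaMemcpyDeviceToHost);
        ok = cudaGraphCreate(&graph, 0) == cudaSuccess &&
             cudaGraphAddMemcpyNode(&upNode, graph, nullptr, 0, &up) == cudaSuccess &&
             // the download node depends on the upload node
             cudaGraphAddMemcpyNode(&downNode, graph, &upNode, 1, &down) == cudaSuccess &&
#if CUDART_VERSION >= 12000
             cudaGraphInstantiate(&exec, graph, 0) == cudaSuccess &&
#else
             cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0) == cudaSuccess &&
#endif
             cudaGraphLaunch(exec, 0) == cudaSuccess &&
             cudaStreamSynchronize(0) == cudaSuccess;
        for (size_t i = 0; ok && i < n; ++i) ok = (dst[i] == src[i]);
    }
    if (exec) cudaGraphExecDestroy(exec);
    if (graph) cudaGraphDestroy(graph);
    if (dst) cudaFreeHost(dst);
    if (src) cudaFreeHost(src);
    if (array) cudaFreeArray(array);
    return ok;
}
```

The edge from the upload node to the download node is what orders the two copies; without it the graph would be free to run them concurrently.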