cudaMemcpyToArray is deprecated

Suppose we have a cudaArray allocated with cudaMallocArray. It is inherently a 2D allocation, consisting of a width and height (in elements):

__host__ cudaError_t cudaMallocArray ( cudaArray_t* array, const cudaChannelFormatDesc* desc, size_t width, size_t height = 0, unsigned int flags = 0 )

Let’s suppose our width is w and our height is h. Now let’s suppose we want to transfer data to this array. In the general case, we may wish to transfer data to a specific region within this array, the size of which is called the extent. Let’s say that the origin of that region is (x,y), the width of that region is sw, and the height of that region is sh.

A proper cudaMemcpy2DToArray operation that copies to this destination region in the cudaArray dst from a 2D unpitched host memory allocation src would look like this:

cudaMemcpy2DToArray(dst, x, y, src, sw*sizeof(src[0]) , sw*sizeof(src[0]), sh, cudaMemcpyHostToDevice);

Regarding “2D unpitched host memory allocation”: that is effectively a linear allocation of memory. “2D” refers to the idea that it logically represents a 2D region, consisting of rows and columns, even though it is ordinarily a 1D allocation in which the “rows” are adjacent to each other. In addition, the assumption for the above case is that the source allocation effectively has a width of sw, i.e. the entire source allocation corresponds to a data set the size and shape of the sw-by-sh destination region described above.

For the general case of a pitched source (host) allocation, there are only two differences: the src pointer must point to the beginning of the source region, and the host pitch parameter must be the distance in bytes between the starts of consecutive source rows, rather than the width of the transfer.
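To make the pitched case concrete, here is a minimal host-only sketch in plain C++ (no CUDA calls). It assumes the source is a row-major float allocation that is fullWidth elements wide, and that the region to copy starts at (sx, sy) inside it. The names SourceRegion, sourceRegion, fullWidth, sx and sy are hypothetical and not part of the CUDA API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not a CUDA type): the two values that change
// when the host source is a sub-region of a wider allocation.
struct SourceRegion {
    const float* ptr;    // first element of the region to copy
    std::size_t  pitch;  // bytes between the starts of consecutive source rows
};

// base      : start of the full row-major host allocation
// fullWidth : width of the full allocation, in elements
// sx, sy    : origin of the source region inside that allocation, in elements
inline SourceRegion sourceRegion(const float* base, std::size_t fullWidth,
                                 std::size_t sx, std::size_t sy)
{
    assert(sx < fullWidth);  // the region must start inside a row
    SourceRegion r;
    r.ptr   = base + sy * fullWidth + sx;   // skip sy full rows, then sx elements
    r.pitch = fullWidth * sizeof(float);    // pitch covers the whole source row
    return r;
}
```

With r = sourceRegion(src, fullWidth, sx, sy), the call from earlier would become cudaMemcpy2DToArray(dst, x, y, r.ptr, r.pitch, sw*sizeof(float), sh, cudaMemcpyHostToDevice). The transfer width and height are unchanged. Only the pointer and the pitch differ.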

From the above description, it is evident that a pitch is involved when using this API as a replacement for cudaMemcpyToArray (which is deprecated).

Going back to the original question then, when refactoring code that currently looks like this:

cudaMemcpyToArray(dst, 0, 0, &vector[0], nSamples*sizeof(float), cudaMemcpyHostToDevice)

it will be necessary to know the width associated with the cudaArray dst. The transfer column width associated with the cudaMemcpy2DToArray operation must be consistent with the width of the cudaArray: the x offset plus the transfer width must not exceed the array width, and the transfer width must not exceed the source line pitch. For an unpitched source, the pitch equals the transfer width. (We are considering these widths in elements for this comparison, although the width and pitch passed to cudaMemcpy2DToArray are expressed in bytes.)
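As a rough illustration of that bookkeeping, the sketch below works out how a linear copy of nSamples elements starting at offset (0, 0) maps onto rows of an array that is arrayWidth elements wide. It is host-only C++ under stated assumptions (an unpitched source and zero offsets). Copy2DShape and shapeForLinearCopy are hypothetical names, not CUDA APIs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of the 2D copy (or copies) that would cover
// a linear transfer of nSamples elements.
struct Copy2DShape {
    std::size_t widthBytes;  // transfer width (and unpitched pitch) of the full rows
    std::size_t fullRows;    // number of complete rows covered by the data
    std::size_t tailBytes;   // bytes left over for a partial final row (0 if none)
};

// nSamples   : number of elements in the linear source buffer
// arrayWidth : width of the destination cudaArray, in elements
// elemSize   : size of one element in bytes, e.g. sizeof(float)
inline Copy2DShape shapeForLinearCopy(std::size_t nSamples,
                                      std::size_t arrayWidth,
                                      std::size_t elemSize)
{
    assert(arrayWidth > 0);  // a cudaArray always has a nonzero width
    Copy2DShape s;
    s.widthBytes = arrayWidth * elemSize;
    s.fullRows   = nSamples / arrayWidth;
    s.tailBytes  = (nSamples % arrayWidth) * elemSize;
    return s;
}
```

Under these assumptions, the full rows would go in one call with width and pitch both equal to widthBytes and height equal to fullRows. A nonzero tailBytes would need a second call with height 1 at row fullRows, reading from &vector[0] + fullRows*arrayWidth. When nSamples does not exceed the array width, fullRows is 0 and only the single-row call is needed.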

If the width of the cudaArray is not known by inspection of the code, it can be ascertained in situ at runtime using the cudaArrayGetInfo API:

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g373dacf191566b0bf5e5b807517b6bf9
