cudaMemcpyToArray is deprecated

Hi all,

While compiling my code after installing CUDA 10.1, I got the following compilation warning:

‘cudaError_t cudaMemcpyToArray(cudaArray_t, size_t, size_t, const void*, size_t, cudaMemcpyKind)’ is deprecated

I’ve looked at the documentation and didn’t find the replacement for this function.

Any clue?



It doesn’t appear to be well documented. The deprecation is indicated in the runtime API manual:

These functions were not listed this way in CUDA 10.0.

The 2D functions, evidently, are not deprecated. I believe you should be able to use those as a replacement.

For example:


Would it be safe to assume they are being deprecated because they are redundant with respect to the 2D functions, and hence that the 2D functions are likely here to stay?


It seems plausible to me; that was my own speculation when I was pondering this. However, it leaves the door open for questions I can’t explain at the moment, so it’s not entirely obvious to me what is going on here yet.

At the moment I can’t say anything definitive. I have already filed an internal request (bug) to have the documentation improved in this respect, and I won’t be able to say more until that gets some traction.

This being the week before GTC, everyone is rather busy/distracted at the moment.

Since I have filed a bug, I won’t “forget” about this. If/when I see movement on the bug that allows me to comment here publicly with something more concrete, I will attempt to do so. Nevertheless, if we view this as a documentation issue, the earliest it could be formally addressed would be the next CUDA release, and I currently have no indication that any update here will be included in that release.

Hi Robert, I wanted to ask whether there is any news since March about the deprecated cudaMemcpyToArray function. Is there an official replacement for it? Thanks in advance.

You should use non-deprecated functionality. That is the replacement. The 2D functions already mentioned are one such example.

Thanks. However, these have different signatures. Is there an example in the documentation of how to perform that substitution?

For example, what would be the new version of:

cudaMemcpyToArray(dst, 0, 0, &vector[0], nSamples*sizeof(float), cudaMemcpyHostToDevice);

Is there a 1D function for this, or do I really need a 2D function?

If you like, you can find an example in the XPL source code (it’s open). Here is an example of doing what I think you want:

#include <xpl.cuh>
#include <iostream>
#include <vector>

using namespace xpl;

int main(int argc, char* argv[])
{
    int nSamples = 1024;
    std::vector<float> vector(nSamples);

    vector[0] = 42.0f;
    // Allocate a texture with 'nSamples' elements
    TextureBuffer<float> texture(nSamples);
    // Map the host vector
    auto h_vector = HostBuffer<float>::map(&vector[0], Size(1, nSamples));
    // Perform the memory copy to the underlying texture / cudaArray:
    texture = h_vector;
    // Print a value:
    std::cout << "\n Texture now contains: " << texture(0) << "\n";
    return 0;
}

Repo url:


Suppose we have a cudaArray allocated with cudaMallocArray. It is inherently a 2D allocation, consisting of a width and height (in elements):

__host__ cudaError_t cudaMallocArray ( cudaArray_t* array, const cudaChannelFormatDesc* desc, size_t width, size_t height = 0, unsigned int  flags = 0 )

Let’s suppose our width is w and our height is h. Now let’s suppose we want to transfer data to this array. In the general case, we may wish to transfer data to a specific region within this array, the size of which is called the extent. Let’s say that the origin of that region is (x,y), the width of that region is sw, and the height of that region is sh (all measured in elements).

To formulate a proper cudaMemcpy2DToArray operation to this destination region in the cudaArray dst from a 2D unpitched host memory allocation src, it would look like this:

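cudaMemcpy2DToArray(dst, x*sizeof(src[0]), y, src, sw*sizeof(src[0]), sw*sizeof(src[0]), sh, cudaMemcpyHostToDevice);

(The runtime API documents the destination X offset, wOffset, in bytes, which is why x is scaled by the element size here.)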

Regarding “2D unpitched host memory allocation”: that is effectively a linear allocation of memory. “2D” refers to the idea that it logically represents a 2D region consisting of rows and columns, but it may ordinarily be a 1D allocation in which the “rows” are adjacent to each other. In addition, the assumption for the above case is that the source allocation has a width of sw, i.e. the entire source allocation corresponds to a data set of the same size and shape as the sw x sh destination region.

For the general case of a pitched source (host) allocation, the only differences would be the choice of the src pointer, which should point to the beginning of the source region, and the host pitch parameter, which becomes the length of a full host row in bytes.

From the above description, it is evident that a pitch is involved when using this API as a replacement for the deprecated cudaMemcpyToArray.

Going back to the original question, then: when refactoring code that currently looks like this:

cudaMemcpyToArray(dst, 0, 0, &vector[0], nSamples*sizeof(float), cudaMemcpyHostToDevice);

it will be necessary to know the width associated with the cudaArray dst. The transfer column width of the cudaMemcpy2DToArray operation (and, for an unpitched source like this one, the source line pitch, which equals it) must be consistent with (i.e. less than or equal to) the width of the cudaArray. For this comparison we consider both widths in elements, even though the widths passed to cudaMemcpy2DToArray are expressed in bytes.

If the width of the cudaArray is not known from inspection of the code, it can be ascertained in situ at runtime using the cudaArrayGetInfo API:

Thanks for your input.

If I understood correctly, if the code previously did a 1D allocation like this:

// An array of floats named vector, with size nSamples
cudaChannelFormatDesc floatChannelDesc = cudaCreateChannelDesc<float>();
cudaArray *dst;
cudaMallocArray(&dst, &floatChannelDesc, nSamples);
cudaMemcpyToArray(dst, 0, 0, &vector[0], nSamples*sizeof(float), cudaMemcpyHostToDevice);

Then I can just replace the last line with

cudaMemcpy2DToArray(dst, 0, 0,  &vector[0], sizeof(vector[0]) * nSamples, sizeof(vector[0]) * nSamples, 1, cudaMemcpyHostToDevice);




(I had previously posted that as a response in this thread, but I deleted it because it did not provide a general answer: without knowing your actual cudaArray allocation, I could not be certain we were talking about that case. In your case, the width parameter is nSamples, and your cudaMemcpy2DToArray pitch and transfer-width arguments are consistent with that width.)