Copying to a 2D CUDA array when the source array is smaller than the allocated one

Hi all,

I have a problem with copying an array of floats from host memory to a 2D array of float4 on the device.

When these arrays have the same size, everything works fine. For example: on the host I have a 10000x10000 array of floats and on the device a 2500x10000 array of float4 (2500*4 = 10000).

But when the host array does not exactly fill the device array, cudaMemcpy2DToArray() returns cudaErrorInvalidValue. For example, on the host I have a 13333x6666 array of floats and on the device a 3334x6666 array of float4 (3334*4 = 13336).

My code looks like this:

cudaMallocHost(ptr, _N * _M * sizeof(float));


cudaArray* dev_F;

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float4>();

CUDA_FUNCTION_CALL(cudaMallocArray(&dev_F, &channelDesc, _N >> 2 + ((_N & 3) ? 1 : 0), _M));


CUDA_FUNCTION_CALL(cudaMemcpy2DToArray(dev_F, 0, 0, ptr, sizeof(float) * _N, sizeof(float) * _N, _M, cudaMemcpyHostToDevice));

So my questions are: How can I copy such an array to a 2D CUDA array? Or can I do this only for arrays of the same size?


Perhaps you messed up the alignment restrictions?

Thanks for your answer! But I don’t actually understand what you mean by “messed up the alignment restrictions”. Could you please explain it?

It was just a quick (and perhaps silly) idea. CUDA aligns memory for quick access. So each line in 2D memory has some trailing bytes to fill the “gap” to the next alignment step; this results in the pitch.

But I’m not sure if this is the reason for your problem.
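To make the pitch idea above concrete, here is a minimal host-only sketch of how a padded row size and a byte offset into pitched memory could be computed. The helper names `padded_pitch` and `byte_offset` and the 512-byte alignment are assumptions for illustration only; the real pitch is chosen by the CUDA runtime (for linear 2D memory it is returned by `cudaMallocPitch`), and a `cudaArray` has an opaque layout that is not addressed this way.

```cpp
#include <cstddef>

// Round a row width in bytes up to the next multiple of `alignment`.
// Hypothetical helper: the actual pitch comes from the CUDA runtime,
// not from a formula the application picks.
constexpr std::size_t padded_pitch(std::size_t row_bytes, std::size_t alignment) {
    return (row_bytes + alignment - 1) / alignment * alignment;
}

// Byte offset of element (row, col) in pitched memory: rows are `pitch`
// bytes apart, so any trailing padding bytes are skipped automatically.
constexpr std::size_t byte_offset(std::size_t row, std::size_t col,
                                  std::size_t pitch, std::size_t elem_size) {
    return row * pitch + col * elem_size;
}

// A row that is already a multiple of the alignment needs no padding.
static_assert(padded_pitch(512, 512) == 512, "aligned row stays unchanged");
// A row that is one byte too long is padded up to the next multiple.
static_assert(padded_pitch(513, 512) == 1024, "unaligned row is padded");
```

In the copy call from the question, the source pitch argument describes the host buffer, which is tightly packed, so `sizeof(float) * _N` is the natural value there; padding of this kind only matters for memory whose rows are actually laid out with a pitch.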

OK, I’ll look into it. Maybe someone has another idea?

I found the solution. It was my own stupid mistake, sorry: I just forgot the brackets.

Instead of

CUDA_FUNCTION_CALL(cudaMallocArray(&dev_F, &channelDesc, _N >> 2 + ((_N & 3) ? 1 : 0), _M));

I should have written

CUDA_FUNCTION_CALL(cudaMallocArray(&dev_F, &channelDesc, (_N >> 2) + ((_N & 3) ? 1 : 0), _M));
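For anyone hitting the same error, here is a minimal host-only sketch of why the missing brackets matter. In C and C++, `+` binds tighter than `>>`, so the original expression is parsed as `_N >> (2 + ...)`. The function names are hypothetical and only illustrate the arithmetic; `width_ceil_div` is an alternative way to write the rounding, not something from the original code.

```cpp
#include <cstddef>

// How the compiler parses the original `_N >> 2 + ((_N & 3) ? 1 : 0)`:
// the addition is evaluated first, then the shift.
constexpr std::size_t width_as_parsed(std::size_t n) {
    return n >> (2 + ((n & 3) ? 1 : 0));
}

// The intended width in float4 elements: n / 4, rounded up.
constexpr std::size_t width_fixed(std::size_t n) {
    return (n >> 2) + ((n & 3) ? 1 : 0);
}

// Equivalent round-up division that avoids the precedence trap entirely.
constexpr std::size_t width_ceil_div(std::size_t n) {
    return (n + 3) / 4;
}

// True if a row of `n_floats` floats fits into `width_float4` float4 elements.
constexpr bool row_fits(std::size_t n_floats, std::size_t width_float4) {
    return n_floats <= width_float4 * 4;
}

// 10000 is a multiple of 4, so both forms agree and the bug stays hidden.
static_assert(width_as_parsed(10000) == width_fixed(10000), "same width");
// 13333 is not: the shift becomes `>> 3`, and the array is about half as wide as needed.
static_assert(!row_fits(13333, width_as_parsed(13333)), "row does not fit");
static_assert(row_fits(13333, width_fixed(13333)), "row fits after the fix");
```

This also suggests why the 10000x10000 case worked: with `_N & 3 == 0` the extra term is zero and the shift stays `>> 2`. With `_N = 13333` the array was allocated too narrow, so the requested copy width of `sizeof(float) * _N` bytes was larger than a row of the array, which is consistent with `cudaMemcpy2DToArray()` returning `cudaErrorInvalidValue`. With the corrected width, the source row (13333 floats) is simply narrower than the array row (13336 floats), and the copy succeeds.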