I have a problem copying an array of floats from host memory to a 2D array of float4 on the device.
When the two arrays have the same row size, everything works fine. For example: on the host I have a 10000x10000 array of floats and on the device a 2500x10000 array of float4 (2500*4 = 10000).
But when the host array doesn’t match the size of the device array, cudaMemcpy2DToArray() returns cudaErrorInvalidValue. For example, on the host I have a 13333x6666 array of floats and on the device a 3334x6666 array of float4 (3334*4 = 13336).
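To make the mismatch concrete, here is a small host-side sketch that compares the row sizes in bytes for both cases. The helper names are made up for illustration, and it assumes the copy is issued with width equal to the device row size and spitch equal to the tightly packed host row size, which the post does not actually state. The CUDA documentation for cudaMemcpy2DToArray() says that width must not exceed spitch, so under that assumption the second case would be rejected.

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");

// Bytes in one tightly packed host row of floats (hypothetical helper).
constexpr std::size_t hostRowBytes(std::size_t floatsPerRow) {
    return floatsPerRow * sizeof(float);
}

// Bytes in one device row of float4 elements; a float4 is four floats.
constexpr std::size_t deviceRowBytes(std::size_t float4PerRow) {
    return float4PerRow * 4 * sizeof(float);
}

// Mirrors the documented precondition "width must not exceed spitch".
constexpr bool widthFitsSourcePitch(std::size_t widthBytes,
                                    std::size_t srcPitchBytes) {
    return widthBytes <= srcPitchBytes;
}
```

With these helpers, the 10000-float case gives 40000 bytes on both sides, while the 13333-float case gives 53332 bytes on the host against 53344 bytes on the device, so each host row is 12 bytes too short.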
It was just a quick (and perhaps silly) idea. CUDA aligns memory for fast access, so each row of a 2D allocation can have some trailing padding bytes that fill the “gap” up to the next alignment boundary; the resulting row stride in bytes is the pitch.
But I’m not sure if this is the reason for your problem.
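If the row size really is the issue, one possible workaround is to pad each host row up to a multiple of four floats before copying, so that the source pitch matches the float4 row width. The sketch below is host-side C++ only; padRows and roundUpToFloat4 are hypothetical helper names, and whether this resolves the error in your case is an assumption I have not tested.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Round a float count up to the next multiple of 4 (one float4 = 4 floats).
constexpr std::size_t roundUpToFloat4(std::size_t n) {
    return (n + 3) / 4 * 4;
}

// Hypothetical helper: copy a tightly packed rows x srcWidth float image
// into a staging buffer whose rows are paddedWidth floats wide.
// The extra floats at the end of each row are zero-filled.
std::vector<float> padRows(const std::vector<float>& src,
                           std::size_t srcWidth,
                           std::size_t rows,
                           std::size_t paddedWidth) {
    assert(paddedWidth >= srcWidth);
    assert(src.size() == srcWidth * rows);
    std::vector<float> dst(paddedWidth * rows, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        std::copy_n(src.data() + r * srcWidth, srcWidth,
                    dst.data() + r * paddedWidth);
    }
    return dst;
}
```

For the 13333x6666 case, roundUpToFloat4(13333) is 13336, so the staging buffer has rows of 13336 floats (53344 bytes). That buffer could then be passed to cudaMemcpy2DToArray() with both spitch and width set to 53344 bytes, at the cost of one extra host-side copy.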