spitch parameter of cudaMemcpy2DToArray behaves differently from its definition in the documentation

The documentation of cudaMemcpy2DToArray says:
“spitch is the width in memory in bytes”

But the following code fails with an error:

// 3 x 3 elements of float2, stored as 18 consecutive floats
float m[3 * 3 * 2]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17};
cudaArray* cuArray;
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float2>();
cudaMallocArray(&cuArray, &channelDesc, 3, 3);
cudaError_t err = cudaMemcpy2DToArray(cuArray, 0, 0, m, 3 * sizeof(float2), 3 * sizeof(float2), 3 * sizeof(float2), cudaMemcpyHostToDevice);
// err = 1 => invalid argument

If I change the code to:

float m[3 * 3 * 2]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17};
cudaArray* cuArray;
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float2>();
cudaMallocArray(&cuArray, &channelDesc, 3, 3);
cudaError_t err = cudaMemcpy2DToArray(cuArray, 0, 0, m, 3 * sizeof(float2), 3 * sizeof(float2), 3, cudaMemcpyHostToDevice);
// err = 0 => no error

it works fine.

So it seems the unit of spitch is not bytes but sizeof(element of dst)?
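
One way to sanity-check the units is to model a pitched 2D copy on the host. The sketch below is not CUDA code and makes no claim about how the runtime is implemented. The helpers sourceBytesSpanned and packRows are hypothetical names invented for illustration. It assumes the units the CUDA documentation gives for the cudaMemcpy2D family: spitch and width are byte counts, and height is a count of rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only helpers (not CUDA APIs) that model a pitched 2D copy.

// Number of source bytes touched when reading heightRows rows of widthBytes
// bytes each, where consecutive rows start spitchBytes apart.
std::size_t sourceBytesSpanned(std::size_t spitchBytes, std::size_t widthBytes,
                               std::size_t heightRows) {
    if (heightRows == 0 || widthBytes == 0) return 0;
    return (heightRows - 1) * spitchBytes + widthBytes;
}

// Copies heightRows rows of widthBytes bytes into a tightly packed buffer,
// skipping any padding between the end of one row and the start of the next.
std::vector<unsigned char> packRows(const void* src, std::size_t spitchBytes,
                                    std::size_t widthBytes, std::size_t heightRows) {
    assert(src != nullptr);
    assert(widthBytes <= spitchBytes);  // a row cannot be wider than its pitch
    const unsigned char* in = static_cast<const unsigned char*>(src);
    std::vector<unsigned char> out(widthBytes * heightRows);
    if (out.empty()) return out;
    for (std::size_t row = 0; row < heightRows; ++row) {
        std::memcpy(out.data() + row * widthBytes, in + row * spitchBytes, widthBytes);
    }
    return out;
}
```

For the array m above (18 floats, 72 bytes with 4-byte floats), sourceBytesSpanned(24, 24, 3) gives 72, which matches sizeof(m). With a height of 3 * sizeof(float2) = 24, the same formula gives 576 bytes, well past the end of m. Under the assumed units, the two calls above differ in the last size argument, height, and not in spitch.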