Float3 Memory Allocation in CUDA

How is memory allocated by CUDA on the host and device for the special CUDA datatypes like float3?

For example, if I declare an array x as follows:

float3 x[2];

how are the components allocated in memory? Is it

x11,x12,x13,x21,x22,x23

or is it

x11,x21,x12,x22,x13,x23

where the first index is the index in the array and the second index is the index within the float3 datatype?

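It is laid out just as it would be in plain C. float3 is an ordinary struct with three float members (x, y, z), so the array is stored element by element: x11,x12,x13,x21,x22,x23. The layout is the same on the host and on the device; otherwise copying the array between them with cudaMemcpy would not work.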
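To check the layout yourself, here is a minimal host-only sketch. It assumes float3 behaves like a plain struct of three floats, and uses a hypothetical stand-in named Float3 so that it compiles without the CUDA toolkit. The helper names flatten and is_array_of_structures are also made up for illustration. In a real .cu file you would use float3 directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-only stand-in for CUDA's float3 (assumption: three
// consecutive float members x, y, z, as in a plain C struct).
struct Float3 {
    float x, y, z;
};

// If these hold, an array of Float3 is a gap-free run of floats.
static_assert(sizeof(Float3) == 3 * sizeof(float), "unexpected padding in Float3");
static_assert(offsetof(Float3, y) == sizeof(float), "y should directly follow x");
static_assert(offsetof(Float3, z) == 2 * sizeof(float), "z should directly follow y");

// Copy the raw bytes of a Float3 array into a flat float buffer,
// the same way a byte-wise copy between host and device would.
void flatten(const Float3* src, std::size_t count, float* dst) {
    std::memcpy(dst, src, count * sizeof(Float3));
}

// Returns true if x[2] is stored as x11,x12,x13,x21,x22,x23.
bool is_array_of_structures() {
    const Float3 x[2] = {{11.0f, 12.0f, 13.0f}, {21.0f, 22.0f, 23.0f}};
    float flat[6];
    flatten(x, 2, flat);

    const float expected[6] = {11.0f, 12.0f, 13.0f, 21.0f, 22.0f, 23.0f};
    for (std::size_t i = 0; i < 6; ++i) {
        if (flat[i] != expected[i]) {
            return false;
        }
    }
    return true;
}
```

Calling is_array_of_structures() from main and asserting on the result confirms the first ordering from the question. If you want the second ordering (all first components, then all second components, and so on), you have to build it yourself with three separate float arrays.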