How does CUDA lay out its built-in vector datatypes, such as float3, in host and device memory?
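For example, if I declare an array x of float3 elements, how are the components laid out in memory? Is it

x[0][0], x[0][1], x[0][2], x[1][0], x[1][1], x[1][2], ...

or is it

x[0][0], x[1][0], x[2][0], ..., x[0][1], x[1][1], x[2][1], ...

where the first index is the index in the array and the second index is the index within the float3 datatype?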
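One way to check the layout empirically is to compare component addresses. The sketch below is plain host-side C++ and assumes a stand-in struct `Float3` that mirrors CUDA's `float3` (three `float` members `x`, `y`, `z`). `Float3` and `float_offset` are hypothetical names introduced here; with the CUDA toolkit installed you could include `<vector_types.h>` and use `float3` directly. It reports where each component sits relative to the start of the array, measured in floats.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CUDA's float3: three consecutive float members.
// With the CUDA toolkit, include <vector_types.h> and use float3 instead.
struct Float3 {
    float x, y, z;
};

// Assumption checked at compile time: no padding inside or after the struct.
static_assert(sizeof(Float3) == 3 * sizeof(float),
              "expected three tightly packed floats");

// Offset, in floats, of component c (0 = x, 1 = y, 2 = z) of element i,
// measured from the start of the array a.
std::size_t float_offset(const Float3* a, std::size_t i, int c) {
    assert(c >= 0 && c < 3);
    const float* comp = (c == 0) ? &a[i].x : (c == 1) ? &a[i].y : &a[i].z;
    const char* base = reinterpret_cast<const char*>(a);
    const char* addr = reinterpret_cast<const char*>(comp);
    return static_cast<std::size_t>(addr - base) / sizeof(float);
}
```

If the three components of one element are adjacent, `float_offset(x, 1, 0)` comes out as 3: element 1 starts right after the components of element 0, which matches the first ordering in the question rather than the second.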