Memory required for float4

Does anyone know how much memory the type float4 needs in device memory? Is it simply 4 * 4 bytes?

Yes, 16 bytes: four 4-byte floats, and the type is also aligned to 16 bytes. You can see the declaration in vector_types.h:

struct __builtin_align__(16) float4
{
  float x, y, z, w;
  __cuda_assign_operators(float4)
};
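
If you want to confirm this on your own toolkit rather than take the header's word for it, a small check along these lines should work. This is only a sketch: the helper name float4_array_bytes and the element count 1024 are made up for illustration, and it assumes a CUDA toolkit whose cuda_runtime.h pulls in vector_types.h and a compiler with C++11 enabled.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>  // includes vector_types.h, which declares float4

// Compile-time checks: four 4-byte floats, aligned to 16 bytes.
static_assert(sizeof(float4) == 4 * sizeof(float), "float4 should hold exactly four floats");
static_assert(sizeof(float4) == 16, "float4 should occupy 16 bytes");
static_assert(alignof(float4) == 16, "float4 should be 16-byte aligned");

// Hypothetical helper (not part of CUDA): bytes needed for n float4 elements.
constexpr size_t float4_array_bytes(size_t n) { return n * sizeof(float4); }

int main() {
    const size_t n = 1024;  // illustrative element count
    float4 *d_data = nullptr;

    // Allocate device memory for n float4 values (n * 16 bytes).
    cudaError_t err = cudaMalloc((void **)&d_data, float4_array_bytes(n));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("sizeof(float4) = %zu, alignof(float4) = %zu, bytes for %zu elements = %zu\n",
           sizeof(float4), alignof(float4), n, float4_array_bytes(n));

    cudaFree(d_data);
    return 0;
}
```

Sizing the allocation with sizeof(float4) instead of a hard-coded 16 keeps the code correct if you later switch to another vector type such as float2 or double4.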