Does anyone know how much device memory the type float4 needs? Is it simply 4 * 4 bytes?
Yes, 16 bytes: four 4-byte floats, and the struct is also aligned to 16 bytes. You can see the declaration in vector_types.h:
struct __builtin_align__(16) float4
{
    float x, y, z, w;
    __cuda_assign_operators(float4)
};
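
If you want the compiler to confirm this rather than take it on faith, something like the sketch below should work. Note that Float4Like, array_bytes and is_16_byte_aligned are made-up names for illustration: Float4Like is only a stand-in that mirrors the declaration above using standard alignas, so the snippet builds without the CUDA toolkit. It also assumes a 4-byte float, which holds on the platforms CUDA targets.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for CUDA's float4 (not the real type):
// four floats, forced onto a 16-byte boundary like __builtin_align__(16).
struct alignas(16) Float4Like {
    float x, y, z, w;
};

// Compile-time checks: the build fails if any of these do not hold.
static_assert(sizeof(float) == 4, "this sketch assumes a 4-byte float");
static_assert(sizeof(Float4Like) == 16, "four 4-byte floats, no padding");
static_assert(alignof(Float4Like) == 16, "aligned to a 16-byte boundary");
static_assert(offsetof(Float4Like, w) == 12, "w starts right after x, y, z");

// Bytes needed for an array of n elements, i.e. the kind of size
// you would pass to an allocation call.
constexpr std::size_t array_bytes(std::size_t n) {
    return n * sizeof(Float4Like);
}

// Runtime check that an object actually sits on a 16-byte boundary.
inline bool is_16_byte_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

In real CUDA code you would presumably include cuda_runtime.h and apply the same sizeof and alignof checks to float4 itself. An array of 1024 elements then takes 1024 * 16 = 16384 bytes.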