Using the vector types can improve the efficiency of memory access, because fewer (but wider) accesses are needed to move the same amount of data. If your data readily lends itself to the use of a vector type, use the pre-defined vector type. For example, complex data can trivially be represented as float2 or double2, a double-double operand is likewise best represented as a double2, and a custom 128-bit integer type would be served well by storing it in a uint4 (or maybe a ulonglong2).
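As an illustration of the complex-data case, here is a minimal sketch that stores complex numbers as float2 (x = real part, y = imaginary part). The names cmul and cmul_kernel are hypothetical, chosen for this example. Because float2 is 8-byte aligned, each element can be moved with a single 64-bit load or store instead of two 32-bit ones.

```cuda
#include <cuda_runtime.h>  // float2, make_float2

// Complex multiply on float2: x holds the real part, y the imaginary part.
// __host__ __device__ so the same code can be used (and tested) on the CPU.
__host__ __device__ inline float2 cmul(float2 a, float2 b)
{
    return make_float2(a.x * b.x - a.y * b.y,
                       a.x * b.y + a.y * b.x);
}

// Element-wise product of two complex arrays, one element per thread.
// Each a[i], b[i], c[i] is a single 8-byte float2 access.
__global__ void cmul_kernel(const float2 *__restrict__ a,
                            const float2 *__restrict__ b,
                            float2 *__restrict__ c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = cmul(a[i], b[i]);
    }
}
```

The same pattern carries over to double2 for double-precision complex data.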
Yes, __align__() would be the way to do this. I do not know whether the compiler uses any additional magic beyond alignment for the built-in vector types. You might want to check the generated machine code (SASS) with cuobjdump --dump-sass to make sure you are getting the code you want.
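A minimal sketch of a user-defined vector class along these lines, assuming the goal is to match the layout of the built-in float4. The class name Vec4f, its members, and add_kernel are hypothetical. The static_asserts check at compile time that size and alignment agree with float4.

```cuda
#include <cuda_runtime.h>  // float4, __align__

// Hypothetical user-defined vector class. __align__(16) gives it the same
// 16-byte alignment as the built-in float4, which is what permits the
// compiler to move a whole Vec4f with one 128-bit load or store.
struct __align__(16) Vec4f {
    float x, y, z, w;

    // Defaulted constructor keeps the type trivial (usable in __shared__).
    Vec4f() = default;
    __host__ __device__ Vec4f(float x_, float y_, float z_, float w_)
        : x(x_), y(y_), z(z_), w(w_) {}

    __host__ __device__ Vec4f operator+(const Vec4f &o) const
    {
        return Vec4f(x + o.x, y + o.y, z + o.z, w + o.w);
    }

    __host__ __device__ float dot(const Vec4f &o) const
    {
        return x * o.x + y * o.y + z * o.z + w * o.w;
    }
};

// Compile-time checks that the layout matches the built-in float4.
static_assert(sizeof(Vec4f) == sizeof(float4), "Vec4f must be 16 bytes");
static_assert(alignof(Vec4f) == alignof(float4), "Vec4f must be 16-byte aligned");

// Small kernel whose SASS can be inspected for the load/store width.
__global__ void add_kernel(const Vec4f *a, const Vec4f *b, Vec4f *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
```

After compiling, running cuobjdump --dump-sass on the object file or executable should show whether a[i] and b[i] are fetched with 128-bit loads (often shown as LDG.E.128, although the exact mnemonics differ between GPU architectures) or split into narrower accesses.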
Is there anything lacking in CUDA’s pre-defined vector types that motivated you to write your own vector types?
I just like the object-oriented programming style a lot…
Also, using my own vector class lets me reuse a tremendous amount of code from my other C++ projects.