If a uint pointer is implicitly used as a uint4 pointer but is not aligned to a 16-byte boundary, how large is the performance penalty on the various NVIDIA architectures? Or is this simply not possible on certain architectures? Thanks.
You should not do that (casting a uint* to a uint4*). The compiler assumes that pointers of type uint4* are always 16-byte aligned (the type is declared with __builtin_align__(16) in vector_types.h), so it is free to emit a single 16-byte memory access. With a misaligned address that can mean a lot of headaches for you, trying to figure out why your results are wrong sometimes but not always.
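A minimal sketch of how one might guard against this before taking a vectorized code path. It is written as plain C++ so it can be tried on the host. The helper names is_aligned16 and can_use_uint4 are hypothetical, not part of the CUDA toolkit. The idea is only to check the address and the element count before a uint* is ever reinterpreted as a uint4*.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when p sits on a 16-byte boundary,
// which is the alignment the compiler assumes for uint4.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: true when a buffer of n uints can be processed
// as whole uint4 elements, i.e. it starts on a 16-byte boundary and
// its length is a multiple of four.
inline bool can_use_uint4(const unsigned int* p, std::size_t n) {
    assert(p != nullptr);
    return is_aligned16(p) && (n % 4 == 0);
}
```

If can_use_uint4 returns false, fall back to a scalar path (or to an unaligned structure) instead of casting. Memory returned by cudaMalloc is aligned well beyond 16 bytes, so the check typically fails only for pointers that have been offset by a number of elements that is not a multiple of four.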
The cast is safe, however, if you use a my_uint4 structure that you define yourself without an alignment specifier. The compiler will then assume only the 4-byte alignment of the structure's members and use four separate memory accesses to load or store it. That breaks coalescing and hurts performance, but at least the results are correct.
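To illustrate, here is a sketch of such a structure in plain C++ (the same definition compiles in a .cu file). The layout of my_uint4 is an assumption based on the name used above, and load_my_uint4 is a hypothetical helper added only to show a load from an address that is not 16-byte aligned.

```cpp
#include <cassert>
#include <cstring>

// User-defined four-component type with no alignment specifier.
// Its alignment is that of unsigned int (4 bytes), not 16.
struct my_uint4 {
    unsigned int x, y, z, w;
};

static_assert(alignof(my_uint4) == alignof(unsigned int),
              "my_uint4 must not carry a stricter alignment than uint");
static_assert(sizeof(my_uint4) == 4 * sizeof(unsigned int),
              "my_uint4 is expected to have no padding");

// Hypothetical helper: load four consecutive uints starting at any
// uint-aligned address, without assuming 16-byte alignment.
inline my_uint4 load_my_uint4(const unsigned int* p) {
    assert(p != nullptr);
    my_uint4 v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

Calling load_my_uint4(buf + 1) on a uint buffer reads elements 1 through 4, which is exactly the case where a uint4* cast would be invalid. The static_assert lines document the design choice: the structure trades the single 16-byte access for correctness at any uint-aligned address.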
Have a look at the alignedTypes SDK sample for details…