As far as I understand, the vector operations available in Cg were supported in hardware: for example, the fragment processor could add two float4 variables in the same time it took to add two float variables. In CUDA, such operations are not accepted by the compiler. Were vector operations removed from the new hardware architecture? Will they be available in upcoming CUDA releases? Or does the compiler take care of vectorizing the code?
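To make the question concrete, here is a minimal sketch of what I mean. The names `add4` and `addKernel` are my own, made up for illustration, and I am assuming the built-in `float4` type and `make_float4` from the CUDA runtime headers. The commented-out line is what I would write in Cg; in CUDA I seem to have to spell out the components by hand (or define `operator+` myself).

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): component-wise add written by hand,
// because the built-in float4 type does not come with an operator+.
__host__ __device__ inline float4 add4(float4 a, float4 b)
{
    return make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}

// Hypothetical kernel: each thread adds one pair of float4 elements.
__global__ void addKernel(const float4* a, const float4* b, float4* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // out[i] = a[i] + b[i];   // Cg-style; rejected unless I define operator+ myself
        out[i] = add4(a[i], b[i]);
    }
}
```

Launched as e.g. `addKernel<<<(n + 255) / 256, 256>>>(dA, dB, dOut, n)` with device pointers `dA`, `dB`, `dOut` (again hypothetical names), this does what I want, but I do not know whether the four adds are issued as one vector instruction or as four scalar ones.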
Thanks a lot in advance!