Where are Cg's vector operations in CUDA? Are vector operations completely missing?


As far as I understand, the vector operations available in Cg were hardware supported, e.g. the fragment processor could add two float4 variables in the same time it took to add two float variables. In CUDA, such operations are not accepted. Were vector operations removed from the new hardware architecture? Will they be available in upcoming CUDA releases? Or does the compiler take care of vectorizing the code?

Thanks a lot in advance!


Yes, the G80 hardware has changed: it is scalar now. However, the number of ALUs per processor has doubled, so you now have 8 instead of 4. Together with the automatic scheduling of threads that can execute together (a warp), you can view this as a vector processor that scales automatically from 1 to 8. Things to watch out for:

  1. There is no longer any need to pack things into a float4, which is very convenient.
  2. If you have many divergent threads, the warp might not be filled. In this case the divergent parts are serialized and you don't get the full performance.
  3. The ALUs run at twice the processor speed and execute instructions in time-slice mode, so the warp size is 32. The downside is that you basically need 32 values to be handled in parallel to keep the processor busy in every clock cycle, whereas on previous hardware you needed 4.


Thank you so much for the clarification!


That said, if you are doing vector math in your CUDA code, you can write operator overloads to emulate the built-in operators in Cg (except swizzle).
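
For anyone who wants a starting point, here is a rough sketch of what such overloads might look like for float4. This is only one way to do it: the VEC_FN macro and the dot helper are names I made up for this example, and the stand-in float4 / make_float4 in the #else branch are there only so the snippet also builds with a plain C++ compiler. Under nvcc the built-in float4 and make_float4 are used instead.

```cpp
// Sketch: Cg-style component-wise operators for float4.
#ifdef __CUDACC__
  // Compiling with nvcc: use CUDA's built-in float4 and make_float4.
  #include <cuda_runtime.h>
  #define VEC_FN __host__ __device__ inline   // hypothetical helper macro
#else
  // Plain C++ build: stand-in type (assumption, for host-only testing).
  #define VEC_FN inline
  struct float4 { float x, y, z, w; };
  inline float4 make_float4(float x, float y, float z, float w) {
      float4 v = {x, y, z, w};
      return v;
  }
#endif

// Component-wise addition, like "a + b" on two float4 values in Cg.
VEC_FN float4 operator+(float4 a, float4 b) {
    return make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}

// Component-wise subtraction.
VEC_FN float4 operator-(float4 a, float4 b) {
    return make_float4(a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w);
}

// Component-wise multiplication (Cg's "*" on vectors is per component).
VEC_FN float4 operator*(float4 a, float4 b) {
    return make_float4(a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w);
}

// Scalar * vector and vector * scalar.
VEC_FN float4 operator*(float s, float4 a) {
    return make_float4(s * a.x, s * a.y, s * a.z, s * a.w);
}
VEC_FN float4 operator*(float4 a, float s) {
    return s * a;
}

// Hypothetical helper mirroring Cg's dot() for float4.
VEC_FN float dot(float4 a, float4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}
```

With these in scope you can write things like `float4 c = a + 2.0f * b;` inside a kernel or in host code. Keep in mind that on scalar hardware each of these operators is just four scalar operations written more compactly, so this is a convenience and not a speedup. Swizzles such as `a.xyz` cannot be expressed as operators and would need separate helper functions.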