I see that CUDA declares many vector data types, such as "float4", "int4", and so on. But I don't see (pardon my oversight, if any) any vector operations that can be applied to them.
For example, I tried the following, but could not get it to compile for the device:
extern __shared__ float4 prices[];
prices[i] = prices[i]*2;
The compiler said "float4*int" is not possible. So I tried this:
prices[i] = prices[i]*(2,2,2,2)
but this did not work either.
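For reference, here is a minimal sketch of what a component-wise version might look like. As far as I can tell, the CUDA headers declare float4 and make_float4 but no arithmetic operators for them, so the operator has to be written by hand. Note also that in C/C++ the expression (2,2,2,2) is just the comma operator and evaluates to the plain int 2, which is why the second attempt fails in the same way as the first. The operator* overload, the HD macro, and the doublePrices kernel below are hypothetical names made up for illustration. The host-only float4 stand-in is there only so the operator can be tried without nvcc.

```cpp
#ifdef __CUDACC__
#include <cuda_runtime.h>            // provides float4 and make_float4
#define HD __host__ __device__
#else
// Host-only stand-in (an assumption for illustration, not the CUDA header)
// so the operator below can also be compiled with a plain C++ compiler.
struct float4 { float x, y, z, w; };
inline float4 make_float4(float x, float y, float z, float w)
{
    float4 v = { x, y, z, w };
    return v;
}
#define HD
#endif

// Hypothetical helper: multiply each component of a float4 by a scalar.
HD inline float4 operator*(float4 a, float s)
{
    return make_float4(a.x * s, a.y * s, a.z * s, a.w * s);
}

#ifdef __CUDACC__
// Hypothetical kernel: doubles every float4 in a global-memory array.
__global__ void doublePrices(float4* prices, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        prices[i] = prices[i] * 2.0f;   // uses the overload defined above
    }
}
#endif
```

With an overload like this in scope, prices[i] = prices[i] * 2.0f should compile. The arithmetic is still written as four scalar multiplies, so any gain would presumably come from the wider memory access rather than from the multiply itself.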
Does PTX support vector instructions? If so, how can I take advantage of them?
The only advantage I see so far is that I can access more global memory per warp, which results in amazing speedups (I got 20x, as pointed out by Mark in an earlier post).
Any inputs? Thanks.