I’m sorry if this has been asked before, but I couldn’t find any references…
From what I know, the shader units that execute CUDA are vector SIMD processors that support swizzle operators and vector instructions at the hardware level…
For example, in a shader it was perfectly legal to write something like “position.xyz /= position.w” or “uv.xy = uv.yx”;
scalar operations were in fact avoided because the GPU could use float4s natively, and this greatly boosted performance.
Now, in CUDA, everything looks like a scalar instruction; even float4 is made of 4 single floats… From what I know, this is the worst thing one could do with a SIMD core, so here’s finally the question :D
Is CUDA code eventually compiled to use vector instructions like in shaders, or is what we see what the GPU actually executes?
And if it’s the latter, why doesn’t this kill performance?
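To make the question concrete, here is a minimal sketch of what I mean. The names perspective_divide, swap_uv and divide_all are hypothetical ones I made up for illustration, and the small fallback structs are only there so the snippet also builds with a plain C++ compiler; under nvcc the real float4/float2 types are used. As far as I can tell, the shader one-liners have to be spelled out component by component in CUDA C:

```cpp
// Fallback so the sketch also compiles as ordinary host C++.
// With nvcc, float2/float4 and the qualifiers come from the CUDA headers.
#ifndef __CUDACC__
#define __host__
#define __device__
struct float2 { float x, y; };
struct float4 { float x, y, z, w; };
#endif

// Shader: position.xyz /= position.w;
// In CUDA C there is no swizzle syntax, so this is three scalar divisions.
__host__ __device__ inline void perspective_divide(float4 &p)
{
    p.x /= p.w;
    p.y /= p.w;
    p.z /= p.w;
}

// Shader: uv.xy = uv.yx;
// Here it becomes an explicit swap through a temporary.
__host__ __device__ inline void swap_uv(float2 &uv)
{
    float tmp = uv.x;
    uv.x = uv.y;
    uv.y = tmp;
}

#ifdef __CUDACC__
// Hypothetical kernel: one thread handles one float4, using scalar code only.
__global__ void divide_all(float4 *pts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        perspective_divide(pts[i]);
}
#endif
```

So each thread only ever seems to do scalar math on the four fields, which is exactly the part that confuses me.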
Sorry if the question is really n00b :D