Hardware accelerated vector operations?

I use lots of dot() calls in my CUDA kernels. However, I've seen that they are simply expanded inline, so that dot() is just translated as x*x + y*y + z*z.

I would like to know whether there is any function to perform native dot products on the GPU (without having to perform 3 multiplies and 2 additions), since the GPU should be capable of it (thinking of shaders).

Greetings.

See the FAQ, Q32:
http://forums.nvidia.com/index.php?showtopic=84440

In short, no: current NVIDIA GPUs are scalar within each thread, although you can think of them as vector (SIMD) across a warp.