I use a lot of `dot()` calls in my CUDA kernels. However, I've seen that they are simply expanded inline, so that `dot()` is just translated into `x*x + y*y + z*z`.
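For example, this is roughly what the `float3` overload looks like in the SDK's helper math header (my paraphrase; the exact file name varies by CUDA version):

```cpp
// Sketch of the typical dot() overload for float3 (as in the CUDA samples'
// helper_math.h, formerly cutil_math.h): the compiler just inlines this
// into three multiplies and two adds (possibly fused into FMAs).
__host__ __device__ inline float dot(float3 a, float3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```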
I would like to know whether there is any function or intrinsic that performs a native dot product on the GPU (without having to issue 3 multiplications and 2 additions), since the GPU seems capable of it (I'm thinking of shaders).
Greetings.