CUDA lacking HLSL intrinsics?

I’ve noticed that CUDA is missing many of the intrinsic functions that exist in HLSL, in particular the ones for working with “float3”, for example:

normalize
distance
length
dot

(full list here)
http://msdn.microsoft.com/en-us/library/bb509611(VS.85).aspx

Obviously I can write these functions myself, but I was curious whether the HLSL intrinsics are more efficient, or whether they map directly to GPU instructions that are simply not exposed to CUDA.

The “cutil_math.h” include file in the SDK has implementations of many of the HLSL intrinsics.

The HLSL intrinsics don’t map directly to hardware instructions, so equivalent functions written in CUDA are no less efficient.
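
For anyone who would rather not pull in the SDK header, here is a minimal sketch of the four functions listed in the question, written by hand. It is not the code from “cutil_math.h”; the function names simply mirror the HLSL ones, and the F3_HD macro and the host-only fallback definition of float3 are assumptions added so the same file also builds with a plain C++ compiler.

```cpp
#include <math.h>

#ifdef __CUDACC__
#include <cuda_runtime.h>            // provides float3 and make_float3
#define F3_HD __host__ __device__    // callable from both host and device code
#else
// Host-only fallback (assumption for illustration): lets the sketch build without nvcc.
struct float3 { float x, y, z; };
inline float3 make_float3(float x, float y, float z) { return float3{x, y, z}; }
#define F3_HD
#endif

// Dot product: three multiplies and two adds on scalar components.
F3_HD inline float dot(float3 a, float3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Euclidean length of a vector.
F3_HD inline float length(float3 v)
{
    return sqrtf(dot(v, v));
}

// Scale a vector to unit length (no guard against zero-length input).
F3_HD inline float3 normalize(float3 v)
{
    float inv = 1.0f / sqrtf(dot(v, v));
    return make_float3(v.x * inv, v.y * inv, v.z * inv);
}

// Distance between two points: length of their difference.
F3_HD inline float distance(float3 a, float3 b)
{
    return length(make_float3(a.x - b.x, a.y - b.y, a.z - b.z));
}
```

Inside a kernel these are called like any other device function, e.g. float3 n = normalize(v);. Each one is just a handful of scalar operations, which is consistent with the point above that nothing is lost by writing them in CUDA.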