CUDA lacking HLSL intrinsics?

I’ve noticed that CUDA is missing a lot of the intrinsic functions that exist in HLSL, in particular many of the ones for working with “float3”, for example…


(full list here)

Obviously I can write these functions myself, but I was curious whether the HLSL intrinsics are more efficient, or whether they actually map directly to GPU instructions that are just not exposed in CUDA.

The “cutil_math.h” include file in the SDK has implementations of many of the HLSL intrinsics.

The HLSL intrinsics don’t map directly to hardware instructions, so the CUDA versions are no less efficient.
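
To illustrate the point, here is a minimal sketch of how a few of the float3 helpers could be written by hand. The names dot3, cross3, length3 and normalize3 and the HD macro are hypothetical, chosen for this example. They are not the names used in cutil_math.h, and this is not a copy of that header. The host-only fallback definitions of float3 and make_float3 are also an assumption, added so the same file builds without nvcc.

```cpp
#include <math.h>

#ifdef __CUDACC__
#include <cuda_runtime.h>   // provides float3 and make_float3
#define HD __host__ __device__
#else
// Host-only fallback (assumption for this sketch) so it also builds
// with a plain C++ compiler.
#define HD
struct float3 { float x, y, z; };
inline float3 make_float3(float x, float y, float z) { return float3{x, y, z}; }
#endif

// Dot product: three multiplies and two adds, which the compiler is
// free to fuse into multiply-add instructions.
HD inline float dot3(float3 a, float3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Cross product, written out component by component.
HD inline float3 cross3(float3 a, float3 b)
{
    return make_float3(a.y * b.z - a.z * b.y,
                       a.z * b.x - a.x * b.z,
                       a.x * b.y - a.y * b.x);
}

// Euclidean length.
HD inline float length3(float3 a)
{
    return sqrtf(dot3(a, a));
}

// Scale to unit length. No guard against zero-length input.
HD inline float3 normalize3(float3 a)
{
    float inv = 1.0f / length3(a);
    return make_float3(a.x * inv, a.y * inv, a.z * inv);
}

#ifdef __CUDACC__
// Example kernel (hypothetical): normalize n vectors in place.
__global__ void normalize_all(float3* v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] = normalize3(v[i]);
}
#endif
```

Since each helper is just a handful of scalar operations, the compiler can inline it at the call site, which is why a hand-written version should cost nothing extra compared with a built-in one.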