I’ve noticed that CUDA is missing a lot of the intrinsic functions that exist in HLSL, in particular the ones for working with “float3”, for example:
normalize
distance
length
dot
(full list here: “Intrinsic Functions (DirectX HLSL)” on Microsoft Docs)
Obviously I can write these functions myself, but I’m curious: are the HLSL intrinsics more efficient, or do they map directly to instructions in the GPU’s instruction set that are just not exposed through CUDA?
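For concreteness, this is roughly what I mean by writing them myself. It is only a minimal sketch: the function names are borrowed from HLSL and are hypothetical helpers, not something CUDA provides, and I'm assuming the built-in `float3` type and the `rsqrtf` math function are the right building blocks.

```cuda
#include <cuda_runtime.h>  // float3, make_float3
#include <math.h>          // sqrtf

// Hypothetical hand-written stand-ins for the HLSL intrinsics.
// None of these names are defined by CUDA itself.
__host__ __device__ inline float dot(float3 a, float3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

__host__ __device__ inline float length(float3 v)
{
    return sqrtf(dot(v, v));
}

__host__ __device__ inline float distance(float3 a, float3 b)
{
    return length(make_float3(a.x - b.x, a.y - b.y, a.z - b.z));
}

__host__ __device__ inline float3 normalize(float3 v)
{
    // Reciprocal square root of the squared length.
    // Yields NaN components for the zero vector (0 * inf).
    float inv = rsqrtf(dot(v, v));
    return make_float3(v.x * inv, v.y * inv, v.z * inv);
}
```

Everything here boils down to a handful of multiply-adds plus one square root or reciprocal square root, which is why I wonder whether a dedicated intrinsic could do any better.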