Has this instruction been completely removed on Turing hardware?
In the CUDA documentation table “Throughput of Native Arithmetic Instructions”, it has been listed as “multiple instructions” since compute capability 5.0 (I’m assuming the 64 under 7.x is a mistake), but on a 1080 Ti I can clearly see it being executed as a single instruction in Nsight.
However, when targeting sm_75 on a 2080 Ti, the compiler expands the intrinsic into multiple instructions.
Can someone from NVIDIA confirm that it’s gone?