The CUDA Programming Guide states that “Single-precision floating-point square root is implemented as a reciprocal square root followed by a reciprocal”.
I actually need the reciprocal square root itself. Is there a function that returns this result directly, without performing another reciprocal?
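
To make the question concrete, here is a minimal sketch of what I currently do. The function name `inv_sqrt_via_division` is made up for illustration, and the `__CUDACC__` guard is only there so that the snippet also builds as plain C++ on the host. If I read the guide correctly, on the device this amounts to a reciprocal square root, a reciprocal, and then one more reciprocal from the division, which is the redundancy I would like to avoid.

```cpp
#include <math.h>

// When not compiling with nvcc, make the CUDA qualifiers expand to nothing
// so the same source builds as ordinary C++ (assumption for illustration).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: computes 1/sqrt(x) the indirect way, by taking the
// single-precision square root first and then dividing 1 by it.
__host__ __device__ inline float inv_sqrt_via_division(float x)
{
    return 1.0f / sqrtf(x);
}
```

For example, `inv_sqrt_via_division(4.0f)` evaluates to `0.5f`; what I am looking for is a way to get the same value without the intermediate square root.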