Does anybody have any information about the internal implementation of various Shader Model 5.0 assembly instructions, such as log/exp, on current GeForce graphics hardware? Are there any means to estimate the relative computational complexity of such instructions?
The single-precision hardware implementations of reciprocal, reciprocal square root, exp2, log2, sin, cos in current NVIDIA GPUs are based on quadratic interpolation. See:
S.F. Oberman and M.Y. Siu, “A High-Performance Area-Efficient Multifunction Interpolator,” Proc. 17th IEEE Symp. Computer Arithmetic (ARITH-17), 2005, pp. 272-279.
Both the actual paper and the slide deck from the conference are available via Google.
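To give a rough feel for what "quadratic interpolation" means here, below is a minimal sketch of a table-driven piecewise-quadratic approximation of 1/x on [1, 2). It is not the design from the paper: the segment count, the use of Taylor coefficients at segment midpoints, and all names (SEGMENTS, build_table, approx_reciprocal) are my own assumptions for illustration, and the real hardware works in fixed point with carefully optimized coefficients.

```python
# Illustrative sketch only, NOT the hardware algorithm from the paper.
# Assumptions: 64 equal segments on [1, 2), Taylor coefficients taken at
# each segment midpoint, double-precision arithmetic.

SEGMENTS = 64  # hypothetical table size


def build_table(segments=SEGMENTS):
    """Precompute (midpoint, c0, c1, c2) for each segment of [1, 2)."""
    table = []
    width = 1.0 / segments
    for i in range(segments):
        m = 1.0 + (i + 0.5) * width   # segment midpoint
        c0 = 1.0 / m                  # f(m)
        c1 = -1.0 / m ** 2            # f'(m)
        c2 = 1.0 / m ** 3             # f''(m) / 2
        table.append((m, c0, c1, c2))
    return table


TABLE = build_table()


def approx_reciprocal(x):
    """Approximate 1/x for 1 <= x < 2 with one table lookup and a quadratic."""
    if not (1.0 <= x < 2.0):
        raise ValueError("x must lie in [1, 2)")
    i = int((x - 1.0) * SEGMENTS)     # leading fraction bits select the segment
    m, c0, c1, c2 = TABLE[i]
    d = x - m                         # offset from the segment midpoint
    return c0 + d * (c1 + d * c2)     # Horner evaluation of the quadratic


def max_relative_error(samples=10000):
    """Largest |x * approx(1/x) - 1| over evenly spaced samples in [1, 2)."""
    worst = 0.0
    for k in range(samples):
        x = 1.0 + k / samples
        worst = max(worst, abs(approx_reciprocal(x) * x - 1.0))
    return worst
```

Calling max_relative_error() shows how close this simple variant gets; with these assumed parameters the relative error stays below 1e-6. Argument reduction (bringing an arbitrary input into [1, 2) via the exponent) is left out of the sketch.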
If by “relative computational complexity” you refer to the practice of establishing “equivalent FLOPs”, my recommendation would be not to use such metrics, as they can be quite misleading. These functions can be implemented in a large variety of ways, each requiring a different number of floating-point operations, or even no floating-point operations at all. In some application areas, people tend to use canonical FLOPs equivalents for certain math functions that may have been accurate for some long-extinct implementation but have no relevance today.
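As a toy illustration of why a single "FLOPs equivalent" per function is slippery, here are two ways to get 1/x for x in [1, 2) that need very different numbers of floating-point operations. Both are sketches under my own assumptions (the starting guess, iteration count, table size, and the names recip_newton and recip_table are hypothetical), and neither is meant to describe what any GPU actually does.

```python
import struct

# Sketch A: Newton-Raphson iteration, counting floating-point operations.
def recip_newton(x, iterations=4):
    """Return (approximation of 1/x, flop count) for 1 <= x < 2."""
    y = 1.5 - 0.5 * x                 # linear starting guess: 1 mul + 1 sub
    flops = 2
    for _ in range(iterations):
        y = y * (2.0 - x * y)         # mul, sub, mul
        flops += 3
    return y, flops


# Sketch B: pure table lookup indexed by mantissa bits, no arithmetic at
# lookup time (the table itself is precomputed once).
TABLE_BITS = 12  # hypothetical table size: 4096 entries
RECIP_TABLE = [1.0 / (1.0 + (i + 0.5) / (1 << TABLE_BITS))
               for i in range(1 << TABLE_BITS)]


def recip_table(x):
    """Return (approximation of 1/x, flop count) for 1 <= x < 2."""
    if not (1.0 <= x < 2.0):
        raise ValueError("x must lie in [1, 2)")
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]   # raw IEEE-754 bits
    index = (bits >> (52 - TABLE_BITS)) & ((1 << TABLE_BITS) - 1)
    return RECIP_TABLE[index], 0      # integer bit extraction only
```

With these choices the Newton version spends 14 floating-point operations and reaches close to full double precision, while the table version spends none and is accurate to only a few digits. Which count is "the" cost of a reciprocal depends entirely on the implementation and the accuracy required.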
Thank you very much for posting the reference to the paper and for the detailed answer about “computational complexity”.