Calling all Juffas! What's up with erfcf() nowadays?

The latest version of an erfcf() implementation for CUDA that I posted in these forums about a year ago is here:

I have only the vaguest recollections of “fastererfc”. In terms of incorporating it into other code, the most straightforward approach would be for me to re-post it with a 2-clause BSD-license attached. That way you can cut & paste it into any code base without issues. The practical issue that prevents me from doing that immediately is that I have literally thousands of files sitting around on my PC without version control.

If you could briefly remind me of the specifications of “fastererfc”, I should be able to find and review the relevant piece of code or, worst case, reconstruct it in short order:

(1) What was the input domain (I seem to recall [0,4] was common in your field)?
(2) What kind of error bound was specified (absolute or relative)?
(3) What was the numeric value of the error bound?

I retired from NVIDIA in 2014 and cannot give you an overview of changes in the CUDA math library since then. I am aware that for some of the math functions NVIDIA has taken advantage of code I posted in these forums (whether verbatim or in modified form I do not know) because the copyright notice from the accompanying 2-clause BSD license appears in NVIDIA’s documentation. I am also aware of improvements to single-precision division and expf() as part of ongoing maintenance. I discover such differences coincidentally when looking at SASS for my own work, but I am not systematically hunting for such changes or tracking them.

CUDA’s single-precision math functions benefit in a minor way from changes to the GPU architectures over the years. For example, recent GPUs allow FP32 operations to incorporate a full FP32 literal constant, whereas older GPUs supported only truncated FP32 constants. Also, the introduction of instructions like LOP3 and IADD3 provides minor performance improvements.
