Calling all Juffas! What's up with erfcf() nowadays?

njuffa · August 13, 2023, 7:14am

The latest version of an erfcf() implementation for CUDA that I posted in these forums about a year ago is here:

I have only the vaguest recollections of “fastererfc”. In terms of incorporating it into other code, the most straightforward approach would be for me to re-post it with a 2-clause BSD license attached. That way you can cut & paste it into any code base without licensing issues. The practical obstacle that prevents me from doing that immediately is that I have literally thousands of files sitting around on my PC without version control.

If you could briefly remind me of the specifications of “fastererfc”, I should be able to find and review the relevant piece of code or, in the worst case, reconstruct it in short order:

(1) What was the input domain (I seem to recall [0,4] was common in your field)?
(2) What kind of error bound was specified (absolute or relative)?
(3) What was the numeric value of the error bound?

I retired from NVIDIA in 2014 and cannot give you an overview of changes in the CUDA math library since then. I am aware that for some of the math functions NVIDIA has taken advantage of code I posted in these forums (whether verbatim or in modified form I do not know) because the copyright notice from the accompanying 2-clause BSD license appears in NVIDIA’s documentation. I am also aware of improvements to single-precision division and expf() as part of ongoing maintenance. I discover such differences incidentally when looking at SASS for my own work, but I am not systematically hunting for such changes or tracking them.

CUDA’s single-precision math functions benefit in a minor way from changes to the GPU architectures over the years. For example, recent GPUs allow FP32 operations to incorporate a full FP32 literal constant, whereas older GPUs supported only truncated FP32 constants. Also, the introduction of instructions like LOP3 and IADD3 provides minor performance improvements.
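As a rough illustration of what LOP3 buys: it evaluates an arbitrary three-input bitwise logic function in one instruction, with the function selected by an 8-bit lookup table. The following host-side C sketch emulates that behavior, assuming the convention documented for the PTX `lop3.b32` instruction, where the table value is obtained by applying the desired expression to the constants 0xF0, 0xCC, and 0xAA. The function name `lop3_emulated` is hypothetical; this is an emulation for explanation only, not how the hardware or the math library implements anything.

```c
#include <stdint.h>

/* Emulate a three-input logic operation selected by an 8-bit lookup table.
   Bit i of lut gives the output for the input combination where
   (i & 4) is the bit from a, (i & 2) the bit from b, (i & 1) the bit from c.
   This matches deriving lut by evaluating the expression on
   a = 0xF0, b = 0xCC, c = 0xAA. */
uint32_t lop3_emulated(uint32_t a, uint32_t b, uint32_t c, uint8_t lut)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i++) {
        if ((lut >> i) & 1) {
            /* Select the bit positions whose inputs match combination i. */
            uint32_t ta = (i & 4) ? a : ~a;
            uint32_t tb = (i & 2) ? b : ~b;
            uint32_t tc = (i & 1) ? c : ~c;
            r |= ta & tb & tc;
        }
    }
    return r;
}
```

For example, `(a & b) | c` corresponds to the table value `(0xF0 & 0xCC) | 0xAA`, which is 0xEA, and a three-way XOR corresponds to `0xF0 ^ 0xCC ^ 0xAA`, which is 0x96. Without such an instruction, each of these expressions takes two separate two-input logic operations.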
