Larger than expected / documented error in erfcinvf()

njuffa · February 1, 2020, 7:09pm

For both CUDA 8.0 and CUDA 10.2, section E.1. of the CUDA Programming Guide states the maximum error of erfcinvf() (vs. the mathematical result rounded to single precision) as 2 ulp.

However, in following up on some observations, I found that the maximum error of this function in CUDA 8.0 seems to be 4 ulp, not 2 ulp. The minimal standalone reproducer below demonstrates the worst case I found; it is not the only case of larger than expected error.

I don’t have CUDA 10.2 installed on my machine and would be much obliged if someone could run the reproducer with CUDA 10.2 to either confirm that the issue still exists currently, or that it has been fixed in the meantime.

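#include <stdio.h>
#include <stdlib.h>   // EXIT_SUCCESS
#include <stdint.h>
#include <string.h>

// Compare erfcinvf() against the double-precision erfcinv() result
// rounded to single precision, for a single argument.
__global__ void kernel (float a)
{
    float res = erfcinvf (a);          // single-precision result under test
    double ref = erfcinv ((double)a);  // double-precision reference
    float reff = (float)ref;           // reference rounded to single precision
    printf ("arg= %23.16e %15.6a (%08x) \n"
            "res= %23.16e %15.6a (%08x) <<<<\n"
            "ref= %23.16e %22.14a       \n"
            "reff=%23.16e %15.6a (%08x) <<<<\n",
            a, a, __float_as_int (a),
            res, res, __float_as_int (res),
            ref, ref,
            reff, reff, __float_as_int (reff));
}

int main (void)
{
    int iarg = 0x3757c618;             // bit pattern of the worst-case argument
    float arg;
    memcpy (&arg, &iarg, sizeof arg);  // reinterpret the bits as a float
    kernel<<<1,1>>>(arg);
    cudaDeviceSynchronize();           // wait for the kernel and its printf output
    return EXIT_SUCCESS;
}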

I built the above code with

nvcc -o max_erfcinvf_error.exe -arch=sm_61 max_erfcinvf_error.cu

The output of the program on my machine (CUDA 8, Quadro P2000) looks as follows:

arg=  1.2861120922025293e-05  0x1.af8c30p-17 (3757c618)
res=  3.0847203731536865e+00   0x1.8ad81ep+1 (40456c0f) <<<<
ref=  3.0847213161387415e+00  0x1.8ad825e90b8430p+1
reff= 3.0847213268280029e+00   0x1.8ad826p+1 (40456c13) <<<<

Note the final number in each of the marked lines. These are the single-precision results of erfcinvf() and the reference result correctly rounded to single precision, respectively. The difference (0x40456c13 - 0x40456c0f) is 4 ulps (the actual difference of erfcinvf() vs. the mathematical result here is 3.95517 ulp).
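
The ulp arithmetic above can be reproduced on the host. The following is a minimal sketch in plain C, not part of the reproducer. The helper names float_bits, ulp_distance, and ulp_error are made up for illustration. It assumes both values are finite, positive, normal floats, and it takes one ulp to be the spacing of single-precision numbers at the rounded reference.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer. */
static uint32_t float_bits (float f)
{
    uint32_t u;
    memcpy (&u, &f, sizeof u);
    return u;
}

/* Number of representable floats between two positive, finite floats.
   For such values the bit patterns are ordered like the values themselves. */
static uint32_t ulp_distance (float a, float b)
{
    uint32_t ia = float_bits (a);
    uint32_t ib = float_bits (b);
    return (ia > ib) ? (ia - ib) : (ib - ia);
}

/* Error of res relative to a double-precision reference, in units of one
   single-precision ulp at the reference. ASSUMES the reference rounds to
   a positive normal float whose biased exponent is greater than 23. */
static double ulp_error (float res, double ref)
{
    uint32_t expo = (float_bits ((float)ref) >> 23) & 0xff; /* biased exponent */
    uint32_t ulp_pattern = (expo - 23) << 23;  /* bits of 2^(expo - 127 - 23) */
    float ulp;
    double diff = (double)res - ref;

    memcpy (&ulp, &ulp_pattern, sizeof ulp);
    if (diff < 0.0) diff = -diff;
    return diff / (double)ulp;
}
```

With the values from the output above, ulp_distance (0x1.8ad81ep+1f, 0x1.8ad826p+1f) gives 4, and ulp_error (0x1.8ad81ep+1f, 0x1.8ad825e90b8430p+1) comes out close to the 3.95517 ulp quoted.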

striker159 · February 1, 2020, 7:37pm

With CUDA 10.2 Linux on a Titan Xp I get the same results.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

./erfcinvf
arg= 1.2861120922025293e-05 0x1.af8c30p-17 (3757c618)
res= 3.0847203731536865e+00 0x1.8ad81ep+1 (40456c0f) <<<<
ref= 3.0847213161387415e+00 0x1.8ad825e90b8430p+1
reff= 3.0847213268280029e+00 0x1.8ad826p+1 (40456c13) <<<<

njuffa · February 1, 2020, 7:40pm

Thank you for running the reproducer and confirming the issue (presumably a documentation issue) still exists. I guess that means I should file a bug now …

njuffa · February 1, 2020, 10:26pm

NVIDIA’s bug database keeps rejecting my bug report submission, even after I reduced it to a minimal stub: “An error occurred while processing your request.” Giving up.

Robert_Crovella · February 1, 2020, 10:32pm

There is maintenance going on for the bugs database right now. It was originally supposed to be done by now, and it was not supposed to create any outages. Unfortunately something has gone wrong and both of those statements are no longer correct.

At the moment the statement from IT is as follows:

“Extending the outage till 02 Feb, 4:00 AM PST due to NVBugs …”

Apologies for the hassle. I’m also unable to access nvbugs internally at the moment.

You may wish to try again tomorrow, or if I see no activity here I will file a bug to cover this issue as time permits after the system becomes available.

Thanks for reporting.

njuffa · February 2, 2020, 2:28am

Thanks for the bug database status update. I will try again later.

In the meantime, I have looked more closely at the polynomial that performs the computation giving rise to the observed worst-case error, and I am quite sure that this is just a documentation issue (possibly a cut & paste error from the neighboring entry for erfinvf()).

The polynomial utilized by CUDA for erfinvf() uses the following coefficients (extracted from the binary):

coeff[0] = af8a6370 -2.51727084e-10 -0x1.14c6e0p-32
coeff[1] = 3221f645  9.42742862e-09  0x1.43ec8ap-27
coeff[2] = b4016fda -1.20547526e-07 -0x1.02dfb4p-23
coeff[3] = 3468f846  2.16970051e-07  0x1.d1f08cp-23
coeff[4] = 370742aa  8.06214848e-06  0x1.0e8554p-17
coeff[5] = b804db4d -3.16754922e-05 -0x1.09b69ap-15
coeff[6] = ba4afea1 -7.74363114e-04 -0x1.95fd42p-11
coeff[7] = 3bb5c027  5.54658799e-03  0x1.6b804ep-8
coeff[8] = 3e24ae0f  1.60820231e-01  0x1.495c1ep-3
coeff[9] = 3f62dfc4  8.86226892e-01  0x1.c5bf88p-1
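
Each row of the table above gives the same single-precision value three ways: the IEEE-754 bit pattern, the decimal value, and the hexadecimal floating-point value. As a minimal sketch in plain host C (the helper names bits_to_float and horner are made up here), the bit patterns can be turned back into floats and run through a Horner scheme. That coeff[0] belongs to the highest power is an assumption of this sketch, as is the plain multiply-add per step; neither the evaluation order nor the argument transformation CUDA actually uses is shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE-754 single-precision value. */
static float bits_to_float (uint32_t u)
{
    float f;
    memcpy (&f, &u, sizeof f);
    return f;
}

/* Bit patterns copied from the first column of the table above. */
static const uint32_t coeff_bits[10] = {
    0xaf8a6370u, 0x3221f645u, 0xb4016fdau, 0x3468f846u, 0x370742aau,
    0xb804db4du, 0xba4afea1u, 0x3bb5c027u, 0x3e24ae0fu, 0x3f62dfc4u
};

/* Horner evaluation, ASSUMING bits[0] belongs to the highest power of t.
   The argument t is a placeholder; the real argument transformation is
   not part of this sketch. */
static float horner (const uint32_t *bits, size_t n, float t)
{
    float p = bits_to_float (bits[0]);
    for (size_t i = 1; i < n; i++) {
        p = p * t + bits_to_float (bits[i]);
    }
    return p;
}
```

Evaluating at t = 0 simply returns the last coefficient, 0x1.c5bf88p-1, which is close to sqrt(pi)/2 (about 0.886227).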

If I replace those with coefficients I computed a while ago (https://devtalk.nvidia.com/default/topic/1036239/a-very-compact-implementation-of-the-inverse-error-function-erfinvf-/), the reproducer posted above still shows the same amount of error. The error is likely triggered by the use of __log2f() in the argument transformation, a classical trade-off of accuracy vs. performance. The alternative set of coefficients I tried:

-2.00998329e-10 -0x1.ba0000p-33
 7.63788321e-09  0x1.066f88p-27
-9.44120231e-08 -0x1.957f1ep-24
 1.27279494e-08  0x1.b5543ap-27
 8.98369126e-06  0x1.2d7152p-17
-3.40910883e-05 -0x1.1dfa0ep-15
-7.70850747e-04 -0x1.9425d6p-11
 5.54407062e-03  0x1.6b5612p-8
 1.60820886e-01  0x1.495c76p-3
 8.86226892e-01  0x1.c5bf88p-1

njuffa · February 5, 2020, 10:57pm

Bug 2836258 filed. In post #6 I was looking at the wrong one of the two polynomials used in the computation of erfcinvf(). The set of coefficients actually responsible is:

coeff2[0] = c27c73f1 -6.31132240e+01  -0x1.f8e7e2p+5
coeff2[1] = 42fef829  1.27484688e+02   0x1.fdf052p+6
coeff2[2] = c2e4361c -1.14105682e+02  -0x1.c86c38p+6
coeff2[3] = 42714d9b  6.03257866e+01   0x1.e29b36p+5
coeff2[4] = c1ae51b3 -2.17898922e+01  -0x1.5ca366p+4
coeff2[5] = 40cef504  6.46740913e+00   0x1.9dea08p+2
coeff2[6] = bfea9e05 -1.83294737e+00  -0x1.d53c0ap+0
coeff2[7] = bcf871f4 -3.03277746e-02  -0x1.f0e3e8p-6
coeff2[8] = 3f553775  8.32877457e-01   0x1.aa6eeap-1

I was able to tweak this for a maximum error of 3.809635 ulps, but that would still have to be documented as a bound of 4 ulps, which is yet more confirmation that this is a documentation bug. The tweaked coefficients:

-6.31093750e+1 -0x1.f8e000p+5
 1.27482948e+2  0x1.fdee8ap+6
-1.14106033e+2 -0x1.c86c94p+6
 6.03259277e+1  0x1.e29b80p+5
-2.17898617e+1 -0x1.5ca346p+4
 6.46740246e+0  0x1.9de9ecp+2
-1.83294737e+0 -0x1.d53c0ap+0
-3.03277746e-2 -0x1.f0e3e8p-6
 8.32877457e-1  0x1.aa6eeap-1
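
The reasoning above rests on error bounds being documented in whole ulps, so a measured maximum gets rounded up. A tiny sketch in plain C, assuming that convention; documented_bound is a hypothetical helper name:

```c
/* Round a measured, non-negative maximum error (in ulps) up to the next
   whole number of ulps, ASSUMING bounds are documented in whole ulps. */
static int documented_bound (double max_err_ulps)
{
    int whole = (int)max_err_ulps;   /* truncates toward zero */
    return (max_err_ulps > (double)whole) ? whole + 1 : whole;
}
```

Both 3.809635 and the 3.95517 ulp observed with the shipped coefficients give 4, which exceeds the 2 ulp discussed at the start of the thread.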

Robert_Crovella · February 13, 2020, 9:03pm

Thanks for reporting. The issue has been confirmed internally and should be fixed in a future documentation update.
