For large arguments, the error of the __sincosf function becomes
significant (the absolute error is about 4e-4 for arguments of the
order of 1e4).
The error seems to be linear in the argument, and looks like a 90-degree
phase-shifted wave.
Replacing the code:
__sincosf(phi,&s,&c);
with:
const float correction = 4.065371168387674e-08f;
__sincosf(phi,&s1,&c1);
t = phi*correction;
s = s1 + t*c1;
c = c1 - t*s1;
improves the maximum absolute error in the range [-1e4,1e4] by a
factor of about 100. The cost is 3 instructions (one mul and two
fma's). The
maximum absolute error in the [-1e4,1e4] range for the improved code
is about 3.5e-6. The absolute error for small ranges is essentially
unaffected, since the correction is linear.
The resulting error of the improved code does not seem to grow with
the argument (phi), as is the case for the uncorrected __sincosf.
This has been tested on the G80 chip (an 8800GT, an 8800GTS and an
8800GTX, with identical results).
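For reference, here is the corrected call wrapped up as a self-contained device function (the wrapper name and the local declarations are mine; the constant and the rotation trick are exactly the ones above):

// Fast hardware sin/cos, then rotate the result by the small
// angle t = phi*correction to cancel the linear error term.
__device__ void sincosf_corrected(float phi, float *s, float *c)
{
    const float correction = 4.065371168387674e-08f;
    float s1, c1;
    __sincosf(phi, &s1, &c1);
    const float t = phi * correction;   // one mul
    *s = s1 + t * c1;                   // one fma
    *c = c1 - t * s1;                   // one fma
}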
Indeed, the hardware sin/cos operator is not so bad, except for small and large arguments.
If phi gets very small in magnitude (say, less than 10^-5), then __sinf(phi) will return 0. In this case, just returning phi is a very accurate alternative.
(testing the absolute value and comparing it to a constant is done in one instruction, so it will add two extra instructions at most).
But it may not matter in your case if you are just interested in minimizing the absolute error, not the relative error.
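To make the suggestion concrete, the guard could be bolted onto the corrected wrapper above roughly like this (the 10^-5 threshold is the one suggested; setting the cosine to 1.0f in that branch is my own assumption, accurate to the same order):

__device__ void sincosf_corrected_guarded(float phi, float *s, float *c)
{
    if (fabsf(phi) < 1e-5f) {
        // sin(phi) ~= phi and cos(phi) ~= 1 to well below float precision here
        *s = phi;
        *c = 1.0f;
    } else {
        const float correction = 4.065371168387674e-08f;
        float s1, c1;
        __sincosf(phi, &s1, &c1);
        const float t = phi * correction;
        *s = s1 + t * c1;
        *c = c1 - t * s1;
    }
}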
One more question: if CUDA provided a fast and accurate __sincospi(x) function that computes sin(pi * x) and cos(pi * x), would you be able to rewrite your code to take advantage of it?
(that is, does phi really need to be in radians?)
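As a hypothetical sketch of what such a rewrite could look like (the __sincospif name and the idea that the phase is accumulated in turns rather than radians are assumptions for illustration only):

// phase is kept in turns, so sin(2*pi*phase) becomes
// __sincospif(2*phase) with no 2*pi multiplication and no extra rounding.
__device__ void turns_to_sincos(float phase, float *s, float *c)
{
    __sincospif(2.0f * phase, s, c);   // hypothetical: computes sin(pi*x), cos(pi*x)
}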