Thread divergence in sin functions

Hi,
I was reading the source code of the sin(x) function in a C/C++ math library (for example, fdlibm, copied below), and it looks like sin() contains many branches that could cause thread divergence. Does nvcc (or the CUDA math library) already optimize these divergences away, or do I need to optimize the function myself?

#include "fdlibm.h"

#ifdef __STDC__
double sin(double x)
#else
double sin(x)
double x;
#endif
{
    double y[2], z = 0.0;
    int n, ix;

    /* High word of x. */
    ix = __HI(x);

    /* |x| ~< pi/4 */
    ix &= 0x7fffffff;
    if (ix <= 0x3fe921fb) return __kernel_sin(x, z, 0);

    /* sin(Inf or NaN) is NaN */
    else if (ix >= 0x7ff00000) return x - x;

    /* argument reduction needed */
    else {
        n = __ieee754_rem_pio2(x, y);
        switch (n & 3) {
            case 0:  return  __kernel_sin(y[0], y[1], 1);
            case 1:  return  __kernel_cos(y[0], y[1]);
            case 2:  return -__kernel_sin(y[0], y[1], 1);
            default: return -__kernel_cos(y[0], y[1]);
        }
    }
}

Naturally, NVIDIA engineers are aware of the need to minimize thread divergence in the standard math functions.

You should not see significant divergence in the CUDA implementations of the trig functions except in pathological cases with arguments of extremely large magnitude (very roughly, greater than 2^15 for single-precision computation and greater than 2^31 for double-precision computation).
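For concreteness, here is a minimal sketch (the kernel and file names are my own, not from your post) of the kind of kernel where this could matter: neighboring threads in the same warp hand sin() arguments of very different magnitude, so under the rough thresholds above the odd-numbered threads would be candidates for the slow argument-reduction path while the even-numbered ones take the fast path.

// sin_test.cu -- hypothetical example, not the library's internal code
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sin_mixed_args(const double *in, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // double-precision sin from the CUDA device math library
        out[i] = sin(in[i]);
    }
}

int main(void)
{
    const int n = 64;
    double h_in[n], h_out[n];

    // Even threads get a small argument, odd threads a huge one (> 2^31),
    // mixing candidates for the fast and slow reduction paths in one warp.
    for (int i = 0; i < n; ++i)
        h_in[i] = (i % 2 == 0) ? 0.5 : 3.0e9;

    double *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(double));
    cudaMalloc(&d_out, n * sizeof(double));
    cudaMemcpy(d_in, h_in, n * sizeof(double), cudaMemcpyHostToDevice);

    sin_mixed_args<<<1, n>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(double), cudaMemcpyDeviceToHost);

    printf("sin(%g) = %g, sin(%g) = %g\n", h_in[0], h_out[0], h_in[1], h_out[1]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}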

You can always check for branch divergence with the CUDA profiler, and you can examine the machine code produced by the compiler by running cuobjdump --dump-sass on the executable.
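For example (the file and executable names continue the sketch above, and the exact metric name varies by toolkit version; branch_efficiency is what the legacy nvprof profiler called it, with Nsight Compute offering comparable counters):

# compile the sketch above
nvcc -o sin_test sin_test.cu

# dump the generated machine code and look at the branch instructions around the sin() call
cuobjdump --dump-sass sin_test

# report the fraction of non-divergent branches (legacy nvprof metric)
nvprof --metrics branch_efficiency ./sin_test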