Thread divergence in sin functions

I read the source code of the sin(x) function in a C/C++ math library (for example, fdlibm, copied below), and it seems that sin() contains many branches that could cause thread divergence. Has nvcc optimized these divergences away, or do I need to optimize the function myself?

#include "fdlibm.h"

#ifdef __STDC__
	double sin(double x)
#else
	double sin(x)
	double x;
#endif
{
	double y[2],z=0.0;
	int n, ix;

    /* High word of x. */
	ix = __HI(x);

    /* |x| ~< pi/4 */
	ix &= 0x7fffffff;
	if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);

    /* sin(Inf or NaN) is NaN */
	else if (ix>=0x7ff00000) return x-x;

    /* argument reduction needed */
	else {
	    n = __ieee754_rem_pio2(x,y);
	    switch(n&3) {
		case 0: return  __kernel_sin(y[0],y[1],1);
		case 1: return  __kernel_cos(y[0],y[1]);
		case 2: return -__kernel_sin(y[0],y[1],1);
		default:
			return -__kernel_cos(y[0],y[1]);
	    }
	}
}


Naturally, NVIDIA engineers are aware of the need to minimize instances of thread divergence in standard math functions.

You should not see significant divergence in the CUDA implementations of the trig functions except in pathological cases involving arguments of extremely large magnitude (very roughly, greater than 2^15 for single-precision computation and greater than 2^31 for double-precision computation).

You can always check for branch divergence with the CUDA profiler, and you can also examine the machine code produced by the compiler by running cuobjdump --dump-sass on the executable.