native sincos() function?

I understand that GeForce cards natively support a sincos() function, which computes sine and cosine simultaneously.

Is this accessible through CUDA, or are there any plans to make it so?

Different NVIDIA GPUs have different levels of native support for sincos. However, even on G80 it’s possible to compute both efficiently. We might add sincos() to the CUDA standard library for the next release.


(1) G80 hardware does not have a sincos instruction.

(2) For optimal performance, call the __sinf() and __cosf() routines, which map directly to hardware approximations. The compiler will remove all redundant code from the two calls as long as the same register-mapped variable is passed to both:

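__device__ void BoxMuller(float &u1, float &u2){
 // u1, u2 hold uniform samples on entry and normal samples on return.
 // PI is assumed to be defined elsewhere (e.g. as 3.14159265f).
 float   r = sqrtf(-2.0f * logf(u1));
 float phi = 2 * PI * u2;
 u1 = r * __cosf(phi);
 u2 = r * __sinf(phi);
}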


Resulting code is optimal for G80.

(3) For numerically accurate results, call sinf() and cosf(). The compiler will eliminate some redundant code between the two calls. A sincosf() function will be added to CUDA 1.0 for improved efficiency.
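For anyone reading this after sincosf() became available: a minimal sketch of the same Box-Muller step written with a single sincosf() call. The names BoxMullerSinCos, boxMullerKernel, and boxMullerSelfCheck are made up for this example, and the host-side check and its 1e-4 tolerance are my own assumptions, not something from the posts above.

```cuda
#include <cmath>
#include <cuda_runtime.h>

// Box-Muller with one sincosf() call instead of separate sinf()/cosf().
// u1 in (0,1] and u2 in [0,1) on entry; both hold normal samples on return.
__device__ void BoxMullerSinCos(float &u1, float &u2)
{
    const float TWO_PI = 6.283185307f;
    float r = sqrtf(-2.0f * logf(u1));
    float s, c;
    sincosf(TWO_PI * u2, &s, &c);  // accurate variant; __sincosf() is the fast one
    u1 = r * c;
    u2 = r * s;
}

// Hypothetical test kernel: transforms a[i], b[i] in place.
__global__ void boxMullerKernel(float *a, float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) BoxMullerSinCos(a[i], b[i]);
}

// Runs the kernel and compares against a double-precision host reference.
// Returns the number of mismatches, or -1 on a CUDA error.
int boxMullerSelfCheck()
{
    const int n = 1024;
    static float hA[n], hB[n], rA[n], rB[n];
    for (int i = 0; i < n; ++i) {
        hA[i] = (i + 1.0f) / (n + 1.0f);  // strictly inside (0,1)
        hB[i] = i / (float)n;             // inside [0,1)
    }

    float *dA = nullptr, *dB = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&dB, bytes) != cudaSuccess) { cudaFree(dA); return -1; }
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    boxMullerKernel<<<(n + 255) / 256, 256>>>(dA, dB, n);

    // Blocking copies also wait for the kernel to finish.
    cudaError_t e1 = cudaMemcpy(rA, dA, bytes, cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(rB, dB, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        double r   = std::sqrt(-2.0 * std::log((double)hA[i]));
        double phi = 6.283185307179586 * (double)hB[i];
        if (std::fabs(rA[i] - r * std::cos(phi)) > 1e-4 ||
            std::fabs(rB[i] - r * std::sin(phi)) > 1e-4) ++bad;
    }
    return bad;
}
```

Calling boxMullerSelfCheck() from main() and checking for a zero return should be enough to confirm the kernel agrees with the host reference. Swapping sincosf() for __sincosf() and logf() for __logf() gives the fast-path version, at reduced accuracy.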

Thanks Mark and mfatica – you read my mind! – it was Box-Muller I was implementing.

The performance is very good – as you note, much better using the __cosf() and __sinf() functions, and particularly __logf(). I don’t see bad accuracy in the [0,2*pi] range used by Box-Muller. But bad is relative… :devil:
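If anyone wants to put a number on "bad is relative", here is a rough sketch that measures the largest absolute difference between the intrinsics and the accurate routines over [0, 2*pi]. The kernel and function names are hypothetical, the sample count is arbitrary, and it assumes the file is built without -use_fast_math (with that flag sinf() and cosf() are themselves mapped to the intrinsics, so the difference would be zero).

```cuda
#include <cuda_runtime.h>

// Per-sample absolute difference between fast intrinsics and accurate routines.
__global__ void intrinsicErrorKernel(float *errSin, float *errCos, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = 6.283185307f * i / (n - 1);  // evenly spaced in [0, 2*pi]
        errSin[i] = fabsf(__sinf(x) - sinf(x));
        errCos[i] = fabsf(__cosf(x) - cosf(x));
    }
}

// Returns the largest difference seen for either function, or -1 on a CUDA error.
float maxIntrinsicError()
{
    const int n = 4096;
    static float hSin[n], hCos[n];
    float *dSin = nullptr, *dCos = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&dSin, bytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&dCos, bytes) != cudaSuccess) { cudaFree(dSin); return -1.0f; }

    intrinsicErrorKernel<<<(n + 255) / 256, 256>>>(dSin, dCos, n);

    // Blocking copies also wait for the kernel to finish.
    cudaError_t e1 = cudaMemcpy(hSin, dSin, bytes, cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(hCos, dCos, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dSin);
    cudaFree(dCos);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return -1.0f;

    float worst = 0.0f;
    for (int i = 0; i < n; ++i) {
        if (hSin[i] > worst) worst = hSin[i];
        if (hCos[i] > worst) worst = hCos[i];
    }
    return worst;
}
```

Printing the return value of maxIntrinsicError() from main() gives one number to compare against whatever tolerance the application needs. It only covers the sampled points in [0, 2*pi], so it says nothing about larger arguments.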