Intrinsic functions __cos() and __sin() for double precision

I have a question: will NVIDIA GPUs support the intrinsic functions __cos(), __sin(), and __exp() for double precision?

What is the point of that? The intrinsics are low precision. You could probably just call double y = __sinf((float)x);
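A minimal CUDA sketch of the cast-to-float approach. The kernel and helper names (sinCosExpViaFloat, runSinCosExpViaFloat) are hypothetical; only __sinf(), __cosf() and __expf() are real CUDA intrinsics. Because the argument is rounded to float before the call, the double result can never be more accurate than a float computation.

```cuda
#include <cassert>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical kernel: emulates the requested double-precision intrinsics
// by rounding x to float and calling the existing single-precision
// intrinsics. The results are stored as double but have float accuracy
// at best.
__global__ void sinCosExpViaFloat(const double* x, double* s, double* c,
                                  double* e, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float xf = (float)x[i];  // rounding to float drops 29 significand bits
        s[i] = __sinf(xf);       // float result widened to double on store
        c[i] = __cosf(xf);
        e[i] = __expf(xf);
    }
}

// Hypothetical host helper: copies n inputs to the GPU, launches the
// kernel and copies the three result arrays back.
void runSinCosExpViaFloat(const double* h_x, double* h_s, double* h_c,
                          double* h_e, int n) {
    size_t bytes = n * sizeof(double);
    double* d_buf = nullptr;  // one allocation laid out as x | s | c | e
    cudaError_t err = cudaMalloc(&d_buf, 4 * bytes);
    assert(err == cudaSuccess);
    err = cudaMemcpy(d_buf, h_x, bytes, cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    sinCosExpViaFloat<<<blocks, threads>>>(d_buf, d_buf + n, d_buf + 2 * n,
                                           d_buf + 3 * n, n);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    double* outs[3] = {h_s, h_c, h_e};
    for (int j = 0; j < 3; ++j) {
        // cudaMemcpy on the default stream waits for the kernel to finish.
        err = cudaMemcpy(outs[j], d_buf + (j + 1) * n, bytes,
                         cudaMemcpyDeviceToHost);
        assert(err == cudaSuccess);
    }
    cudaFree(d_buf);
}
```

Compared against the host's std::sin, std::cos and std::exp, one would expect errors on the order of 1e-7 for small arguments rather than the roughly 1e-16 of a true double-precision result; __sinf() and __cosf() also lose accuracy outside [-π, π].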

I need to perform computations with double-precision numbers. Therefore, it would be great to have intrinsic functions with double-precision input and output:

double __sin(double in)
double __cos(double in)
double __exp(double in)

I need to obtain accurate results.

If you need accurate results, just use the normal sin() function; intrinsics are faster but less accurate.

Thanks for your reply.
I was just curious whether sin() and cos() will be available as intrinsic functions with double-precision accuracy in the future.

Intrinsic functions trade precision for speed. Speed vs. accuracy: __sinf -> sinf -> sin. I wonder what the speed and precision of a __sind would be.

I do not understand the second sentence in your previous answer. Did you order the functions from fastest and least accurate to slowest and most accurate? That is how they look to me:
__sinf(): intrinsic function for float
sinf(): software-implemented function for float
sin(): software-implemented function for double
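A sketch that evaluates the same argument with all three functions listed above, so their results can be compared side by side on a GPU. The kernel and helper names (sinThreeWays, runSinThreeWays) are hypothetical; __sinf(), sinf() and sin() are the CUDA device functions discussed in this thread.

```cuda
#include <cassert>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical kernel: computes sin(x) three ways, ordered from fastest
// and least accurate to slowest and most accurate.
__global__ void sinThreeWays(const double* x, double* fastF, double* accF,
                             double* accD, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float xf = (float)x[i];
        fastF[i] = __sinf(xf);  // hardware intrinsic, float
        accF[i] = sinf(xf);     // software implementation, float
        accD[i] = sin(x[i]);    // software implementation, double
    }
}

// Hypothetical host helper: copies the inputs over, runs the kernel and
// copies the three result arrays back.
void runSinThreeWays(const double* h_x, double* h_fastF, double* h_accF,
                     double* h_accD, int n) {
    size_t bytes = n * sizeof(double);
    double* d_buf = nullptr;  // one allocation: x | fastF | accF | accD
    cudaError_t err = cudaMalloc(&d_buf, 4 * bytes);
    assert(err == cudaSuccess);
    err = cudaMemcpy(d_buf, h_x, bytes, cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    sinThreeWays<<<blocks, threads>>>(d_buf, d_buf + n, d_buf + 2 * n,
                                      d_buf + 3 * n, n);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    double* outs[3] = {h_fastF, h_accF, h_accD};
    for (int j = 0; j < 3; ++j) {
        // cudaMemcpy on the default stream waits for the kernel to finish.
        err = cudaMemcpy(outs[j], d_buf + (j + 1) * n, bytes,
                         cudaMemcpyDeviceToHost);
        assert(err == cudaSuccess);
    }
    cudaFree(d_buf);
}
```

Compiling with nvcc's -use_fast_math option makes the compiler replace single-precision calls such as sinf() with the corresponding intrinsic (__sinf()); double-precision sin() is not affected.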

__sind() would be really useful :-).

I guess that __sind() would obviously be slower than __sinf() but faster than sin(), and more accurate than __sinf() but slightly less accurate than sin().

What about sinf()?

In general, intrinsics in CUDA expose underlying hardware features. There is no special hardware support for double-precision sine and cosine.

If you are interested in boosting double-precision sine and cosine throughput, I would suggest looking at the functions sincos(), sinpi(), cospi(), and sincospi(), which may be applicable to your use case.
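A sketch of how sincos() and sincospi() might be used, assuming (hypothetically) that the matrix entries have the form sin(2πk/n) and cos(2πk/n). sincos() returns both values from one call, and sincospi(t) computes sin(πt) and cos(πt) directly, so the code never multiplies by a rounded approximation of π: cos(kPi * 0.5) evaluates to roughly 6.1e-17 instead of 0, while sincospi() works from the exact argument 0.5. The kernel and helper names and the constant kPi are illustrative, not part of any API.

```cuda
#include <cassert>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical kernel: for k in [0, n) computes sin and cos of 2*pi*k/n
// twice, once with sincos() on an explicit angle and once with sincospi().
__global__ void sinCosTable(double* s1, double* c1, double* s2, double* c2,
                            int n) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < n) {
        const double kPi = 3.141592653589793;  // nearest double to pi
        double s, c;
        sincos(2.0 * kPi * k / n, &s, &c);  // one call yields both values
        s1[k] = s;
        c1[k] = c;
        sincospi(2.0 * k / n, &s, &c);      // sin(pi*t) and cos(pi*t)
        s2[k] = s;
        c2[k] = c;
    }
}

// Hypothetical host helper: runs the kernel and copies the four tables back.
void runSinCosTable(double* h_s1, double* h_c1, double* h_s2, double* h_c2,
                    int n) {
    size_t bytes = n * sizeof(double);
    double* d_buf = nullptr;  // one allocation: s1 | c1 | s2 | c2
    cudaError_t err = cudaMalloc(&d_buf, 4 * bytes);
    assert(err == cudaSuccess);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    sinCosTable<<<blocks, threads>>>(d_buf, d_buf + n, d_buf + 2 * n,
                                     d_buf + 3 * n, n);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    double* outs[4] = {h_s1, h_c1, h_s2, h_c2};
    for (int j = 0; j < 4; ++j) {
        // cudaMemcpy on the default stream waits for the kernel to finish.
        err = cudaMemcpy(outs[j], d_buf + j * n, bytes,
                         cudaMemcpyDeviceToHost);
        assert(err == cudaSuccess);
    }
    cudaFree(d_buf);
}
```

Whether either variant is actually faster than separate sin() and cos() calls depends on the GPU and compiler version, so it is worth timing both in the real kernel.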

Your question implies that the throughput of double-precision sin(), cos(), and exp() is too low for your application. What is your use case, how much of a performance boost is needed, and what GPU platform are you currently using? Using one of the K20 variants (compute capability sm_35) can significantly boost the throughput of these functions compared to earlier GPUs; you might want to look into that.

That's the answer I have been looking for. Thanks to njuffa, and to Lev as well, for the effort to help me. I am writing kernel code for matrix multiplication, where each element of the matrices contains at least one cos() or sin() function.