What's wrong with trigonometric functions? Do they result in local memory usage?

This code:

__global__ void test(double gV[]){
    double phi,tmp;

gives me

1>ptxas info : Used 25 registers, 40+0 bytes lmem, 24+16 bytes smem

The same goes for cos() and tan().
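For anyone who wants to reproduce the report, here is a minimal sketch of a kernel of the same shape. The body of the kernel above is cut off, so the sin() call and the indexing here are assumptions for illustration, not the original code.

```cuda
// Hypothetical reproduction: the body is an assumption, not the original kernel.
__global__ void test(double gV[])
{
    // One element per thread; launch exactly as many threads as gV has elements.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    double phi = gV[i];
    double tmp = sin(phi);   // double-precision sin(); swap in cos() or tan() to compare

    gV[i] = tmp;
}
```

Compiling with something like `nvcc -arch=sm_13 -Xptxas -v -c test.cu` prints the ptxas resource line, including the lmem figure. Double precision needs compute capability 1.3 or later, and newer toolkits need a newer -arch value.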

Why is local memory being used? Will it hurt performance? Is this something I just have to live with?

I use CUDA 2.1 on WinXPx64

(BTW I compile with -maxrregcount 150)

Read appendix B of the programming guide to understand CUDA’s numeric function options and tradeoffs.

sin() uses local memory as part of a lookup table. This minimizes register use in general. Imagine if you have a tuned kernel that uses exactly N registers, then a single cos() call jumps your register count to N+5 or whatever and changes your whole allowable blocks per SM, etc. It could be a disaster.

But, but, but… you cry, it’s so slow!

But that’s why you should use __sinf(), which is fast and uses few registers. The tradeoff is that it is only accurate to an absolute error of about 2^-21, and only for arguments from -pi to pi.
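To get a feel for that tradeoff, here is a rough sketch that evaluates sinf() and __sinf() on the same inputs in roughly [-pi, pi] and returns the largest difference between them. The names compare_sin and max_abs_diff are made up for this example. Keep in mind that __sinf() is single precision only, so it is not a drop-in replacement for a double-precision sin() call.

```cuda
#include <math.h>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: evaluates both variants for n inputs.
__global__ void compare_sin(const float *x, float *accurate, float *fast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        accurate[i] = sinf(x[i]);    // library version
        fast[i]     = __sinf(x[i]);  // fast intrinsic
    }
}

// Hypothetical host helper: largest |sinf(x) - __sinf(x)| over n (n >= 2)
// evenly spaced points in roughly [-pi, pi]. Returns -1 on a CUDA error.
float max_abs_diff(int n)
{
    std::vector<float> h_x(n), h_acc(n), h_fast(n);
    const float pi = 3.14159265f;
    for (int i = 0; i < n; ++i)
        h_x[i] = -pi + 2.0f * pi * i / (n - 1);

    float *d_x = 0, *d_acc = 0, *d_fast = 0;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void**)&d_x, bytes) != cudaSuccess ||
        cudaMalloc((void**)&d_acc, bytes) != cudaSuccess ||
        cudaMalloc((void**)&d_fast, bytes) != cudaSuccess) {
        cudaFree(d_x); cudaFree(d_acc); cudaFree(d_fast);
        return -1.0f;
    }

    cudaMemcpy(d_x, &h_x[0], bytes, cudaMemcpyHostToDevice);
    int threads = 256;
    compare_sin<<<(n + threads - 1) / threads, threads>>>(d_x, d_acc, d_fast, n);

    // The copies back also wait for the kernel to finish.
    cudaError_t e1 = cudaMemcpy(&h_acc[0], d_acc, bytes, cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(&h_fast[0], d_fast, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x); cudaFree(d_acc); cudaFree(d_fast);
    if (e1 != cudaSuccess || e2 != cudaSuccess)
        return -1.0f;

    float worst = 0.0f;
    for (int i = 0; i < n; ++i) {
        float d = fabsf(h_acc[i] - h_fast[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

Calling `max_abs_diff(1 << 16)` from main() and printing the result gives a number to compare against the documented error bound. Don't compile this with -use_fast_math, since that flag turns sinf() into __sinf() and the difference would come out as zero.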

An answer from the king of the math library…

“The argument reduction for trig function has a fast path and a slow path (for very large arguments). In practice it is very unlikely that the slow path will ever be exercised, but it needs to be there for correctness. To reduce register usage in the slow path, some local memory is used. Local memory is not used in the fast path. This applies to both single-precision and double-precision versions.”
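One way to check this on your own hardware is to time the same kernel once with small arguments and once with very large ones. The sketch below is only an experiment setup under my own assumptions: sin_kernel and time_sin are hypothetical names, and the value 1.0e22 is just picked as "very large", not a documented threshold for the slow path.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread takes sin() of an argument close to `base`.
// For a huge base, adding i * 1e-6 does not change the value, which is fine here.
__global__ void sin_kernel(double *out, double base, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = sin(base + i * 1.0e-6);
}

// Hypothetical host helper: elapsed milliseconds for one launch over n
// elements, or -1 on a CUDA error.
float time_sin(double base, int n)
{
    double *d_out = 0;
    if (cudaMalloc((void**)&d_out, n * sizeof(double)) != cudaSuccess)
        return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time startup cost is not included in the timing.
    sin_kernel<<<blocks, threads>>>(d_out, base, n);

    cudaEventRecord(start, 0);
    sin_kernel<<<blocks, threads>>>(d_out, base, n);
    cudaEventRecord(stop, 0);

    float ms = -1.0f;
    if (cudaEventSynchronize(stop) == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return ms;
}
```

Comparing `time_sin(0.0, 1 << 20)` with `time_sin(1.0e22, 1 << 20)` should give an idea of how much the slow path costs, if it is taken at all for the large value. Double precision needs compute capability 1.3 or later.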

Oh, that’s how it works. That’s positive.

Thanks a lot for the explanation.

Maybe it would be nice to include this information in the Programming Guide. Seeing local memory usage appear suddenly, without knowing where it comes from or what effect it will have, is a bit scary :)

See section … We need to add mentions of double precision (DP) in there, and we will soon.

Ah, I missed that one. I learned CUDA from Programming Guide 1.1, and did not reread this section of newer programming guides… Thanks again.