What's wrong with goniometric functions? Do they result in local memory usage?

Radko · March 10, 2009, 8:03am

This code:

__global__ void test(double gV[]){
	double phi, tmp;
	tmp = sin(phi);
	gV[0] = tmp;
}

gives me

1>ptxas info : Used 25 registers, 40+0 bytes lmem, 24+16 bytes smem

The same goes for cos() and tan().

Why is local memory being used? Will it hurt performance? Is this something I just have to live with?

I use CUDA 2.1 on Windows XP x64.

(BTW I compile with -maxrregcount 150)

SPWorley · March 10, 2009, 8:52am

Read appendix B of the programming guide to understand CUDA’s numeric function options and tradeoffs.

sin() uses local memory as part of a lookup table. This minimizes register use in general. Imagine if you have a tuned kernel that uses exactly N registers, then a single cos() call jumps your register count to N+5 or whatever and changes your whole allowable blocks per SM, etc. It could be a disaster.

But, but, but… you cry, it’s so slow!

But that’s why you should use __sinf(), which is fast and uses few registers. The tradeoff is that it is single precision, its absolute error is only about 2^-21, and that bound holds only for arguments from -pi to pi.
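To get a feel for that tradeoff, here is a minimal sketch that evaluates `sinf()` and `__sinf()` on the same inputs in [-pi, pi] and reports the largest difference between them. The kernel name `compare_sines`, the helper `max_abs_difference`, and the sample count are made up for illustration; the measured number will depend on the GPU and toolkit, so none is quoted here.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Evaluate both single-precision sine variants for each input.
__global__ void compare_sines(const float *x, float *accurate, float *fast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        accurate[i] = sinf(x[i]);    // math library function
        fast[i]     = __sinf(x[i]);  // fast intrinsic
    }
}

// Hypothetical helper: largest |sinf(x) - __sinf(x)| over n >= 2 evenly
// spaced samples in [-pi, pi]. Returns -1 if a CUDA call fails.
float max_abs_difference(int n)
{
    const float pi = 3.14159265f;
    std::vector<float> h_x(n), h_acc(n), h_fast(n);
    for (int i = 0; i < n; ++i)
        h_x[i] = -pi + 2.0f * pi * i / (n - 1);

    size_t bytes = n * sizeof(float);
    float *d_x = nullptr, *d_acc = nullptr, *d_fast = nullptr;
    cudaError_t err = cudaMalloc(&d_x, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_acc, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_fast, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        compare_sines<<<blocks, threads>>>(d_x, d_acc, d_fast, n);
        err = cudaMemcpy(h_acc.data(), d_acc, bytes, cudaMemcpyDeviceToHost);
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_fast.data(), d_fast, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_x);      // cudaFree(nullptr) is a no-op, so this is safe
    cudaFree(d_acc);
    cudaFree(d_fast);
    if (err != cudaSuccess) return -1.0f;

    float worst = 0.0f;
    for (int i = 0; i < n; ++i)
        worst = fmaxf(worst, fabsf(h_acc[i] - h_fast[i]));
    return worst;
}

int main()
{
    float d = max_abs_difference(1 << 16);
    if (d < 0.0f) {
        printf("CUDA error\n");
        return 1;
    }
    printf("max |sinf - __sinf| on [-pi, pi]: %g\n", d);
    return 0;
}
```

Note that building with `-use_fast_math` maps `sinf()` to `__sinf()`, which would make the comparison meaningless, so compile this one without that flag.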

tmurray · March 10, 2009, 5:44pm

An answer from the king of the math library…

“The argument reduction for trig function has a fast path and a slow path (for very large arguments). In practice it is very unlikely that the slow path will ever be exercised, but it needs to be there for correctness. To reduce register usage in the slow path, some local memory is used. Local memory is not used in the fast path. This applies to both single-precision and double-precision versions.”
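The two-path structure described in that quote can be sketched roughly as follows. This is only an illustration of the idea, not the CUDA math library's implementation: the cut-over value `FAST_PATH_LIMIT`, the function names, and the slow path itself are assumptions. In particular, the slow path here is a double-precision stand-in that is only reasonable for moderately large arguments, and it does not reproduce the local-memory scratch storage mentioned in the quote.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical cut-over point, chosen for illustration only.
#define FAST_PATH_LIMIT 10000.0f

// Reduce x to roughly [-pi, pi] so that sin(reduced) matches sin(x).
__device__ float reduce_two_path(float x)
{
    if (fabsf(x) <= FAST_PATH_LIMIT) {
        // Fast path: 2*pi split into a high and a low part, applied with
        // two fused multiply-adds. Everything stays in registers.
        const float two_pi_hi = 6.28318548f;
        const float two_pi_lo = -1.7484556e-7f;
        float k = rintf(x * 0.159154943f);  // nearest multiple of 2*pi
        float r = fmaf(-k, two_pi_hi, x);
        return fmaf(-k, two_pi_lo, r);
    }
    // Slow path (stand-in): redo the reduction in double precision.
    // A real library needs a more elaborate scheme for huge arguments.
    double xd = x;
    double k = rint(xd * 0.15915494309189535);
    return (float)fma(-k, 6.283185307179586, xd);
}

__global__ void reduce_kernel(float x, float *out)
{
    *out = reduce_two_path(x);
}

// Hypothetical host helper: runs the reduction for one value on the GPU.
// Returns NaN if a CUDA call fails.
float reduced_on_device(float x)
{
    float *d_out = nullptr;
    float h_out = NAN;
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) return NAN;
    reduce_kernel<<<1, 1>>>(x, d_out);
    if (cudaMemcpy(&h_out, d_out, sizeof(float),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        h_out = NAN;
    cudaFree(d_out);
    return h_out;
}

int main()
{
    const float inputs[] = {1.0f, 100.0f, 12345.678f};  // last one takes the slow path
    for (float x : inputs) {
        float r = reduced_on_device(x);
        printf("x = %g  reduced = %g  sin(reduced) = %g  sin(x) = %g\n",
               x, r, sin((double)r), sin((double)x));
    }
    return 0;
}
```

The point of the split is that typical arguments only ever execute the first branch, so whatever extra resources the second branch needs cost nothing at run time unless a very large argument actually shows up.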

Radko · March 10, 2009, 6:08pm

Oh, that’s how it works. That’s positive.

Thanks a lot for the explanation.

Maybe it would be nice to include this information in the Programming Guide. Seeing that sudden local memory usage without knowing where it comes from (or what it is going to do) is a bit scary :)

tmurray · March 10, 2009, 6:51pm

See section 5.1.1.1. We still need to add mentions of double precision (DP) there, and we will soon.

Radko · March 10, 2009, 7:10pm

Ah, I missed that one. I learned CUDA from Programming Guide 1.1 and did not reread this section in the newer versions of the guide… Thanks again.
