CUDA 3.0 register allocation trouble: high and unpredictable register usage

UPDATE: I had trouble with register allocation, but figured out what was causing the difference…

Trigonometric functions don’t seem to make the nvcc register allocator very happy. Is there any way to reduce register usage when using cos(x)?

Have you tried #pragma unroll 1?
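For anyone following along, here is a minimal sketch of where the pragma goes. The kernel, its name (cosSeries), and the loop are hypothetical, since the original kernel was not posted, and it is only an assumption that loop unrolling is what inflates the register count in the original code. The pragma has to sit directly in front of the loop it applies to.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not the original poster's code):
// sums cos(k * x) for k = 0 .. terms-1 for each input element.
__global__ void cosSeries(const float* in, float* out, int n, int terms)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x = in[i];
    float acc = 0.0f;

    // Ask the compiler not to unroll the loop that follows.
    // Fewer unrolled iterations in flight may lower register pressure.
    #pragma unroll 1
    for (int k = 0; k < terms; ++k) {
        acc += cosf(k * x);
    }
    out[i] = acc;
}

int main()
{
    const int n = 1024;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));   // every input is 0.0f

    cosSeries<<<(n + 255) / 256, 256>>>(in, out, n, 8);

    float first = 0.0f;
    cudaMemcpy(&first, out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", first);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Compiling with nvcc -Xptxas -v prints the registers used per kernel, so building once with the pragma and once without shows whether it makes any difference for a given kernel.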

Nope, I’ll give it a try. Thanks!

Would the less accurate __cosf() work for your use case? The difference from the full-precision cos() is described in the programming guide. As far as I remember, you mainly have to make sure the argument stays in a small range around zero (the guide quotes the error bound for arguments in [-π, π]) to get good precision on the result.
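A quick way to judge whether __cosf() is accurate enough is to compare it against cosf() over the argument range you actually use. The sketch below is only an illustration: the kernel name (compareCos), the sample count, and the test range of ±100 are assumptions, not values from this thread.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical comparison kernel: evaluates both variants on the same input.
__global__ void compareCos(const float* x, float* full, float* fast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    full[i] = cosf(x[i]);    // full-precision single-precision cosine
    fast[i] = __cosf(x[i]);  // fast intrinsic, reduced accuracy
}

int main()
{
    const int n = 4096;
    const float range = 100.0f;   // assumed test range: [-100, 100]
    const float pi = 3.14159265f;
    static float hx[n], hfull[n], hfast[n];

    // Evenly spaced sample points across the test range.
    for (int i = 0; i < n; ++i)
        hx[i] = -range + 2.0f * range * i / (n - 1);

    float *dx, *dfull, *dfast;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dfull, n * sizeof(float));
    cudaMalloc(&dfast, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);

    compareCos<<<(n + 255) / 256, 256>>>(dx, dfull, dfast, n);

    cudaMemcpy(hfull, dfull, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hfast, dfast, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Track the largest difference inside and outside [-pi, pi] separately.
    float errNear = 0.0f, errFar = 0.0f;
    for (int i = 0; i < n; ++i) {
        float d = fabsf(hfull[i] - hfast[i]);
        if (fabsf(hx[i]) <= pi) errNear = fmaxf(errNear, d);
        else                    errFar  = fmaxf(errFar, d);
    }
    printf("max |cosf - __cosf| for |x| <= pi: %g\n", errNear);
    printf("max |cosf - __cosf| for |x| >  pi: %g\n", errFar);

    cudaFree(dx);
    cudaFree(dfull);
    cudaFree(dfast);
    return 0;
}
```

If the difference is acceptable, the nvcc option -use_fast_math makes the same substitution for the whole file without touching the source, and nvcc -Xptxas -v shows what it does to the register count.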

Oh, that’s a good idea, I’ll give it a try tomorrow. I’m really curious now about the register usage difference…