Default inlining limitations

The manual says that a “device function is always inlined”, and I take that to mean the function body will replace the function call and variables will be expanded. The main section of my program is a loop in which several functions are called. Compiled this way, it required 99 registers. Copying the body of each function into its own block in place of the original function call (manual inlining) resulted in 26 fewer registers being used (still not enough to improve the situation, but certainly an improvement). The signatures of my functions contain “const type”, “const type &”, and “type &”, where type is int, float, or one structure.
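To make the comparison concrete, here is a minimal sketch of the two variants as I understand them. All names (Params, accumulate, the two kernels) are hypothetical stand-ins and not the actual program; the structure and arithmetic are made up purely for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical structure standing in for the "one structure" mentioned above.
struct Params {
    float scale;
    float offset;
};

// Called variant: uses the three signature styles from the post
// ("const type", "const type &", and "type &").
__device__ void accumulate(const int i, const Params &p, float &acc)
{
    acc += p.scale * static_cast<float>(i) + p.offset;
}

// Variant 1: the loop calls the device function and relies on the
// compiler to inline it.
// Assumes the launch uses exactly as many threads as out has elements.
__global__ void kernelWithCalls(const Params p, float *out, int n)
{
    const int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        accumulate(i, p, acc);
    }
    out[tid] = acc;
}

// Variant 2: the function body is pasted into its own block in place
// of the call (manual inlining).
__global__ void kernelManuallyInlined(const Params p, float *out, int n)
{
    const int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        {
            acc += p.scale * static_cast<float>(i) + p.offset;
        }
    }
    out[tid] = acc;
}
```

In a toy case like this the two kernels may well end up with the same register count; the difference I am describing shows up in the larger program.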

Is automatic inlining not the same as manual inlining?

Is there anything about how I am passing arguments that might cause additional registers to be consumed?


It’s possible that passing by pointer or by value rather than by reference may help. However, if you can post your code so that we can reproduce the problem here, I would be happy to file a bug. Try to make the code as simple as possible while still reproducing the problem. Please also post the compilation command line.
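As a rough idea of what a small repro could look like, the sketch below compares a by-value and a by-pointer version of the same helper. Everything in it (Params, stepByValue, stepByPointer, the kernels, the file name repro.cu) is a hypothetical placeholder rather than your actual code. The register count per kernel can be printed at compile time with the ptxas verbose option shown in the comment at the top.

```cuda
// Example compile line (file name is a placeholder):
//   nvcc --ptxas-options=-v repro.cu -o repro
// The -v output from ptxas lists the registers used by each kernel.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical structure; substitute the one from the real program.
struct Params {
    float scale;
    float offset;
};

// By-value variant: inputs are copied and the result is returned.
__device__ float stepByValue(const int i, const Params p, const float acc)
{
    return acc + p.scale * static_cast<float>(i) + p.offset;
}

// By-pointer variant: the structure and the accumulator are passed by address.
__device__ void stepByPointer(const int i, const Params *p, float *acc)
{
    *acc += p->scale * static_cast<float>(i) + p->offset;
}

__global__ void kernelByValue(const Params p, float *out, int n)
{
    const int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc = stepByValue(i, p, acc);
    }
    out[tid] = acc;
}

__global__ void kernelByPointer(const Params p, float *out, int n)
{
    const int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        stepByPointer(i, &p, &acc);
    }
    out[tid] = acc;
}

int main()
{
    const int threads = 64;              // one block, one output per thread
    const Params p = {2.0f, 1.0f};
    float *dOut = nullptr;
    if (cudaMalloc(&dOut, threads * sizeof(float)) != cudaSuccess) {
        return 1;
    }
    kernelByValue<<<1, threads>>>(p, dOut, 8);
    kernelByPointer<<<1, threads>>>(p, dOut, 8);
    const cudaError_t err = cudaDeviceSynchronize();
    std::printf("status: %s\n", cudaGetErrorString(err));
    cudaFree(dOut);
    return err == cudaSuccess ? 0 : 1;
}
```

Comparing the ptxas output for these kernels against a by-reference version of the same helper would show whether the way arguments are passed changes the register count in your case.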