Register usage: registers used for code that doesn't get run? (if's)

Hi. I’m writing a raytracer and have several shade functions, depending on which object was hit and its material. Every time I add a shade function, my register usage (as reported by the compiler) goes up. Does this mean that even if all my objects had only one material (and hence only one shade function was called), all those registers would still be used? And will this adversely affect my performance, even though the functions and extra code may not get executed? So am I better off writing one small shade function to reduce register usage?

All functions inside a kernel are inlined, so the fact that the code is in different functions doesn’t matter at all. The register reuse optimization is a bit finicky in CUDA, so even where part of your kernel would let the compiler reuse some of the registers, it sometimes misses some of them. You can play around with your code to try and help the compiler with that.

good luck

Hmmm, is there any way to tell the compiler not to inline functions? I have many for loops that call other functions, and ptxas now takes several minutes to compile my code, which is getting rather annoying. If I can’t disable inlining I guess I may have to try restructuring my code :(. It’s all because of my iterative versions of reflectance and refraction :(

I’m pretty sure you can’t, sorry :(

You can pass bool template parameters to the kernel and call the needed functions only if the corresponding parameter is set. You end up needing a big case or if structure to call all the template variants, but each kernel will only be built with the code (and thus the register usage) that it really needs to run.
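To make the template idea concrete, here is a minimal sketch. The shader names (`shadeDiffuse`, `shadeMirror`), the kernel `shadeKernel` and the host helper `launchShade` are all made up for illustration and are not from the original poster's raytracer; the "shading" math is a placeholder.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical material shaders, standing in for real shade functions.
__device__ float shadeDiffuse(float x) { return 0.5f * x; }
__device__ float shadeMirror(float x)  { return 1.0f - x; }

// Each <UseDiffuse, UseMirror> combination is compiled as its own kernel.
// The template parameters are compile-time constants, so a branch whose
// parameter is false can be dropped from that instantiation entirely.
template <bool UseDiffuse, bool UseMirror>
__global__ void shadeKernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float c = 0.0f;
    if (UseDiffuse) c += shadeDiffuse(in[i]);
    if (UseMirror)  c += shadeMirror(in[i]);
    out[i] = c;
}

// Host-side dispatch: the "big if structure" that selects the variant
// matching the materials actually present in the scene.
void launchShade(float* d_out, const float* d_in, int n, bool diffuse, bool mirror)
{
    assert(n > 0);  // a launch with zero blocks is an invalid configuration
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;

    if (diffuse && mirror)
        shadeKernel<true, true><<<blocks, threads>>>(d_out, d_in, n);
    else if (diffuse)
        shadeKernel<true, false><<<blocks, threads>>>(d_out, d_in, n);
    else if (mirror)
        shadeKernel<false, true><<<blocks, threads>>>(d_out, d_in, n);
    else
        shadeKernel<false, false><<<blocks, threads>>>(d_out, d_in, n);
}
```

Compiling with `nvcc --ptxas-options=-v` prints the register count per instantiation, so you can check whether the smaller variants really come out cheaper for your own shaders. The cost is that the number of variants doubles with every bool parameter.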

I have had trouble reducing the number of registers my kernels require. Can you mention some good practices for doing this, or direct me somewhere that does?

The one trick I have used that has helped is declaring variables as volatile where the compiler tries to make several copies of them, including constants which are used more than once.
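A rough sketch of what the volatile trick looks like in code. The kernel `scaleKernel` and host helper `runScale` are invented for this example, and whether the qualifier changes the register count at all depends heavily on the compiler version, so treat it as something to try and measure rather than a guaranteed win.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: the constant k is used more than once.
__global__ void scaleKernel(float* out, const float* in, int n)
{
    // volatile tells the compiler to treat k as one stored value that must
    // be read on each use, rather than re-creating the constant per use.
    volatile float k = 0.25f;

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = k * in[i] + k;
}

// Host-side launcher (illustrative only).
void runScale(float* d_out, const float* d_in, int n)
{
    assert(n > 0);
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(d_out, d_in, n);
}
```

Build once with and once without the `volatile` keyword, using `nvcc --ptxas-options=-v`, and compare the reported register counts before keeping the change.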

Actually, at least according to the Programming Guide, it may be possible to keep functions from being inlined. The Guide says:

[i]4.2.5.1 __noinline__

By default, a __device__ function is always inlined. The __noinline__ function qualifier however can be used as a hint for the compiler not to inline the function if possible. The function body must still be in the same file where it is called.

The compiler will not honor the __noinline__ qualifier for functions with pointer parameters and for functions with large parameter lists.[/i]

I never used it though…
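For reference, a minimal sketch of how the qualifier from that section would be applied. The helper `attenuate`, the kernel `traceKernel` and the launcher `runTrace` are hypothetical names; the helper takes only scalar parameters because the quoted text says the hint is not honored for functions with pointer parameters.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper standing in for a large shading routine.
// __noinline__ is only a hint; the compiler may still inline the function.
__device__ __noinline__ float attenuate(float intensity, float distance)
{
    return intensity / (1.0f + distance * distance);
}

__global__ void traceKernel(float* out, const float* dist, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float c = 1.0f;
    // Iterative "bounce" loop: if the hint is honored, each iteration calls
    // the helper instead of carrying its own inlined copy of the body.
    for (int bounce = 0; bounce < 4; ++bounce)
        c = attenuate(c, dist[i]);
    out[i] = c;
}

// Host-side launcher (illustrative only).
void runTrace(float* d_out, const float* d_dist, int n)
{
    assert(n > 0);
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    traceKernel<<<blocks, threads>>>(d_out, d_dist, n);
}
```

Whether this shortens ptxas compile time or changes register usage is something to measure on your own kernel; a real function call has its own overhead, so it can also make things slower.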

That’s cool. It wasn’t there last time I read the manual (or at least I don’t remember it), so it must be a recent change. I know that in PTX it was always possible, but it was not exposed, since it is usually very inefficient.