I've noticed that the C-to-PTX compiler doesn't do a good job of placing parameter loads: it generally loads all of the parameters into registers at the beginning of the function, rather than at the point where they are needed.
The only way I can see to avoid this, and to make sure each ld.param is issued only when it is actually needed, is to write the entire PTX function by hand.
Is there any way to inject/inline an entire PTX function block, i.e. not just asm within a C++ device function, but a whole block of PTX? I know I can call it from my code using asm, but there doesn't seem to be an easy way of injecting the code without first splitting up the compilation and manually copying the new PTX into the intermediate PTX file.
A pragma would be nice;
… pure ptx