Register usage differs between platforms


I just ran into something odd. When I compile my code on my Linux computer (running nvcc 2.1), one of my kernels uses 25 registers. I would like to keep the code platform independent, so I also want to compile the same code on my Windows machine. However, compiling the exact same code on Windows gives a register count of 26 instead, which is very annoying because it forces a block size change! I’ve tried nvcc versions 1.1, 2.0, 2.1 and 2.2, and they all give the same register count. Has anyone else encountered this problem? Any solutions?

Using --maxrregcount is not an option, as it forces local memory use and slows my kernel down too much.
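
To illustrate why a single extra register can force a block size change, here is a minimal C++14 sketch of the arithmetic. It is a rough upper bound only. The 8192-register multiprocessor (compute capability 1.0/1.1) is an assumption, since the post does not say which GPU is used, and `max_block_size` is a hypothetical helper, not part of any CUDA API. Real hardware also rounds register allocation up to a granularity, so the true limit can be lower than what this computes.

```cpp
// Rough upper bound on block size imposed by per-thread register usage.
// ASSUMPTION: 8192 registers per multiprocessor (compute capability 1.0/1.1);
// the original post does not name the GPU. Allocation granularity is ignored.
constexpr int kWarpSize = 32;

// Hypothetical helper: largest whole-warp block that fits in the register file.
constexpr int max_block_size(int regs_per_thread, int regs_per_sm) {
    // Threads that fit in the register file, rounded down to whole warps.
    return (regs_per_sm / regs_per_thread) / kWarpSize * kWarpSize;
}

// 8192 / 25 = 327 threads, i.e. 10 full warps.
static_assert(max_block_size(25, 8192) == 320, "25 registers: up to 320 threads");
// 8192 / 26 = 315 threads, i.e. only 9 full warps.
static_assert(max_block_size(26, 8192) == 288, "26 registers: at most 288 threads");
```

Under these assumptions, a block of 320 threads fits at 25 registers per thread but no longer fits at 26.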


Is there a difference in the intermediate PTX code between the two platforms?


Yes, there is a difference, even in my smaller kernels (where the register usage didn’t change).



An interesting thing in the .ptx files is that the pointer size differs between the two systems, which is logical given that Windows XP is 32-bit while my Linux distribution (Ubuntu 9) is 64-bit. Is there a way to set the PTX pointer size to see whether this changes the outcome?


I previously thought that maximizing register use would reduce execution time.

But in my kernel, register usage is 60 per thread and occupancy is 0.25 (with 256 threads per block). When I use fewer threads per block, register usage becomes 124 per thread, but occupancy drops to 0.125 and execution time increases.

What should I do to increase occupancy?
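
The occupancy figures above can be reproduced with a small C++14 sketch of the register-limited occupancy arithmetic. The device limits are an assumption: 16384 registers, 32 resident warps and 8 resident blocks per multiprocessor (compute capability 1.2/1.3), chosen because they match the reported 0.25. The block size of 128 for the 124-register case is also assumed, since the post only says "fewer threads". `SmLimits` and `occupancy` are hypothetical names, and shared memory and register allocation granularity are ignored.

```cpp
// Simplified occupancy estimate: resident warps / maximum resident warps.
// ASSUMPTION: limits of a compute capability 1.2/1.3 multiprocessor.
// Shared memory and register allocation granularity are ignored.
struct SmLimits {
    int registers;   // registers per multiprocessor
    int max_warps;   // maximum resident warps
    int max_blocks;  // maximum resident blocks
};

constexpr SmLimits kCc13 = {16384, 32, 8};
constexpr int kWarpSize = 32;

constexpr int min_int(int a, int b) { return a < b ? a : b; }

// Hypothetical helper: number of warps resident on one multiprocessor.
constexpr int resident_warps(int regs_per_thread, int threads_per_block, SmLimits sm) {
    int warps_per_block = (threads_per_block + kWarpSize - 1) / kWarpSize;
    // Blocks that fit in the register file.
    int blocks_by_regs = sm.registers / (regs_per_thread * threads_per_block);
    // Blocks allowed by the resident-warp limit.
    int blocks_by_warps = sm.max_warps / warps_per_block;
    int blocks = min_int(min_int(blocks_by_regs, blocks_by_warps), sm.max_blocks);
    return blocks * warps_per_block;
}

constexpr double occupancy(int regs_per_thread, int threads_per_block, SmLimits sm) {
    return static_cast<double>(resident_warps(regs_per_thread, threads_per_block, sm)) /
           sm.max_warps;
}

// 60 registers x 256 threads = 15360 registers: one block, 8 of 32 warps.
static_assert(resident_warps(60, 256, kCc13) == 8, "occupancy 0.25");
// 124 registers x 128 threads (assumed) = 15872 registers: one block, 4 of 32 warps.
static_assert(resident_warps(124, 128, kCc13) == 4, "occupancy 0.125");
// Halving register use to 32 per thread would let two 256-thread blocks fit.
static_assert(resident_warps(32, 256, kCc13) == 16, "occupancy 0.5");
```

In this simplified model, occupancy rises only when the per-block register footprint (registers per thread times threads per block) shrinks enough for another block to fit, which is why lowering the block size while register usage doubles makes things worse.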

As suggested in many previous related topics, use the "volatile" keyword for register-level variables.

You may have to keep an eye on local memory (lmem), though: sometimes this forces nvcc to place variables in local memory instead of registers.

But this volatile trick works; for more details, search for it in the forum.