I just ran into something odd. When I compile my code on my Linux machine (running nvcc 2.1), one of my kernels uses 25 registers. I want to keep the code platform independent, so I also compile the same code on my Windows machine. However, compiling the exact same code on Windows gives a register count of 26 instead, which is very annoying because it forces a block size change! I've tried nvcc versions 1.1, 2.0, 2.1 and 2.2, and they all give the same register count. Has anyone else encountered this problem? Any solutions?
Using --maxrregcount is not an option, as it forces local memory use and hence slows my kernel down too much.
One interesting thing in the .ptx files is that the pointer size differs between the two systems, which makes sense given that my Windows XP installation is 32-bit while my Linux distribution (Ubuntu 9) is 64-bit. Is there a way to set the PTX pointer size, to see whether that changes the outcome?
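On the pointer-size question, here is a minimal sketch for checking which pointer width a given build actually uses. It assumes that the device pointer size in the generated PTX follows the host build by default, and that the toolkit in use accepts nvcc's `-m32` / `-m64` (`--machine`) option; that option may not be available in every nvcc version, so treat it as something to verify rather than a guaranteed fix. The helper name `pointer_bits` is made up for this example.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: width in bits of a pointer in the current host build.
// Compiling the same file with, for example, `nvcc -m32` on the 64-bit machine
// (assuming that option is supported by the installed toolkit) should make
// this report 32 instead of 64, so the two builds can be compared directly.
constexpr std::size_t pointer_bits() {
    return sizeof(void*) * CHAR_BIT;
}
```

If both builds report the same pointer width and the register counts still differ, the pointer size is probably not the cause.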
I used to think that maximizing register use would reduce execution time.
But my kernel uses 60 registers per thread and has an occupancy of 0.25 (with 256 threads per block). When I use fewer threads per block, the register count rises to 124 per thread, but occupancy drops to 0.125 and the execution time gets longer.
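The occupancy figures above can be reproduced with a rough register-limited estimate. This is only a sketch: the per-multiprocessor limits (16384 registers, 32 resident warps, 8 resident blocks) are assumptions chosen because they match the numbers quoted, and the 128 threads per block in the second case is also an assumption, since the post only says "fewer". Shared memory and register allocation granularity are ignored.

```cpp
#include <algorithm>

// Assumed per-multiprocessor limits (not taken from the post; check the
// limits of the actual GPU before relying on these values).
struct SmLimits {
    int registers  = 16384;  // 32-bit registers per multiprocessor
    int max_warps  = 32;     // resident warps per multiprocessor
    int max_blocks = 8;      // resident blocks per multiprocessor
    int warp_size  = 32;     // threads per warp
};

// Register-limited occupancy estimate: resident warps / maximum warps.
// Assumes regs_per_thread and threads_per_block are both positive.
double occupancy(int regs_per_thread, int threads_per_block,
                 SmLimits sm = SmLimits()) {
    // Blocks are scheduled in whole warps, so round the block size up.
    int warps_per_block = (threads_per_block + sm.warp_size - 1) / sm.warp_size;
    int regs_per_block  = regs_per_thread * warps_per_block * sm.warp_size;

    // How many blocks fit under each limit; the smallest one wins.
    int blocks_by_regs  = sm.registers / regs_per_block;
    int blocks_by_warps = sm.max_warps / warps_per_block;
    int blocks = std::min({blocks_by_regs, blocks_by_warps, sm.max_blocks});

    return static_cast<double>(blocks * warps_per_block) / sm.max_warps;
}
```

Under these assumptions, `occupancy(60, 256)` gives 0.25 (one block of 8 warps fits, since 60 × 256 = 15360 registers) and `occupancy(124, 128)` gives 0.125 (one block of 4 warps, since 124 × 128 = 15872 registers). In both cases only a single block fits on a multiprocessor, so halving the block size halves the number of resident warps available to hide latency.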