I just ran into something odd. When I compile my code on my Linux machine (running nvcc 2.1), one of my kernels uses 25 registers. I would like to keep the code platform independent, so I also want to compile the same code on my Windows machine. However, compiling the exact same code on Windows gives a register count of 26 instead, which is very annoying because it forces a block size change! I've tried nvcc versions 1.1, 2.0, 2.1 and 2.2 and they all give the same register count. Has anyone else encountered this problem? Any solutions?
Using --maxrregcount is not an option, as it forces local memory use and hence slows my kernel down too much.
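
For anyone wondering why a single register matters so much, here is a rough sketch of the arithmetic. It assumes a device with 8192 registers per multiprocessor (compute capability 1.0/1.1) and a 512-thread block limit, and it ignores the hardware's register allocation granularity, so treat it as an estimate only. The helper name `max_threads_per_block` is made up for illustration and is not part of the CUDA toolkit.

```c
/* Rough upper bound on threads per block imposed by register usage.
 * Simplified model: ignores the hardware's register allocation granularity. */
#define WARP_SIZE 32
#define MAX_THREADS_PER_BLOCK 512  /* assumed limit (compute capability 1.x) */

/* Hypothetical helper, not a CUDA API. */
int max_threads_per_block(int regs_per_sm, int regs_per_thread)
{
    int threads = regs_per_sm / regs_per_thread;  /* threads whose registers fit */
    threads = (threads / WARP_SIZE) * WARP_SIZE;  /* round down to whole warps */
    if (threads > MAX_THREADS_PER_BLOCK)
        threads = MAX_THREADS_PER_BLOCK;          /* clamp to the block size limit */
    return threads;
}
```

Under these assumptions, `max_threads_per_block(8192, 25)` gives 320 while `max_threads_per_block(8192, 26)` gives 288, so one extra register costs a whole warp per block.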