Discrepancy in register count between Windows and Linux

I have a bit-for-bit identical kernel, yet it compiles to 16 registers on Windows and 17 registers on Linux…

Any idea how I should proceed to file a bug?

You can attach the code here, along with build/reproduction instructions, or send me a PM if you’d prefer not to post the code publicly.

Is your Linux install 64-bit, by chance? Pointers on the device are 8 bytes when compiled with the 64-bit toolchain, versus 4 bytes with the 32-bit one, so a pointer can take two 32-bit registers instead of one. This can cause different register counts under Linux and Windows.

Oh, precisely! I do have a 64-bit kernel. But I’m not using 64-bit functionality at all, so is there some option like -m32 to compile it for 32-bit?

There is a -m32 option for nvcc, but it is supposed to be used only for compiling cubins for the Driver API. I’ve never used it.

When using the Runtime API, data structures need to be bit-for-bit identical on the device and the host so that they can be memcpy'd back and forth. If pointers are different sizes on the two sides, bad things will happen. I’ve never noticed a performance difference between Windows and 64-bit Linux in my kernels.
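
To make the layout point concrete, here is a rough host-side sketch in plain C++ (it should also build under nvcc). The struct names and helper functions are made up for illustration and are not from the original poster's code. It assumes `int` is 4 bytes, which holds on the usual CUDA platforms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical parameter block that would be copied between host and device.
struct KernelParams {
    float* data;   // 4 bytes in a 32-bit build, 8 bytes in a 64-bit build
    int    count;
};

// The same fields with the pointer modeled as a fixed 32-bit value,
// i.e. what a 32-bit device build would expect to receive.
struct KernelParams32 {
    std::uint32_t data;
    int           count;
};

// Size of a pointer in the current build.
std::size_t pointer_size() { return sizeof(void*); }

// Size of the parameter block in the current build.
std::size_t params_size() { return sizeof(KernelParams); }

// True only if this build lays out KernelParams exactly like the 32-bit
// version, so a raw byte copy between the two would be safe.
bool layouts_match() {
    return sizeof(KernelParams) == sizeof(KernelParams32) &&
           offsetof(KernelParams, count) == offsetof(KernelParams32, count);
}
```

On a typical 64-bit host build `layouts_match()` comes back false, because the pointer widens and `count` moves to a different offset. That is the situation where a raw copy of the struct into 32-bit device code would go wrong.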