I have two setups for developing CUDA applications: Windows XP with the CUDA 3.0 SDK at the office, and Windows 7 with CUDA 3.2 at home (each with a GTX 260). I move the .cu file back and forth as needed. I had a nice kernel going, and at home I changed it to use texture memory. Once I did that, the time in the kernel spiked from ~130 ms to ~2000 ms. I figured something had gone wrong, so the next day at the office I started over from the previous version. This time I got an improvement, from ~130 ms to ~99 ms. I took the same file home, compiled it, and again saw the time skyrocket to ~2000 ms.

I turned on nvcc's register and memory usage report, and at the office everything looked normal. At home, however, the kernel was using "80+0 bytes local mem" (the office machine's nvcc wasn't mentioning local memory at all, so it wasn't using any).

My question is: what could be causing this? The .cu files are identical down to the byte, so I know it isn't that. Optimizations are disabled across the board, max registers is set to 32 (though I'm using fewer than 14), and both machines use the same build configurations, as near as I can tell. I'm sure there are differences between nvcc 3.0 and nvcc 3.2, but would a later version really use local memory where an earlier one doesn't? Is there anything I can do to fix this?
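For reference, the per-kernel resource report I mean is the one from ptxas's verbose flag; a minimal invocation might look like this (kernel.cu is a placeholder for the actual source file):

```shell
# Ask ptxas to print per-kernel resource usage; any local memory
# use shows up as "lmem" / "local mem" in the report, alongside
# register and shared memory counts.
nvcc --ptxas-options=-v -maxrregcount=32 -c kernel.cu -o kernel.obj
```

The -maxrregcount=32 flag matches the register cap mentioned above; it isn't required to get the report.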
Edit: Sigh, it turns out the build configurations weren't the same after all. I switched from the "debug" configuration to a different one I had made, without debug info and with optimizations enabled, and the kernel no longer uses any local memory. It appears to have been a setting on the home computer that I got wrong. I still don't know which one, but since switching to a different configuration fixed the problem, it was evidently a rogue setting.
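One build-configuration difference that could plausibly explain the symptom above: passing -G (device debug info) to nvcc disables device-code optimizations, which often forces variables to spill into local memory. A Debug configuration generated by the CUDA build rules typically adds it, so comparing the two command lines is a quick check (kernel.cu again stands in for the real file):

```shell
# Debug-style build: -G emits device debug info and turns off
# device-code optimizations, which can push variables into local memory.
nvcc -G --ptxas-options=-v -c kernel.cu -o kernel_debug.obj

# Release-style build: optimizations on, no device debug info,
# so the same kernel may report no local memory at all.
nvcc -O2 --ptxas-options=-v -c kernel.cu -o kernel_release.obj
```

Diffing the ptxas output of the two builds should show whether the "80+0 bytes" of local memory appears only in the debug variant.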
Thanks for your time.