CUDA 4.0 decreases speed?

I have CUDA 2.2 installed on another computer (Vista). If I compile on that computer, the results are 10-20% faster than on my new Windows 7 computer with CUDA 4.0. I’m very sure I use the same options. Where could the problem be?

Are you running 64-bit Windows 7? If so, at least one source of your problem could be that the CUDA 4.0 compiler is using 64-bit addressing. That takes more registers to hold memory addresses and adds a slightly higher number of dynamic instructions for address arithmetic.
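As a rough illustration of the register cost described above, here is a minimal Python sketch. It assumes 32-bit hardware registers, so a 64-bit address occupies two of them. The function name and the count of four live pointers are made up for the example; the real allocation depends on what the compiler decides to keep in registers.

```python
# Back-of-the-envelope register cost of holding addresses in registers.
# Assumption: hardware registers are 32 bits wide, so a 64-bit address
# needs two of them. The number of live pointers is hypothetical.
REGISTER_BITS = 32


def registers_for_addresses(live_addresses, address_bits):
    """Return how many 32-bit registers the given addresses occupy."""
    regs_per_address = -(-address_bits // REGISTER_BITS)  # ceiling division
    return live_addresses * regs_per_address


if __name__ == "__main__":
    live = 4  # hypothetical: four pointers live at once in a kernel
    cost32 = registers_for_addresses(live, 32)
    cost64 = registers_for_addresses(live, 64)
    print("32-bit build:", cost32, "registers for addresses")
    print("64-bit build:", cost64, "registers for addresses")
    print("extra registers per thread:", cost64 - cost32)
```

The extra registers are per thread, so the effect is multiplied by the number of threads in each block.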

I’ve been having problems with 64-bit versus 32-bit addressing slowing things down lately, so I may just be projecting my own issues here, but you should be able to check this by determining whether your 2.2 version is register bound. If it is, having fewer registers to spare with the 4.0 version could quite possibly slow things down by 10%.

Another possibility is that changes to the nvcc compiler cause your code to compile less efficiently than before, but 20% seems like a big jump.

kleboeuf, THANKS A LOT for your answer! (Sincerely didn’t expect one anymore.)

My old Vista computer is indeed 32-bit, while the new Windows 7 one is 64-bit.

But what do you mean by “you should be able to check this out by determining if your 2.2 version is register bound”?

I’m not used to working in a Windows environment myself, but you should be able to set a verbose option on the nvcc compiler to have it report your register usage. For example, on Linux the command would be: nvcc --ptxas-options -v (…)

You can use the register usage as an input to the CUDA Occupancy Calculator, along with your other parameters (number of blocks, threads per block, the GPU you are using, etc.), and it will help you find out whether the number of available registers is what’s holding you back.

If that is indeed the case, I wouldn’t be surprised at all to find that you take a performance hit going from 32-bit to 64-bit addressing, as that will eat up some of your precious registers.
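The register-bound check described above can be approximated by hand. The Python sketch below is a simplified stand-in for the Occupancy Calculator, not a replacement for it: it ignores shared memory and register allocation granularity, so the real tool may report lower numbers. The function name is hypothetical, and the per-multiprocessor limits and per-thread register counts in the example are assumptions for illustration; substitute the values for your own GPU and the counts nvcc reports.

```python
# Simplified register-limited occupancy estimate.
# Ignores shared memory and register allocation granularity.
# All limits passed in the example are assumptions; check your GPU.


def register_limited_occupancy(regs_per_thread, threads_per_block,
                               regs_per_sm, max_threads_per_sm,
                               max_blocks_per_sm):
    """Return (resident blocks per SM, occupancy, register_bound)."""
    if regs_per_thread <= 0 or threads_per_block <= 0:
        raise ValueError("register and thread counts must be positive")
    regs_per_block = regs_per_thread * threads_per_block
    blocks_by_regs = regs_per_sm // regs_per_block
    blocks_by_threads = max_threads_per_sm // threads_per_block
    other_limit = min(blocks_by_threads, max_blocks_per_sm)
    blocks = min(blocks_by_regs, other_limit)
    occupancy = blocks * threads_per_block / max_threads_per_sm
    # Register bound: registers allow fewer blocks than any other limit.
    register_bound = blocks_by_regs < other_limit
    return blocks, occupancy, register_bound


if __name__ == "__main__":
    # Assumed limits per multiprocessor: 32768 registers,
    # 1536 resident threads, 8 resident blocks.
    limits = dict(regs_per_sm=32768, max_threads_per_sm=1536,
                  max_blocks_per_sm=8)
    # Hypothetical per-thread register counts for the two builds.
    for label, regs in (("32-bit build", 20), ("64-bit build", 24)):
        result = register_limited_occupancy(regs, 256, **limits)
        print(label, result)
```

With these made-up numbers, going from 20 to 24 registers per thread at 256 threads per block drops the estimate from 6 resident blocks to 5, and the kernel becomes register bound.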

Again, I’m not sure how to go about doing this in your development environment, but you should be able to target a 32-bit platform on Windows 7 x64. That way you can see for certain whether this was affecting things: compile for 32-bit Windows, run your program, and see if that helps.

Good luck!