CUDA 2.1 vs 2.0 performance: empty kernel call timing


I run CUDA 2.0 on C1060/Windows XP, and 2.1 on C1060/Windows Server. When I time empty kernel calls in a loop, 2.0 is more than 10 times slower than 2.1. Is this expected or does it have to do with my setup?
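For reference, a timing loop of this kind might look like the sketch below. It is an illustration under assumptions, not the code used for the numbers above: the kernel name `emptyKernel`, the launch count of 10000, and the 1x1 launch configuration are all made up for the example. It also uses `cudaDeviceSynchronize`, which belongs to later runtime versions; on the 2.x toolkits the equivalent call was `cudaThreadSynchronize`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel with an empty body, so the loop measures launch overhead only.
__global__ void emptyKernel() {}

int main() {
    const int launches = 10000;  // assumed iteration count, not from the thread

    // Warm-up launch so one-time context creation is excluded from the timing.
    emptyKernel<<<1, 1>>>();
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "warm-up failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time a batch of back-to-back launches with CUDA events.
    cudaEventRecord(start, 0);
    for (int i = 0; i < launches; ++i) {
        emptyKernel<<<1, 1>>>();
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until all queued launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
    printf("average time per launch: %.3f us\n", ms * 1000.0f / launches);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

Running the same binary source on both machines and comparing the per-launch average would help separate a toolkit or driver difference from something specific to one setup, since the two systems here also differ in operating system.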


Wait, 2.1 is faster and you’re worried? Also, that was a long time ago; it’s certainly possible that we sped up kernel launches between those two versions.

Thanks for your quick response. No, I’m not worried. I just want to understand whether what I see is due to the CUDA version or to something else that I’m missing. I looked at the differences between the two versions and didn’t see anything about kernel call performance. Is there a further performance improvement in 2.2?

Thanks again…