code generation sm_10

Hi All,

I’m using Visual Studio 2012, and I was wondering what would happen if I compile code for sm_10 and run it on a graphics card with, for example, compute capability 2.0. Would the mismatched compile setting cause an execution-time penalty (i.e., code that is not as optimized as it would be if the compiler option were set to sm_20)?

And how would it work with multiple settings when launching the code on a remote machine?

First of all, if your code uses features that are only supported on higher compute capabilities, it will either fail to compile or, in the case of double precision, the compiler will demote doubles to single precision. This could affect the correctness of your results. You can also run into problems with the number of threads per block and the number of blocks per grid. In some cases the code will compile and run, but the results will be wrong.
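One way to see what the running code was actually built for is to read `__CUDA_ARCH__` inside a kernel. nvcc defines it during device compilation as major*100 + minor*10 of the virtual architecture (e.g. 200 for compute_20), and you can compare that with the device's compute capability from `cudaGetDeviceProperties`. The sketch below also adds a run-time `1.0 + eps` test that would be expected to collapse to `1.0` if doubles were demoted to floats. The kernel `reportArch`, the struct `ArchReport`, and the helper `compiledBelowDevice` are hypothetical names chosen for illustration. Newer CUDA toolkits have dropped sm_10 and sm_20, so substitute targets your nvcc still accepts.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical result record filled in by the kernel.
struct ArchReport {
    int compiledArch;  // value of __CUDA_ARCH__ for the device code that ran
    int doubleWorks;   // 1 if one + eps stays distinguishable from one
};

// True when the code was compiled for an older architecture than the device
// supports (cudaArch uses the __CUDA_ARCH__ encoding, major*100 + minor*10).
__host__ __device__ bool compiledBelowDevice(int cudaArch, int major, int minor) {
    return cudaArch < major * 100 + minor * 10;
}

__global__ void reportArch(ArchReport* out, double one, double eps) {
#ifdef __CUDA_ARCH__
    out->compiledArch = __CUDA_ARCH__;
#endif
    // eps arrives at run time so the compiler cannot fold the sum.
    // If double were demoted to float, 1.0 + 1e-10 would round back to 1.0.
    double x = one + eps;
    out->doubleWorks = (x != one) ? 1 : 0;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, reportArch);  // ptxVersion/binaryVersion: major*10 + minor

    ArchReport* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(ArchReport));
    reportArch<<<1, 1>>>(d_out, 1.0, 1e-10);
    ArchReport h_out{};
    cudaMemcpy(&h_out, d_out, sizeof(ArchReport), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    std::printf("device CC %d.%d, PTX version %d, binary version %d\n",
                prop.major, prop.minor, attr.ptxVersion, attr.binaryVersion);
    std::printf("__CUDA_ARCH__ = %d, double precision preserved: %s\n",
                h_out.compiledArch, h_out.doubleWorks ? "yes" : "no");
    std::printf("compiled for an older architecture than the device: %s\n",
                compiledBelowDevice(h_out.compiledArch, prop.major, prop.minor) ? "yes" : "no");
    return 0;
}
```

Building the same file once with an old target and once with the device's own target and comparing the two printouts shows whether the lower target changes anything for a given kernel.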
Regarding the optimizations themselves, it depends on your code. For some architectures the compiler may choose to use more registers, for others fewer. I saw an improvement when compiling for 2.0 and running on 3.5 devices, because it increased the occupancy. On the 2.0 architecture you can have up to 8 resident blocks per multiprocessor (MP); this limit differs on other architectures and can affect your performance.
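To check whether the compile target changes occupancy for a particular kernel, the runtime can report the register count it was compiled with and how many of its blocks fit on one multiprocessor. This is a minimal sketch assuming CUDA 6.5 or later, where `cudaOccupancyMaxActiveBlocksPerMultiprocessor` is available. The kernel `saxpy`, the block size of 256, and the helper `occupancyFraction` are illustrative choices, not something from this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel; its register usage may differ between compile targets.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Fraction of a multiprocessor's thread slots filled by resident blocks.
double occupancyFraction(int activeBlocks, int blockSize, int maxThreadsPerSM) {
    return static_cast<double>(activeBlocks * blockSize) / maxThreadsPerSM;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, saxpy);

    const int blockSize = 256;  // assumption: a typical block size
    int activeBlocks = 0;
    // Resident blocks per MP, given registers, shared memory, and the MP's block limit.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&activeBlocks, saxpy, blockSize, 0);

    std::printf("registers per thread: %d\n", attr.numRegs);
    std::printf("resident blocks per MP: %d\n", activeBlocks);
    std::printf("occupancy: %.2f\n",
                occupancyFraction(activeBlocks, blockSize, prop.maxThreadsPerMultiProcessor));
    return 0;
}
```

Running this once per build (one per `-arch` setting) on the same device gives a direct comparison of registers and occupancy, which is the effect described above.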

So there is no answer. It depends on your code.

I typically compile most of my code for sm_10 unless I specifically require later compute features. I have never seen much benefit from compiling for later architectures in my code.

OK, thanks to both of you for your answers, pasoleatis and cbuchner1!