GTX 480 faster when compiled with sm_12 instead of sm_20

My program runs a little faster if I compile it with -arch sm_12 (or sm_13) instead of -arch sm_20.

My program makes heavy use of CUFFT C2C transforms and never uses double-precision variables.
Is it possible that, when compiled for sm_12, a faster and less accurate CUFFT is used?

What about just-in-time recompilation? If I compile my code with -arch sm_12 and run it on a compute capability 2.0 device, shouldn't it be recompiled just-in-time for sm_20?
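
In case it helps anyone reproduce this, here is a minimal sketch of how I would check what actually runs on the card. It asks the runtime which PTX and binary versions a kernel ended up with. `dummyKernel` is only a hypothetical stand-in for one of my real kernels, and I am assuming device 0 is the GTX 480.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; replace with any __global__ function
// from the real program.
__global__ void dummyKernel(float *out)
{
    out[threadIdx.x] = 1.0f;
}

int main()
{
    // Query the compute capability of device 0 (assumed to be the GTX 480).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Ask the runtime which PTX and binary versions it selected for the kernel.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, dummyKernel);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("device compute capability: %d.%d\n", prop.major, prop.minor);
    printf("kernel PTX version:        %d\n", attr.ptxVersion);    // major * 10 + minor
    printf("kernel binary version:     %d\n", attr.binaryVersion); // major * 10 + minor
    return 0;
}
```

Building this once with -arch sm_12 and once with -arch sm_20 and comparing the two printed versions should show whether the sm_12 build is being recompiled for the 2.0 device, although I have not confirmed what the output looks like in each case.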