Code Generation

Hi. Sorry for my English, I am from Ukraine. When I compile my program with compute_20,sm_21, it runs more slowly than when I compile it with compute_10,sm_11.
I use a GeForce GT 420M (compute capability 2.1) with CUDA 4.0, and I compile with VS2010.
The command line for compute_20,sm_21 is:
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_20,code="sm_21,compute_10" --use-local-env --cl-version 2010 -ccbin "D:\Program Files\Microsoft Visual Studio 10.0\VC\bin\x86_amd64" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" --keep-dir "x64\Release" -maxrregcount=32 --machine 64 --compile -Xcompiler "/openmp" -Xcompiler "/EHsc /nologo /O2 /Zi /MTd " -o "x64\Release\main.cu.obj" "D:\BP\CPPproject\cuda_dct\cuda_dct\main.cu"

Where is the problem?