Slow performance with CUDA Runtime 4.1

I have a problem. I wrote a CUDA program with MS VS 2008 and CUDA Runtime 3.2. The DLL is written in C++, and its functions are called from C#. The program works correctly. I then moved the solution to VS 2010, CUDA Runtime 4.2, and NVIDIA Nsight 2.1: I created a new solution, added new C# and C++ (Nsight) projects, and pasted my code from the VS 2008 projects into them. The program still works correctly, but the CUDA functions became 3x slower. Has anybody seen a problem like this? Do you have any ideas about compiler settings or something else?
GPU - Nvidia M540 (mobile)
CPU - Core i7 (2 GHz)
OS - Windows 7 (x64)

I am having a similar issue. I upgraded the CUDA runtime from 3.2 to 4.1 and some of my kernels are 20% to 40% slower.

For significant slowdowns in CUDA 4.1 due to code generation, such as those reported here, I would suggest filing bugs against the compiler. Please attach a self-contained repro case to the bug report. A link to a bug reporting form can be found on the start page of the registered developer website.
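
For anyone unsure what a self-contained repro case might look like, here is a minimal sketch, not a definitive template. It assumes the slowdown can be isolated to a single kernel; the saxpy kernel, the problem size, and the repetition count are hypothetical placeholders to be replaced with the kernel and launch configuration that actually regressed. It times the kernel with CUDA events and prints the runtime and driver versions, so the same file can be built with both toolkits and the numbers compared directly.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in kernel: replace with the kernel that got slower.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;   // assumed problem size
    const int reps = 100;    // assumed number of timed launches
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    int runtimeVersion = 0, driverVersion = 0;
    CHECK(cudaRuntimeGetVersion(&runtimeVersion));
    CHECK(cudaDriverGetVersion(&driverVersion));
    printf("runtime %d, driver %d\n", runtimeVersion, driverVersion);

    float *x = 0, *y = 0;
    CHECK(cudaMalloc((void **)&x, n * sizeof(float)));
    CHECK(cudaMalloc((void **)&y, n * sizeof(float)));
    CHECK(cudaMemset(x, 0, n * sizeof(float)));
    CHECK(cudaMemset(y, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Warm-up launch so one-time initialization is not included in the timing.
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
    CHECK(cudaGetLastError());

    // Events are recorded in the same stream as the launches, so the
    // elapsed time covers exactly the timed kernel executions.
    CHECK(cudaEventRecord(start, 0));
    for (int r = 0; r < reps; ++r)
        saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaGetLastError());

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("average kernel time: %f ms\n", ms / reps);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(x));
    CHECK(cudaFree(y));
    return 0;
}
```

Building this file with each toolkit using identical nvcc options (same target architecture, and a Release rather than Debug configuration) keeps the comparison fair. Adding --ptxas-options=-v to the nvcc command line prints per-kernel register and local memory usage, which may help show whether code generation changed between the two versions.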

CUDA 4.1 contains significant changes to the compiler infrastructure. While much effort has gone into avoiding performance regressions with the revamped compiler, compilers are complex pieces of software containing numerous heuristics, and it is impossible to cover every possible permutation in testing. Thus regressions can occur but are expected to be fairly rare. Filing bugs for any functional issues or significant performance regressions will help with eliminating the remaining kinks. Thank you for your help.