CUDA kernel about 60 times slower when compiled with -G

I am running CUDA 5.5 and Nsight for Visual Studio 3.2.2, and I found that my CUDA kernel runs about 60 times slower in debug mode than in release mode.

The kernel runs on a GTX680.
The display for Visual Studio uses a GTS250.

What can cause this difference?

Are there any known issues with this?


When you pass -G (to generate device debug info), nvcc also turns off optimization of the device code.

Many values that an optimized build would keep in registers end up in local memory instead, which is much slower to access.
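
One way to see this for yourself is to ask the runtime how many registers and how much local memory a kernel uses, then compare a -G build against a normal build. The following is only a minimal sketch: the saxpy kernel and the file name attrs.cu are made up for illustration and are not from the question. It relies on cudaFuncGetAttributes, which reports numRegs and localSizeBytes for a compiled kernel.

```cuda
// attrs.cu (hypothetical file name)
// Build twice and compare the printed numbers:
//   nvcc -G -o attrs_debug   attrs.cu   (debug, device optimizations off)
//   nvcc    -o attrs_release attrs.cu   (default, device optimizations on)
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only as something to inspect.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

int main()
{
    // Query the per-thread resource usage of the compiled kernel.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, saxpy);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("registers per thread:    %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    return 0;
}
```

If the debug build reports a larger local memory size than the release build, that matches the explanation above. The exact numbers depend on the kernel, the GPU architecture, and the toolkit version, so treat them as a rough indication only. Compiling with -Xptxas -v prints similar per-kernel register and local memory figures at build time.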