How does single-GPU debugging happen on Kepler?

I was wondering how single-GPU debugging happens in Kepler environments with CUDA 5.

With CUDA 4.x under Fermi, two cards were required, as the debugger set real breakpoints and stopped the GPU dead, so it couldn't be used for graphics at the same time.
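Just to be clear what I mean by real breakpoints, here's a minimal sketch of that workflow (the file and kernel names are made up for illustration). Built with nvcc -g -G debug_demo.cu -o debug_demo, then in cuda-gdb: break add_one, then run; on Fermi with CUDA 4.x, hitting that breakpoint halted the whole GPU, hence the second card for the display.

// debug_demo.cu -- tiny kernel just to illustrate breakpoint debugging with cuda-gdb.
#include <cstdio>

__global__ void add_one(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;          // set a breakpoint on this line in cuda-gdb
}

int main()
{
    const int n = 256;
    int *d_data;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));

    // Launch enough threads to cover n elements.
    add_one<<<(n + 127) / 128, 128>>>(d_data, n);
    cudaDeviceSynchronize();

    int h_data[n];
    cudaMemcpy(h_data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("data[0] = %d\n", h_data[0]);

    cudaFree(d_data);
    return 0;
}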

The AMD GDebugger took a different route: it saved the entire kernel state to CPU memory and exited the kernel, so it was effectively running custom kernels for debugging.

Kepler is supposedly able to do multiprocessing on the GPU (I'm not fully up to date on what level, i.e. whether it can do actual GPU partitioning or not). I was wondering whether the debugger takes that approach (i.e. running on only part of the GPU and using real debug breakpoints) or the GDebugger approach of building custom kernels.

Does anyone have any idea?
I have one computer running a GTX 690 and one running a GTX 640, if that makes a difference.

Sorry if the answer is posted somewhere obvious and I missed it; I've been away from hard-core GPU computing for a while.

Thanks