What do you use to debug CUDA programs?

I'd like to know what tools you use to debug CUDA programs.

I tried to use Nsight to debug my CUDA program on a single-GPU machine, but it failed. The reason given was that the GPU can be used either for display or for debugging, not both, which means I cannot debug a CUDA program with Nsight on a single-GPU platform.

So, do you turn off the X server and debug from the command line, or do you debug the CUDA program remotely?

cuda-memcheck still works, and I add lots of printf calls when I suspect a problem in some part of the code.
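
To make the printf approach concrete, here is a minimal sketch of what I mean. The kernel `scale`, the `CUDA_CHECK` macro, and the file name used below are made up for illustration and are not from any library; the runtime calls themselves (`cudaMalloc`, `cudaGetLastError`, `cudaDeviceSynchronize`, and so on) are the standard CUDA runtime API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: report file/line and abort if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: doubles each element and prints from the first few
// threads only, so the output stays readable.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 2.0f;
        if (i < 4) {
            printf("block %d thread %d: data[%d] = %f\n",
                   blockIdx.x, threadIdx.x, i, data[i]);
        }
    }
}

int main()
{
    const int n = 1024;
    float *d_data = nullptr;

    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    scale<<<(n + 255) / 256, 256>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());       // reports launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports execution errors, flushes device printf

    CUDA_CHECK(cudaFree(d_data));
    return 0;
}
```

Assuming the file is saved as debug_printf.cu, something like `nvcc -g -G debug_printf.cu -o debug_printf` followed by `cuda-memcheck ./debug_printf` should work without needing a second GPU. Guarding the printf with a condition such as `i < 4` matters in practice, because every thread printing at once quickly becomes unreadable.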