If your card has compute capability 2.0 or higher, you can use printf.
You have to compile for sm_20 or higher to be able to use printf().
After that, you can use printf just as you would in a CPU program.
Just put printf in the code where you need to check a value, then add the flag -arch=sm_20 when you compile. If you put printf in a __device__ or __global__ function, every thread will execute it, so you will get a lot of output.
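To make the point about per-thread output concrete, here is a minimal sketch. The kernel name `debug_kernel`, the helper `should_print`, the launch size, and the file name in the compile comment are all made up for illustration and are not from the posts above. It guards the printf so that only one chosen thread prints instead of all of them.

```cuda
// Compile with the flag mentioned above, for example:
//   nvcc -arch=sm_20 debug.cu -o debug
// (debug.cu is an assumed file name; newer toolkits have dropped sm_20,
//  so use the sm_XX value that matches your card.)
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true only for the one thread we want output from.
__host__ __device__ inline bool should_print(int global_id, int target_id)
{
    return global_id == target_id;
}

// Hypothetical kernel, for illustration only.
__global__ void debug_kernel(int target_id)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Without the guard, this printf would run once per thread
    // (2 blocks * 4 threads = 8 lines with the launch below).
    if (should_print(gid, target_id)) {
        printf("block %d, thread %d, global id %d\n",
               blockIdx.x, threadIdx.x, gid);
    }
}

int main()
{
    // Launch 2 blocks of 4 threads and print only from global thread 5.
    debug_kernel<<<2, 4>>>(5);

    // Device-side printf output is buffered and is flushed at
    // synchronization points such as this one.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Removing the `if` guard shows the "lots of prints" behaviour: one line per launched thread.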
You can check the NVIDIA webpage (CUDA GPUs - Compute Capability | NVIDIA Developer), or you can run the deviceQuery example from the SDK examples to check the compute capability of your device.
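If you would rather check from your own program, the runtime API reports the same numbers that deviceQuery prints. This is only a rough sketch of that idea, not the deviceQuery source; the helper `supports_device_printf` is a made-up name that encodes the 2.0 requirement discussed above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: device-side printf needs compute capability 2.0 or higher.
inline bool supports_device_printf(int major)
{
    return major >= 2;
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Report name and compute capability (major.minor) for every device.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices whose properties cannot be read
        }
        printf("Device %d: %s, compute capability %d.%d, device printf %s\n",
               dev, prop.name, prop.major, prop.minor,
               supports_device_printf(prop.major) ? "available" : "not available");
    }
    return 0;
}
```

The major and minor numbers map directly to the compile flag: a device reporting 2.0 corresponds to -arch=sm_20.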
pasoleatis explained it better than I did: use a flag for compilation.
It is true that I was not really clear; I will try to explain better next time.
If you know the name of your graphics card, search Google to find its compute capability. I don’t know of a direct link that lists the compute capability of every graphics card.
Edit: pasoleatis definitely gave you the right answer.