I have been having trouble recently because I cannot get printf to work inside a kernel. I have read that this is possible on devices with compute capability 2.0 or higher. I run my code on a GeForce GT 640 with compute capability 3.0. I call cudaDeviceSynchronize() after the kernel call, but that does not help either. I searched the Internet and this forum, and the advice was to explicitly set the architecture to 2.0 during compilation. I did that and, again, it didn't work.
Can someone help me with this problem?
It may be that your kernel is not running for some other reason. Then it will seem as if printf from the kernel is not working.
If you are running on a compute capability 2.0 or newer GPU and compiling with the arch switches that match it, that should be all you need (and you need to include stdio.h, of course, just as you would for any printf usage).
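For reference, a minimal sketch of a kernel that uses printf might look like the following. The kernel name `hello_kernel` and the launch configuration are made up for illustration and are not taken from the original poster's code.

```cuda
#include <cstdio>  // printf, for both host and device code

// Hypothetical kernel: each thread prints its own block and thread index.
__global__ void hello_kernel()
{
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main()
{
    // Launch one block of four threads (arbitrary values for illustration).
    hello_kernel<<<1, 4>>>();

    // Device-side printf output is buffered; synchronizing with the device
    // gives the runtime a chance to flush it to stdout before the program exits.
    cudaDeviceSynchronize();
    return 0;
}
```

Assuming the file is saved as hello.cu, something like `nvcc -arch=sm_30 hello.cu -o hello` would target a compute capability 3.0 card on a toolkit that still supports that architecture; newer toolkits have dropped sm_30, so the switch has to match both the GPU and the installed CUDA version. If this prints nothing, the kernel is probably not running at all.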
I suggest adding proper CUDA error checking to your code to see if there are other errors that are preventing the kernel from running correctly. If you're not sure what proper CUDA error checking is, please google "proper cuda error checking" and take the first hit.
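As a rough illustration of what such error checking can look like, here is a minimal sketch. The macro name `CUDA_CHECK` and the kernel `hello_kernel` are hypothetical; the runtime calls used (cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString) are standard CUDA runtime API functions.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical helper macro: abort with file, line, and message
// whenever a CUDA runtime call returns anything other than cudaSuccess.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",               \
                    __FILE__, __LINE__, cudaGetErrorString(err_));     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical kernel used only to have something to launch.
__global__ void hello_kernel()
{
    printf("Hello from thread %d\n", threadIdx.x);
}

int main()
{
    hello_kernel<<<1, 4>>>();

    // Catches errors from the launch itself, for example an invalid
    // configuration or a binary built for an architecture the GPU cannot run.
    CUDA_CHECK(cudaGetLastError());

    // Catches errors raised while the kernel executes, and also gives
    // the runtime a chance to flush the device-side printf buffer.
    CUDA_CHECK(cudaDeviceSynchronize());

    return 0;
}
```

If the kernel never runs, one of the two checks after the launch should report why, which is usually more informative than a silently missing printf.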