Printf isn't working in my kernel?

I believe I’ve done everything correctly: the program compiles without errors (using -arch sm_20), and my GPU meets the requirement for kernel-level printf (compute capability 2.0 or higher).

I have the following simple kernel:

__global__ void HelloWorld()
{
    printf("Hello World!\n");
}

and I call it like this:

HelloWorld<<<1, 32>>>();

Yet the printf output is never displayed. Does anyone have any idea why this is happening?

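Try adding a call to cudaThreadSynchronize() after the HelloWorld<<<1, 32>>>() launch (in newer CUDA toolkits that function is deprecated in favor of cudaDeviceSynchronize()). Kernel launches are asynchronous, and the output of device-side printf() goes into a buffer that is only flushed to the host at certain points, such as an explicit synchronization. If the program exits right after the launch, the buffer most likely never gets flushed.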
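For reference, here is a minimal sketch of what the complete program might look like with the synchronization added. The kernel and launch configuration are taken from the question; the runHelloWorld() helper, the error checks, and main() are my own additions for illustration, and I'm assuming a toolkit recent enough to provide cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the question: every thread prints one line.
__global__ void HelloWorld()
{
    printf("Hello World!\n");
}

// Hypothetical helper (not from the original post): launches the kernel
// and then blocks until it has finished.
cudaError_t runHelloWorld()
{
    HelloWorld<<<1, 32>>>();

    // Catch launch-time errors such as an invalid configuration.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        return err;
    }

    // Wait for the kernel to complete. This is also one of the points
    // at which the device-side printf buffer is flushed to stdout.
    return cudaDeviceSynchronize();
}

int main()
{
    cudaError_t err = runHelloWorld();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Checking the return values is optional, but it helps distinguish "the output was never flushed" from "the kernel never ran", for example when the binary was built for an architecture the GPU doesn't support.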