Printf inside Kernel with 2D block

Hello everyone,

I’m trying to printf from a CUDA kernel on a GTX 480 (Fermi). I have no problems with the printf call when the block is 1D, but when it is 2D, as in the code snippet below, I get no output at all when the program runs. Does anyone know why?

#include <stdio.h>

__global__ void printEx(void)
{
        int idx = threadIdx.x + blockIdx.x;
        int idy = threadIdx.y + blockIdx.y;

        if(idx < 2 && idy < 2)
        {
                printf("Ping\n");
        }
}

int main()
{
        dim3 dimBlock(256,256);
        dim3 dimGrid(1);

        printEx <<<dimGrid, dimBlock>>> ();
        cudaThreadSynchronize();

        return 0;
}

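You are using too many threads per block.
A single block can have a maximum of 1024 threads on Fermi, and your 256 x 256 block asks for 65,536. The launch fails, so the kernel never runs and nothing is printed.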
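
For anyone who hits the same silent failure, here is a minimal sketch of how the launch could be checked. The 16 x 16 block size is only an illustrative choice that stays under the limit, and the index calculation is the usual blockIdx * blockDim + threadIdx form rather than the one in the original post. It also assumes a toolkit that has cudaDeviceSynchronize(); on older toolkits cudaThreadSynchronize() plays the same role.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void printEx(void)
{
        // Global 2D index of this thread within the grid.
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        int idy = blockIdx.y * blockDim.y + threadIdx.y;

        if (idx < 2 && idy < 2)
        {
                printf("Ping\n");
        }
}

int main()
{
        // Ask the device for its per-block thread limit instead of assuming it.
        cudaDeviceProp prop;
        cudaError_t err = cudaGetDeviceProperties(&prop, 0);
        if (err != cudaSuccess)
        {
                fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
                return 1;
        }
        printf("maxThreadsPerBlock = %d\n", prop.maxThreadsPerBlock);

        // Illustrative block size: 16 * 16 = 256 threads, below the 1024 limit on Fermi.
        dim3 dimBlock(16, 16);
        dim3 dimGrid(1);

        printEx<<<dimGrid, dimBlock>>>();

        // A bad launch configuration is reported here, not by the launch itself.
        err = cudaGetLastError();
        if (err != cudaSuccess)
        {
                fprintf(stderr, "Kernel launch failed: %s\n", cudaGetErrorString(err));
                return 1;
        }

        // Wait for the kernel to finish; this also flushes the device printf buffer.
        err = cudaDeviceSynchronize();
        if (err != cudaSuccess)
        {
                fprintf(stderr, "Kernel execution failed: %s\n", cudaGetErrorString(err));
                return 1;
        }

        return 0;
}
```

If dimBlock is changed back to (256,256), the cudaGetLastError() check should report an invalid configuration error instead of the program exiting with no output.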

Oh gosh, how embarrassing. Thanks so much; I figured it was a simple mistake.