cuPrintf() not printing

Hey,

I've got a strange problem with cuPrintf(), which is otherwise a great debugging tool. I am calling the necessary cudaPrintfInit(), cudaPrintfDisplay() and cudaPrintfEnd() correctly, but nothing is printed to my screen. My kernel definitely calls cuPrintf(). When I use cuPrintf() with a very simple kernel like the one in the NVIDIA SDK, the output shows up. It just doesn't appear when I call my own kernel, which is slightly more complicated and launches about 100,000 threads. (I also tried launching just one thread, with the same result.)
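For reference, here is a minimal sketch of the call sequence described above, with an error check after each step so that a failed launch would not go unnoticed. It assumes cuPrintf.cu from the SDK is on the include path; the kernel name myKernel, the check() helper and the <<<1, 1>>> launch configuration are hypothetical placeholders, not the real code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include "cuPrintf.cu"  // assumption: cuPrintf.cu from the SDK is on the include path

// Hypothetical stand-in for the real kernel.
__global__ void myKernel()
{
    cuPrintf("hello from thread %d\n", threadIdx.x);
}

// Hypothetical helper: report a CUDA error and say which step produced it.
static bool check(cudaError_t err, const char *step)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", step, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    // Allocate the device-side print buffer (default size).
    if (!check(cudaPrintfInit(), "cudaPrintfInit")) return 1;

    myKernel<<<1, 1>>>();

    // A kernel that fails to launch (e.g. invalid configuration) prints nothing.
    if (!check(cudaGetLastError(), "kernel launch")) return 1;

    // Wait for the kernel to finish; errors during execution surface here.
    // (Very old toolkits use cudaThreadSynchronize() instead.)
    if (!check(cudaDeviceSynchronize(), "kernel execution")) return 1;

    // Copy the buffer back and print it, prefixed with block/thread IDs.
    if (!check(cudaPrintfDisplay(stdout, true), "cudaPrintfDisplay")) return 1;

    cudaPrintfEnd();
    return 0;
}
```

If one of the checks reports an error with the real kernel, that would explain the missing output, since cudaPrintfDisplay() can only show what the kernel actually wrote to the buffer.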

Any suggestions?