The simple freopen trick does not work: after a kernel that calls printf a number of times has finished, the file I redirected stdout to is still empty.
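For reference, the kind of redirect I mean looks roughly like the sketch below. The names hello_kernel and run_kernel_with_redirect are made up for illustration and are not from my real code. It assumes a toolkit that has cudaDeviceSynchronize (older toolkits used cudaThreadSynchronize instead) and a card with compute capability 2.0 or higher. Device-side printf output sits in a device buffer until a synchronization point, so the sketch synchronizes and then flushes stdout before the file is inspected. This is not a confirmed fix for the empty file described here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread prints one line via device-side printf.
__global__ void hello_kernel()
{
    printf("hello from thread %d\n", threadIdx.x);
}

// Hypothetical helper: redirects the process-wide stdout to `path`,
// launches the kernel, then forces the device printf buffer and the
// host stdio buffer to be written out.
// Returns 0 on success, -1 if the redirect failed, -2 on a CUDA error.
int run_kernel_with_redirect(const char* path)
{
    // freopen rebinds stdout for the whole process, not just this thread.
    if (freopen(path, "w", stdout) == NULL) {
        return -1;
    }

    hello_kernel<<<1, 4>>>();

    // Device printf output is only transferred to the host at
    // synchronization points such as this one.
    cudaError_t err = cudaDeviceSynchronize();

    // Push whatever the host side buffered into the file.
    fflush(stdout);

    return (err == cudaSuccess) ? 0 : -2;
}
```

Calling run_kernel_with_redirect("kernel_log.txt") from a worker thread and checking the return value at least separates a failed redirect from a failed kernel launch. Since freopen changes stdout for the whole process, both worker threads would end up writing to the same file.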
The system is XP64 with two GTX480 cards installed. All interaction with the cards goes through worker threads; one thread per card is created when the app starts.
cuPrintf was much better in this respect: it was possible to specify the destination explicitly using cudaPrintfDisplay.