Since emulation has been deprecated as of CUDA 3.0 and printf is now supported in device code, I assumed that printing from the kernel directly would make no difference to what I’m trying to do. However, when I run the kernel the device stalls, the fan speed jumps from 30% to 80%, and nothing is written to stdout.
I compiled the code using the SDK makefile with -arch sm_20. The device is a Tesla C2050, which has compute capability 2.0. Is this a bug? The driver is 256.35 for 64-bit Linux.
You might try reducing the number of printf calls. I had a similar problem when I was printing from a large number of threads (probably thousands), but when only a couple of printfs were executed it worked fine.
(This was also on a C2050, on Linux, with the same driver.)
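For what it's worth, here is a minimal sketch of what "reducing the number of printfs" could look like: only one thread in every `stride` threads prints, and the host synchronizes afterwards so the output is flushed. The names `should_print`, `debug_kernel`, `n` and `stride` are made up for illustration and are not from the original code. The CUDA programming guide documents that device-side printf writes into a fixed-size buffer that is flushed at synchronization points, so the sketch also enlarges that buffer; whether the buffer has anything to do with the stall described above is only a guess. On toolkits older than CUDA 4.0 the equivalent calls are `cudaThreadSetLimit` and `cudaThreadSynchronize`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true for one thread out of every `stride` threads.
__host__ __device__ inline bool should_print(int tid, int stride)
{
    return tid % stride == 0;
}

// Hypothetical kernel: only the selected threads call printf.
__global__ void debug_kernel(int n, int stride)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n && should_print(tid, stride)) {
        printf("thread %d (block %d) alive\n", tid, blockIdx.x);
    }
}

int main()
{
    const int n = 1 << 14;     // 16384 threads in total
    const int stride = 4096;   // so only 4 of them print

    // Optional: enlarge the device printf buffer (8 MB is an arbitrary choice).
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 8 * 1024 * 1024);

    debug_kernel<<<n / 256, 256>>>(n, stride);

    // Device printf output is flushed at synchronization points,
    // so wait for the kernel and check for errors here.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Compiled with something like `nvcc -arch sm_20` on a toolkit that still supports that target, this should print a handful of lines rather than thousands. If that works, raising the number of printing threads step by step would show where the trouble starts.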
That leaves the question of why it should behave so…