Very basic cuPrintf question - getting magic=0 for -arch=sm_13

I am having problems with the SDK example simplePrintf.

I have a compute capability 1.3 GPU (GeForce GTX 285).

If I put a “for loop” around the kernel invocation and compile with “-arch=sm_13”, I only get output on stdout for the first and 128th invocations of the kernel.

When I print the value of the variable “magic” in cudaPrintfDisplay(), it is 51217, which is the value of CUPRINTF_SM11_MAGIC, after the 1st and 128th invocations of the kernel, but 0 otherwise.

Note: if I don’t compile with -arch=sm_13, magic is always 51216 and everything prints, but I need “-arch=sm_13” for double precision support.

Here’s the for loop code:

int ii;
for( ii=0; ii<256; ii++ )
{
    // Launch the kernel, then flush its cuPrintf output on every iteration
    testKernel<<<dimGrid, dimBlock>>>(10);
    std::cout << "ii=" << ii << std::endl;
    cudaPrintfDisplay( stdout, true );
}

I must be missing something really obvious.



It turns out that there is a bug in cuPrintf!!

psr found it and posted the bug fix here:


To fix, change line 797 from

cudaMemcpy(&magic, printfbuf_device, sizeof(unsigned short), cudaMemcpyDeviceToHost);

to

cudaMemcpy(&magic, printfbuf_start, sizeof(unsigned short), cudaMemcpyDeviceToHost);

For me, the line number was a little off, but the switch from “printfbuf_device” to “printfbuf_start” worked like a charm!

Thank you psr!