Very basic cuPrintf question - getting magic=0 for -arch=sm_13

I am having problems with the SDK example simplePrintf.

I have a compute capability 1.3 GPU (GeForce GTX 285).

If I put a “for loop” around the kernel invocation in simplePrintf.cu and compile with “-arch=sm_13”, I only get output to stdout for the first and 128th invocation of the kernel.

When I print the value of the variable “magic” in cudaPrintfDisplay() in cuPrintf.cu, it is 51217, which is the value of CUPRINTF_SM11_MAGIC, after the 1st and 128th invocation of the kernel, but 0 otherwise.

Note: magic==51216 always and everything prints if I don’t compile with -arch=sm_13, but I need “-arch=sm_13” for double precision support.

Here’s the for loop code:

int ii;
for( ii=0; ii<256; ii++ )
{
    testKernel<<<dimGrid, dimBlock>>>(10);
    cutilDeviceSynchronize();
    std::cout << "ii=" << ii << std::endl;
    cudaPrintfDisplay( stdout, true );
}

I must be missing something really obvious.

Thanks.

Chris

It turns out that there is a bug in cuPrintf!!

psr found it and posted the bug fix here:

http://forums.nvidia.com/index.php?showtopic=152643&st=0&p=1160927&hl=cuprintf&fromsearch=1&#entry1160927

Specifically:

To fix, change line 797 in cuPrintf.cu

From:

cudaMemcpy(&magic, printfbuf_device, sizeof(unsigned short), cudaMemcpyDeviceToHost);

to:

cudaMemcpy(&magic, printfbuf_start, sizeof(unsigned short), cudaMemcpyDeviceToHost);

For me, the line number was a little off, but the switch from “printfbuf_device” to “printfbuf_start” worked like a charm!
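
A possible explanation for the symptom, offered as an assumption and not checked against the cuPrintf source: if printfbuf_device is a read position that moves forward after each display call, while the magic header sits only at printfbuf_start, then reading the header through the moving pointer would succeed only when that pointer happens to sit at the start of the buffer. The host-only C++ sketch below models that idea. PrintfBuffer, readMagicAtCursor, readMagicAtStart and consume are hypothetical names, and the 1024-byte buffer and 8-byte step are illustrative numbers chosen so the cursor wraps after 128 calls; none of this is the real cuPrintf layout.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// 0xC811 == 51217, the magic value reported earlier in this thread.
const unsigned short kMagic = 0xC811;

// Hypothetical model of the printf buffer (NOT the real cuPrintf layout):
// a magic header at offset 0, zero-filled record space after it.
struct PrintfBuffer {
    std::vector<unsigned char> bytes;  // stands in for the device allocation
    std::size_t cursor;                // stands in for printfbuf_device (moves)
                                       // offset 0 stands in for printfbuf_start

    explicit PrintfBuffer(std::size_t len) : bytes(len, 0), cursor(0) {
        std::memcpy(bytes.data(), &kMagic, sizeof(kMagic));
    }
};

// Models the original line: read the header at the moving cursor.
unsigned short readMagicAtCursor(const PrintfBuffer& b) {
    unsigned short m = 0;
    std::memcpy(&m, b.bytes.data() + b.cursor, sizeof(m));
    return m;
}

// Models the fixed line: always read the header at the buffer start.
unsigned short readMagicAtStart(const PrintfBuffer& b) {
    unsigned short m = 0;
    std::memcpy(&m, b.bytes.data(), sizeof(m));
    return m;
}

// Models one display call consuming n bytes; the cursor wraps at the end.
// Assumes n divides the buffer size so a 2-byte read never runs off the end.
void consume(PrintfBuffer& b, std::size_t n) {
    b.cursor = (b.cursor + n) % b.bytes.size();
}
```

In this model, readMagicAtCursor returns the magic value before the first consume call, returns 0 on the calls that follow, and returns the magic value again only once the cursor has wrapped back to offset 0. readMagicAtStart returns the magic value every time.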

Thank you psr!

Chris