Strange cuPrintf() behaviour


Inside my kernel function (declared __global__) I executed the following code:

if ( blockIdx.x * blockDim.x + threadIdx.x == 0 )
    for ( int k = 0; k < 9; k++ )
        cuPrintf( "%d,", k++ );

I get the output: 0,2,4,6,8,

Can anyone speculate as to where 1, 3, 5 and 7 went?? :shock:

I used cuPrintf from the CUDA SDK, with the standard init and printing.
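For anyone who hasn't seen it, the "standard" pattern from the SDK's cuPrintf sample looks roughly like this (a sketch from memory; the exact defaults live in the cuPrintf.cu file shipped with the SDK, and the launch configuration here is just illustrative):

```cuda
#include <cstdio>
#include "cuPrintf.cu"   // shipped with the CUDA SDK sample

__global__ void myKernel()
{
    // Only the very first thread of the grid prints.
    if ( blockIdx.x * blockDim.x + threadIdx.x == 0 )
        for ( int k = 0; k < 9; k++ )
            cuPrintf( "%d,", k );
}

int main()
{
    cudaPrintfInit();                    // allocate the device-side buffer (default size)
    myKernel<<<63, 256>>>();             // illustrative launch configuration
    cudaPrintfDisplay( stdout, false );  // copy the buffer back to the host and print it
    cudaPrintfEnd();                     // free the buffer
    return 0;
}
```

Note that nothing appears on the host until cudaPrintfDisplay() runs; the device side only writes records into the buffer.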

When I used this specific code in one application, it ran without problems; in another it gave these strange results.

I have only one GPU; how can I debug the debugging tool?! :wallbash:


You’re incrementing k twice per iteration.

Oops, I must have been really tired. The code should be:

if ( blockIdx.x * blockDim.x + threadIdx.x == 0 )
    for ( int k = 0; k < 9; k++ )
        cuPrintf( "%d,", k );


and the output is: 4,5,6,7,8,

It seems that if I call cuPrintf too many times in a row, the beginning of the buffer disappears.

I played with the number of blocks allocated to the kernel, and that seems to be where the problem lies: if I lower the number of blocks from 63 to 2, the problem disappears. I should experiment with this more.

I should also note that I need to 'init' the application a few times, running it and getting garbage results, before I get real results.

Okay, I changed the buffer size from the default to:


and now it works. I'm not sure why it needs such a large buffer when, in the end, I write fewer than 20 characters. I think I need to study how the cuPrintf mechanism works. I would have expected the buffer size to depend on the amount of text I write, not on the number of blocks and threads in the kernel.
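If the buffer really is carved up per thread rather than shared on demand (an assumption on my part; I haven't read cuPrintf.cu carefully), that would explain why the block count matters even when only thread 0 prints. A sizing rule along these lines would then make sense; the 256-byte slot size and the 256 threads per block are guesses, not documented constants:

```cuda
#include "cuPrintf.cu"   // for cudaPrintfInit()

// Assumption: cuPrintf reserves a fixed-size slot for every thread in the
// launch, so the usable space per thread shrinks as the grid grows.
size_t printfBufferBytes( int blocks, int threadsPerBlock )
{
    const size_t bytesPerThread = 256;   // guessed slot size
    return (size_t)blocks * threadsPerBlock * bytesPerThread;
}

// Usage, before the kernel launch (63 blocks from the post, 256 threads
// per block is hypothetical):
//   cudaPrintfInit( printfBufferBytes( 63, 256 ) );  // instead of the default
```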

Does anyone else use cuPrintf, or does everyone have two or more GPUs?