Limited output of kernel printf

Hi all,
I’m trying to debug a molecular modeling application, which requires calculating a large number of data points (~300k). In the process, I’d like to be able to get output from the GPU via printf calls. I’m redirecting the output to a file.

Unfortunately, only a limited number of printf calls seem to produce output: somewhere between 20k and 50k, depending on the length of each line. If the kernel makes more calls than that, no output is generated at all. Is there a way around this limitation? I’m using CUDA 4.0 with the 270.41.19 driver on CentOS 5.5 (GTX 480).

The alternative I know of (writing my numbers into an array on the GPU, copying it to the host with cudaMemcpy, and printing it there) is a major headache. Besides, if any error is thrown in the kernel, I get no output and no clue what happened.

Any advice would be much appreciated.


Reading through massive printfs is not the way to debug this. What you should do is pick a single atom which you know is misbehaving and then track just that atom.


Or perhaps a single thread:
if(blockIdx.x==… && threadIdx.x==…)

This is usually much easier to debug.

I have debugged many large numeric problems and this always seems to be the most efficient way.
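
For illustration, here is a minimal sketch of the single-thread guard. The kernel, its arguments, and the chosen block and thread indices (3 and 17) are all hypothetical placeholders; substitute the thread that handles the atom you want to track.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: scales each element and prints from one chosen thread only.
__global__ void scaleKernel(float *data, int n, float factor,
                            unsigned int debugBlock, unsigned int debugThread)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float before = data[i];
    data[i] = before * factor;

    // Only the selected thread prints, so each launch emits a single line.
    if (blockIdx.x == debugBlock && threadIdx.x == debugThread) {
        printf("block %u thread %u: i=%d before=%f after=%f\n",
               blockIdx.x, threadIdx.x, i, before, data[i]);
    }
}

int main()
{
    const int n = 1 << 20;
    float *d_data = NULL;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Block 3, thread 17 are arbitrary example indices.
    scaleKernel<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f, 3u, 17u);

    // Device printf output is flushed at synchronization points such as this one.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return 0;
}
```

Passing the indices as kernel arguments rather than hard-coding them means you can move to a different thread without recompiling.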

If I recall correctly, printf uses a fixed-size circular buffer, 1 MB by default.
You can get around this by allocating a large char buffer in global memory and writing to it instead.
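
A rough sketch of that idea follows. Since there is no device-side sprintf, this version swaps the char buffer for an array of fixed-size binary records that the host formats afterwards. The record layout, kernel, and all names are assumptions made up for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical record layout: one entry per logged value.
struct LogRecord {
    int   index;
    float value;
};

__global__ void computeKernel(int n, LogRecord *log, unsigned int *logCount,
                              unsigned int logCapacity)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float value = 0.5f * i;   // stand-in for the real calculation

    // Reserve a slot atomically; drop the record if the buffer is full.
    unsigned int slot = atomicAdd(logCount, 1u);
    if (slot < logCapacity) {
        log[slot].index = i;
        log[slot].value = value;
    }
}

int main()
{
    const int n = 300000;
    const unsigned int capacity = n;   // room for one record per data point

    LogRecord *d_log = NULL;
    unsigned int *d_count = NULL;
    cudaMalloc(&d_log, capacity * sizeof(LogRecord));
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemset(d_count, 0, sizeof(unsigned int));

    computeKernel<<<(n + 255) / 256, 256>>>(n, d_log, d_count, capacity);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The counter may exceed the capacity if records were dropped, so clamp it.
    unsigned int count = 0;
    cudaMemcpy(&count, d_count, sizeof(count), cudaMemcpyDeviceToHost);
    if (count > capacity) count = capacity;

    LogRecord *h_log = (LogRecord *)malloc(count * sizeof(LogRecord));
    cudaMemcpy(h_log, d_log, count * sizeof(LogRecord), cudaMemcpyDeviceToHost);
    for (unsigned int k = 0; k < count; ++k)
        printf("%d %f\n", h_log[k].index, h_log[k].value);

    free(h_log);
    cudaFree(d_log);
    cudaFree(d_count);
    return 0;
}
```

Records land in whatever order the threads reserve their slots, which is why each one carries its own index. Like printf, this gives you nothing if the kernel aborts before the copy back to the host.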

Thanks for your reply, Justin. I did end up going the way of printing out of a single thread and/or skipping steps in the simulation. Works well, indeed.
Akavo, thanks for the explanation of the limit - it’s good to know the source of the size restriction.


You can also set the size of the printf buffer using cuCtxSetLimit(CU_LIMIT_PRINTF_FIFO_SIZE, …).
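
If you are on the runtime API rather than the driver API, the counterpart is cudaDeviceSetLimit with cudaLimitPrintfFifoSize. A minimal sketch, assuming a hypothetical kernel and an arbitrary 64 MB buffer size chosen only for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that prints one line per data point.
__global__ void chattyKernel(int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) printf("point %d\n", i);
}

int main()
{
    // Arbitrary example size; set it before launching any kernel that calls printf.
    size_t requested = 64 * 1024 * 1024;
    cudaError_t err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize, requested);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Read the limit back to see what the device is actually using.
    size_t actual = 0;
    cudaDeviceGetLimit(&actual, cudaLimitPrintfFifoSize);
    fprintf(stderr, "printf FIFO size: %lu bytes\n", (unsigned long)actual);

    const int n = 300000;
    chattyKernel<<<(n + 255) / 256, 256>>>(n);

    // The buffer contents are flushed to stdout at this synchronization point.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return 0;
}
```

The buffer has to hold everything a kernel prints between flushes, so size it from your own line length and call count rather than from the figure used here.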