Size of printf buffer

Hi all,

I am aware that the size of the printf buffer is 1MB. cudaDeviceGetLimit(&printfBufferSz, cudaLimitPrintfFifoSize) also reports 1048576 on my system.
However, I have some code that generates quite a lot of printf data, and when I run it I get roughly 120KB of output. What’s even weirder is that if I set cudaThreadSetLimit(cudaLimitPrintfFifoSize, printfBufferSz * 2) I indeed get roughly 240KB of output. Is there any reason for this inconsistency? Am I missing something?
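For reference, this is roughly how I query and resize the limit (a minimal host-side sketch, error checking omitted; it needs the CUDA toolkit to compile). As far as I can tell cudaThreadSetLimit is the deprecated name for cudaDeviceSetLimit, so I assume they behave the same:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t printfBufferSz = 0;

    // Query the current device-side printf FIFO size (1048576 here).
    cudaDeviceGetLimit(&printfBufferSz, cudaLimitPrintfFifoSize);
    printf("printf FIFO size: %zu bytes\n", printfBufferSz);

    // Double it. The docs say this must happen before launching any
    // kernel that calls printf for the new size to take effect.
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, printfBufferSz * 2);
    return 0;
}
```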

I am working on a GTX 950 with CUDA 10.

The buffer contains an unprocessed version of the printf data, which has overhead. Just because you have a 1MB buffer does not mean that, after processing, it will translate into 1048576 characters of print-out.

You can get an idea of this from a careful read of the docs:

Unlike the C-standard printf(), which returns the number of characters printed, CUDA’s printf() returns the number of arguments parsed.

Final formatting of the printf() output takes place on the host system.

The following API functions get and set the size of the buffer used to transfer the printf() arguments and internal metadata

@Robert_Crovella Thanks for the response. I actually read that, but I get 1/9th of the buffer. Could this internal metadata account for that much? Even if the format string were copied into the buffer along with the arguments for each individual printf, I still don’t understand how that discrepancy is justified.

I would presume that it does. I haven’t measured it myself. I also imagine that the overhead varies with a number of undocumented parameters, such as the exact format string and the number and types of arguments.

I don’t have any further explanation. I generally advise people that the in-kernel printf is not really suitable for large scale “bulk” output.

With a very simple test application, outputting 10 bytes per thread (no arguments beyond the format string), I get 40960 bytes of output from a 1MB buffer, so I am getting 1/25 of the available space.

I wouldn’t be surprised, based on my read of the docs, if the highest “throughput” or “efficiency” came from passing multiple arguments per printf call, perhaps up to the limit (32).

When I change to a format string with 5 %s arguments, 10 bytes each, I get ~250k out of the 1MB possible.

This suggests to me that the “efficiency” is a function of exactly how you format things.