CUDA kernel printf does not work (Windows, MFC-based application)

Hi All,

I’m struggling to make CUDA printf work in an MFC-based, non-console app.

My application is dialog-based; the idea is to create a new console for standard output in order to examine the printf output produced inside a CUDA kernel.

The console creation and redirection code looks like this (the idea is taken from here:, this is the first solution I found that actually works with MSVC2015):

freopen("CONIN$", "r", stdin);
freopen("CONOUT$", "w", stdout);
freopen("CONOUT$", "w", stderr);

After that I run some tests:

cout << "cout: hello, world!\n";
printf("printf: hello, world!\n");
_tprintf(_T("_tprintf: hello, world!\n"));

All three “hello, world!” strings appear in the newly created console.

After that I try to make CUDA kernel printf work:

__global__ void PrintfTestKernel(int nCUDADeviceIndex)
{
	printf("nCUDADeviceIndex = %d, threadIdx.x = %d\n", nCUDADeviceIndex, threadIdx.x);
}

cudaError_t RunPrintfTest(int nCUDADeviceIndex)
{
	cudaError_t err = cudaSuccess;

	// do/while(false) lets each failed call break out to the single return below
	do
	{
		err = ::cudaSetDevice(nCUDADeviceIndex);
		if (err != cudaSuccess)
			break;

		// 1 MB buffer for device-side printf output
		err = ::cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 1048576);
		if (err != cudaSuccess)
			break;

		PrintfTestKernel<<<1, 1>>>(nCUDADeviceIndex);
		err = ::cudaDeviceSynchronize();
		if (err != cudaSuccess)
			break;
	} while (false);

	return err;
}

printf("Before RunPrintfTest\n");
cudaError_t err = RunPrintfTest(0);
printf("After RunPrintfTest\n");

No CUDA errors are detected, and both “Before RunPrintfTest” and “After RunPrintfTest” strings do appear in the console - but no CUDA printf strings appear between them.

I’ve put a breakpoint inside the kernel and made sure that printf is actually invoked (using Nsight).
I’ve tried all sorts of flushes.

Nothing helps - CUDA printf simply puts nothing into the console, and I have no idea whether printf actually fails or the code under the CUDA hood does not flush the output.

What can be done about it? The only solution I see is migrating back to the ancient cuPrintf.

Any suggestions are welcome.

Thank you in advance!

cudaDeviceSynchronize() should guarantee that the buffer for the device-side printf() is flushed. I consider it more likely that the freopen() redirection trickery causes the kernel’s output to go to a pipe that is not connected to the new console.

Heh… I have tried maybe ten or fifteen different ways of intercepting stdout, including pipes.

The output of CUDA printfs simply disappears; I can’t find it anywhere. The only thing that works flawlessly is a native console application… when it comes to a GUI-based app without a native console, CUDA printfs become useless.

What confuses me most is that questions of this sort regularly appear on the forum, and the Nvidia guys keep silent, as if the way kernel printf works were a great mystery.

In the worst case, you might make a workaround by adapting the legacy cuPrintf.h code. You could just add your own variant of the host-side “output” query that dumps to your own character buffer, to display in your own GUI any way you like. cuPrintf itself even gives a bit more control with its own cudaPrintfDisplay host function, which takes a file stream argument. That explicit control might work better than the implicit “eventually, the host will write to console on flush” behavior of default device printf().
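To make the “own character buffer” idea a bit more concrete, here is a minimal sketch of the host half only. It is not cuPrintf’s actual implementation. It assumes the kernel writes fixed-size records into a device buffer that you copy back (for example with cudaMemcpy) after cudaDeviceSynchronize(); LogRecord and FormatLogRecords are hypothetical names made up for this example.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record a kernel would store instead of calling printf().
// The field names are assumptions chosen to match the test kernel above.
struct LogRecord
{
    int deviceIndex;   // value of nCUDADeviceIndex seen by the thread
    int threadIndexX;  // value of threadIdx.x seen by the thread
};

// Formats records (already copied back to host memory) into one string.
// No stdout or console is involved, so stream redirection cannot affect it.
std::string FormatLogRecords(const std::vector<LogRecord>& records)
{
    std::string out;
    char line[128];  // large enough for the fixed text plus two ints
    for (const LogRecord& r : records)
    {
        int n = std::snprintf(line, sizeof line,
                              "nCUDADeviceIndex = %d, threadIdx.x = %d\n",
                              r.deviceIndex, r.threadIndexX);
        if (n > 0)
            out.append(line);  // snprintf always null-terminates the buffer
    }
    return out;
}
```

The returned string can then go wherever the GUI wants it: an edit control, a log file, or the debugger output window.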

I read the forum regularly, but I don’t recall any previous questions regarding re-directing the output from kernel-side printf() [that doesn’t mean no such questions were asked].

I don’t know the details of the kernel-side printf(). I think it is quite likely that it internally uses APIs like dup() or dup2(), and that connections established this way get the rug pulled out from underneath them once you freopen(): due to the new assignment, any previous associations with the file handle are lost.
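To illustrate the kind of effect I mean (this is not CUDA’s actual internals, which I don’t know), here is a small sketch using POSIX dup(); the Windows CRT equivalents would be _dup(), _write() and _close() from <io.h>. It uses an ordinary FILE* instead of stdout so it doesn’t disturb the real console, and DemoStaleDescriptor, RedirectResult and ReadAll are hypothetical names made up for the example. A descriptor duplicated before freopen() keeps pointing at the old target, so anything written through it never reaches the redirected stream.

```cpp
#include <cstdio>
#include <string>
#include <unistd.h>  // dup, write, close (POSIX)

// Reads a whole file into a string (empty if it cannot be opened).
static std::string ReadAll(const char* path)
{
    std::string text;
    if (FILE* f = std::fopen(path, "r"))
    {
        char buf[256];
        size_t n;
        while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
            text.append(buf, n);
        std::fclose(f);
    }
    return text;
}

struct RedirectResult
{
    std::string oldTarget;  // contents of the file the stream pointed at first
    std::string newTarget;  // contents of the file the stream was redirected to
};

RedirectResult DemoStaleDescriptor(const char* oldPath, const char* newPath)
{
    RedirectResult result;
    FILE* stream = std::fopen(oldPath, "w");
    if (!stream)
        return result;

    // A library caching its own descriptor before the application redirects.
    int cached = dup(fileno(stream));
    if (cached < 0)
    {
        std::fclose(stream);
        return result;
    }

    // The application redirects the stream; the cached descriptor is unaffected.
    if (!std::freopen(newPath, "w", stream))
    {
        close(cached);  // freopen() has already closed the stream on failure
        return result;
    }

    std::fputs("via stream\n", stream);
    std::fflush(stream);

    const char msg[] = "via cached descriptor\n";
    ssize_t written = write(cached, msg, sizeof msg - 1);
    (void)written;  // a real program would check for short writes
    close(cached);
    std::fclose(stream);

    result.oldTarget = ReadAll(oldPath);
    result.newTarget = ReadAll(newPath);
    return result;
}
```

After the call, the text written through the cached descriptor ends up in the old file and only the text written through the stream ends up in the new one. If the runtime does something comparable with the process’s original stdout, its output would go to the old (for a GUI app, nonexistent) destination no matter what freopen() does afterwards.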

Consider filing an enhancement request with NVIDIA to officially document how to re-direct output from kernel-side printf(), maybe in the Best Practices Guide.

It seems I will use cuPrintf, indeed. Maybe it’s even better than printf. However, the topic of redirection is still interesting: for instance, CUDA assert uses stderr to output the assertion reason, so one will never see it in a GUI-based app. Of course, it is possible to implement your own assert and use cuPrintf for the output, but that would be a more complex and therefore less integrated solution.

Also, I will file a request regarding the printf redirection documentation. The funny side of all this is that CUDA is widely used, not only in console applications… and the topic of redirection has gone unexplained for years, ever since kernel printf was introduced. There are printf-related topics that were never answered.

Guys, thank you very much for your suggestions!