Hi All,
I’m struggling to make CUDA printf work in an MFC-based, non-console app.
My application is dialog-based. The idea is to create a new console for standard output so that I can examine the printf output produced inside a CUDA kernel.
The console creation and redirection code looks like this (the idea is taken from http://stackoverflow.com/questions/311955/redirecting-cout-to-a-console-in-windows; it is the first solution I found that actually works with MSVC2015):
AllocConsole();
freopen("CONIN$", "r", stdin);
freopen("CONOUT$", "w", stdout);
freopen("CONOUT$", "w", stderr);
std::wcout.clear();
std::cout.clear();
std::wcerr.clear();
std::cerr.clear();
std::wcin.clear();
std::cin.clear();
After that I run some tests:
cout << "cout: hello, world!\n";
printf("printf: hello, world!\n");
_tprintf(_T("_tprintf: hello, world!\n"));
All three “hello, world!” strings appear in the newly created console.
After that I try to make the CUDA kernel printf work:
__global__ void PrintfTestKernel(int nCUDADeviceIndex)
{
    printf("nCUDADeviceIndex = %d, threadIdx.x = %d\n", nCUDADeviceIndex, threadIdx.x);
}
cudaError_t RunPrintfTest(int nCUDADeviceIndex)
{
    cudaError_t err = cudaSuccess;
    do
    {
        err = ::cudaSetDevice(nCUDADeviceIndex);
        if (err != cudaSuccess)
            break;
        err = ::cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 1048576);
        if (err != cudaSuccess)
            break;
        PrintfTestKernel<<<1, 1>>>(nCUDADeviceIndex);
        err = ::cudaDeviceSynchronize();
        if (err != cudaSuccess)
            break;
    } while (false);
    return err;
}
printf("Before RunPrintfTest\n");
cudaError_t err = RunPrintfTest(0);
printf("After RunPrintfTest\n");
No CUDA errors are detected, and both the “Before RunPrintfTest” and “After RunPrintfTest” strings appear in the console, but no CUDA printf output appears between them.
I’ve put a breakpoint inside the kernel and made sure that printf is actually invoked (using Nsight).
I’ve tried all sorts of flushes.
Nothing helps - CUDA printf simply puts nothing into the console and I have no idea whether printf actually fails or the code under the CUDA hood does not flush the output.
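For reference, host-side flushing of the kind mentioned above can be sketched as below. FlushHostStreams is a hypothetical name, and the exact set of calls is an illustration rather than a record of everything I tried:

```cpp
#include <cstdio>
#include <iostream>

// Hypothetical helper: flushes the host-side C and C++ output buffers and
// reports whether every flush succeeded.
bool FlushHostStreams()
{
    // fflush with a null pointer flushes all open C output streams.
    bool ok = (std::fflush(nullptr) == 0);

    // Flush the narrow C++ streams as well, then check their state.
    std::cout.flush();
    std::cerr.flush();
    return ok && std::cout.good() && std::cerr.good();
}
```

This touches only host buffers. The device-side printf buffer is flushed by the CUDA runtime at synchronization points such as the cudaDeviceSynchronize call already present in RunPrintfTest.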
What can be done about it? The only solution I can see is migrating back to the ancient cuPrintf.
Any suggestions are welcome.
Thank you in advance!