I have written a scan library.
My problem is that I get cudaErrorLaunchTimeout whenever I insert a printf into my kernel or other device code. Without the printf the code runs without any CUDA error (apart from logical errors in the results). I need device-side printf to debug my improved scan code.
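Kernel launches are asynchronous, so an error raised while the kernel runs, such as cudaErrorLaunchTimeout, is reported by the next synchronizing call and not by the launch itself. A minimal sketch of a checking helper to call right after each launch; the name checkKernel is hypothetical and not part of the scan library:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: call right after a kernel launch.
// cudaGetLastError() reports errors in the launch itself (e.g. a bad configuration);
// cudaDeviceSynchronize() waits for the kernel and reports errors raised while
// it ran, such as cudaErrorLaunchTimeout.
cudaError_t checkKernel(const char* label) {
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s (%s)\n", label,
                     cudaGetErrorName(err), cudaGetErrorString(err));
    }
    return err;
}
```

Usage would look like `scan_kernel<...><<<...>>>(...); checkKernel("scan_kernel");`. The CUDA runtime documentation describes cudaErrorLaunchTimeout as leaving the process in an inconsistent state, so later CUDA calls in the same process keep returning that error until the process is restarted.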
To tell the truth, I do not know why this happens, because before I restructured my code I could use printf without any error. Now it causes my display driver to reset, Windows reports that the display driver has recovered, and I also get the timeout error.
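cudaErrorLaunchTimeout can only occur when the device enforces a run-time limit on kernels, which is typically the case when the GPU also drives a display (on Windows, the display watchdog then resets the driver, matching the "display driver has recovered" message). Whether that limit is active can be read from the device property kernelExecTimeoutEnabled. A minimal sketch; the helper name kernelTimeoutEnabled is hypothetical:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if the device has a kernel run-time limit,
// 0 if it does not, or -1 if the device properties cannot be queried.
int kernelTimeoutEnabled(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    std::printf("%s: kernelExecTimeoutEnabled = %d\n",
                prop.name, prop.kernelExecTimeoutEnabled);
    return prop.kernelExecTimeoutEnabled ? 1 : 0;
}
```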
Before restructuring, I had one big prefix_sum.cu file with a main function that called a host scan function, which copied the host data to the device and then called a recursive scan function on the device data. That function launched my kernel, and printf worked there without a timeout error.
Now I have moved my host functions (Scan, scan_recursive) into ScanDispatcher.cu, and the kernel code lives in prefix_sum_kernel.cu (before, both were in the same .cu file). main calls ScanDispatcher in ScanDispatcher.cu, whose purpose is to instantiate the correct template arguments for Scan and scan_recursive (also in ScanDispatcher.cu). scan_recursive then launches a kernel from prefix_sum_kernel.cu that contains a printf(), and that is when the timeout happens.
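For reference, the host-side recursion described above (copy to the device, scan each block, scan the block totals recursively, add them back) can be sketched as follows. This is a minimal inclusive Hillis-Steele block scan with hypothetical names (scanHost, scanRecursive, scanBlocks, addBlockOffsets, kBlock); it is not the library's actual code and contains no printf:

```cuda
#include <cuda_runtime.h>

constexpr int kBlock = 256;  // hypothetical block size, must be a power of two

// Inclusive scan of one block's segment in shared memory. Out-of-range slots
// are padded with 0, so the last slot holds the segment total.
__global__ void scanBlocks(const int* in, int* out, int* blockSums, int n) {
    __shared__ int s[kBlock];
    int tid = threadIdx.x;
    int i = blockIdx.x * kBlock + tid;
    s[tid] = (i < n) ? in[i] : 0;
    __syncthreads();
    for (int offset = 1; offset < kBlock; offset <<= 1) {
        int v = (tid >= offset) ? s[tid - offset] : 0;
        __syncthreads();
        s[tid] += v;
        __syncthreads();
    }
    if (i < n) out[i] = s[tid];
    if (blockSums && tid == kBlock - 1) blockSums[blockIdx.x] = s[tid];
}

// Adds the scanned total of all preceding blocks to every element of block b > 0.
__global__ void addBlockOffsets(int* out, const int* scannedSums, int n) {
    int i = blockIdx.x * kBlock + threadIdx.x;
    if (blockIdx.x > 0 && i < n) out[i] += scannedSums[blockIdx.x - 1];
}

// Recursive inclusive scan of n ints already on the device (in-place is allowed).
cudaError_t scanRecursive(const int* d_in, int* d_out, int n) {
    int blocks = (n + kBlock - 1) / kBlock;
    if (blocks <= 1) {
        scanBlocks<<<1, kBlock>>>(d_in, d_out, nullptr, n);
        return cudaGetLastError();
    }
    int* d_sums = nullptr;
    cudaError_t err = cudaMalloc(&d_sums, blocks * sizeof(int));
    if (err != cudaSuccess) return err;
    scanBlocks<<<blocks, kBlock>>>(d_in, d_out, d_sums, n);
    err = cudaGetLastError();
    if (err == cudaSuccess) err = scanRecursive(d_sums, d_sums, blocks);
    if (err == cudaSuccess) {
        addBlockOffsets<<<blocks, kBlock>>>(d_out, d_sums, n);
        err = cudaGetLastError();
    }
    cudaFree(d_sums);  // cudaFree waits for pending device work before freeing
    return err;
}

// Host entry point: copy to the device, scan, copy back.
cudaError_t scanHost(const int* h_in, int* h_out, int n) {
    int* d_buf = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    cudaError_t err = cudaMalloc(&d_buf, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(d_buf, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = scanRecursive(d_buf, d_buf, n);
    if (err == cudaSuccess) err = cudaMemcpy(h_out, d_buf, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    return err;
}
```

With kBlock = 256, an input of 70000 elements recurses twice (274 block totals, then 2), so the sketch exercises more than one level.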
Code example to illustrate my problem:
// ScanDispatcher.cu
#include "prefix_sum_kernel.cu"

scan_recursive<...>(..)
{
    // launch the kernel defined in prefix_sum_kernel.cu
    scan_kernel<..><<<.......>>>(..);
}

Scan<...>()
{
    // copy host data to the device
    // call scan_recursive with the template instantiations p
    scan_recursive<..>(..);
}

ScanDispatcher(..)
{
    // instantiate the templates for the given parameters p
    // and call Scan<p>(..)
    Scan<p>(..);
}
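The dispatcher step above, turning run-time parameters p into template arguments for Scan, is commonly done with a switch over the supported values, one instantiation per case. A minimal sketch; the parameter blockSize, the supported values 128/256/512 and the stub body of Scan are all hypothetical:

```cuda
#include <cstdio>

// Hypothetical stand-in for Scan<...>: it only reports which instantiation was
// chosen. In the real code it would copy host data to the device and call
// scan_recursive with the same template arguments.
template <int BLOCK_SIZE>
int Scan() {
    std::printf("Scan<%d> selected\n", BLOCK_SIZE);
    return BLOCK_SIZE;
}

// Maps a run-time value onto a compile-time template argument.
// Returns -1 for values that have no instantiation.
int ScanDispatcher(int blockSize) {
    switch (blockSize) {
        case 128: return Scan<128>();
        case 256: return Scan<256>();
        case 512: return Scan<512>();
        default:  return -1;
    }
}
```

Only the values listed in the switch are compiled; C++ cannot instantiate a template at run time, so any other value has to be rejected explicitly.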
My prefix_sum_kernel.cu:
..
#include <cuda.h>
..
__global__ void scan_kernel()
{
    ..
    // call printf somewhere
    printf("a");
    // the launch timeout error is received
    ..
}
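If the amount of device printf output plays a part (an assumption, not something established above), one common way to keep it small while debugging is to print from a single thread. A minimal self-contained sketch with hypothetical names (doubleAndLog, runWithPrintf) that are not part of the scan library:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical debug kernel: every thread does its work, but only global
// thread 0 prints, so the printf output does not grow with the grid size.
__global__ void doubleAndLog(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
    if (i == 0 && n > 0) printf("thread 0: data[0] = %d\n", data[0]);
}

// Copies data to the device, runs the kernel, waits for it, copies the result back.
cudaError_t runWithPrintf(int* h_data, int n) {
    int* d_data = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    cudaError_t err = cudaMalloc(&d_data, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        doubleAndLog<<<(n + 255) / 256, 256>>>(d_data, n);
        err = cudaGetLastError();
    }
    // Waiting for the kernel also flushes its buffered printf output to stdout.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err;
}
```

Device printf output is buffered and only transferred to the host at synchronization points such as cudaDeviceSynchronize() or a blocking cudaMemcpy, which is why the sketch synchronizes before reading results.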
Note: the printf output does appear, but then I get the error and the display driver resets.