Fermi launch timeout using printf

I have written a scan library.

My problem is that I get cudaErrorLaunchTimeout whenever I insert a printf in my kernel or device code. Without the printf the code runs with no error (apart from logic bugs). I need the device-side printf to debug my improved scan code.

To tell the truth, I do not know why this happens, because before I restructured my code I could use printf without an error. Now it causes my display driver to reset, and Windows reports that the display driver has recovered (in addition to the timeout error).

Before restructuring, I had one big prefix_sum.cu file with a main function that called a host scan function, which copies the host data to the device and then calls a recursive scan function on the device data. That function launched my kernel, and I could use printf there without a timeout error.

Now I have moved my host functions (scan, scan_recursive) into ScanDispatcher.cu, and the kernel code lives in prefix_sum_kernel.cu (before, they were in the same .cu file). The main function calls ScanDispatcher in ScanDispatcher.cu, whose purpose is to instantiate the correct template arguments for scan and scan_recursive (also in ScanDispatcher.cu). scan_recursive then calls a kernel in prefix_sum_kernel.cu that contains a printf(), and that is when the timeout happens.

Code example to illustrate my problem:

//ScanDispatcher.cu
#include "prefix_sum_kernel.cu"

scan_recursive<...>(..)
{
    //invoke kernel in prefix_sum_kernel.cu
    scan_kernel<..><<<.......>>>(..)
}

Scan<...>()
{
    //copy host data to device
    //call scan_recursive with template instantiations p
    scan_recursive<..>(..);
}

ScanDispatcher(..)
{
    //instantiate templates given parameters p
    //call Scan<p>(..)
    Scan<p>(..);
}

My prefix_sum_kernel.cu:

..
#include <cuda.h>
..
__global__ void scan_kernel()
{
    ..
    //call a printf somewhere
    printf("a");
    //launch timeout error received
    ..
}

Note: the printf output does appear, but I still get the error and the display driver resets.
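For anyone who wants to reproduce the check, here is a minimal sketch of how the error can be observed on the host side. It is not my real scan code: scan_kernel_stub, its parameters, the array size, and the launch configuration are placeholders made up for this post. It assumes a toolkit that provides cudaDeviceSynchronize (older toolkits use cudaThreadSynchronize instead) and a device that supports printf in kernels (compute capability 2.0 or higher). Errors that occur while a kernel is running, such as cudaErrorLaunchTimeout, are reported by a later synchronizing call rather than by the launch itself, so the sketch checks both.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for scan_kernel: copies input to output and
// prints from a single thread to keep the printf volume small.
__global__ void scan_kernel_stub(const int *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
    if (i == 0)
        printf("a\n");
}

int main()
{
    const int n = 256;                 // placeholder problem size
    const size_t bytes = n * sizeof(int);
    int *d_in = NULL;
    int *d_out = NULL;

    if (cudaMalloc((void **)&d_in, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_out, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_in, 0, bytes);

    // Placeholder launch configuration: one block of n threads.
    scan_kernel_stub<<<1, n>>>(d_in, d_out, n);

    // Errors in the launch itself (bad configuration, etc.).
    cudaError_t launchErr = cudaGetLastError();
    if (launchErr != cudaSuccess)
        fprintf(stderr, "launch error: %s\n", cudaGetErrorString(launchErr));

    // Errors raised while the kernel runs show up here; this call
    // also flushes the device-side printf buffer.
    cudaError_t syncErr = cudaDeviceSynchronize();
    if (syncErr == cudaErrorLaunchTimeout)
        fprintf(stderr, "kernel hit the launch timeout\n");
    else if (syncErr != cudaSuccess)
        fprintf(stderr, "kernel error: %s\n", cudaGetErrorString(syncErr));

    cudaFree(d_in);
    cudaFree(d_out);
    return (launchErr == cudaSuccess && syncErr == cudaSuccess) ? 0 : 1;
}
```

Built with something like nvcc -arch=sm_20 on a Fermi card, this should print "a" once and exit with 0 if no timeout occurs.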