Templated kernels and printf

I’m trying to debug a kernel, and I’d like to use printf in emulation mode. I can do this in ‘normal’ kernels, but I’m hitting a problem with a templated one. A small test case:


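[codebox]template<int myInt>
__global__ void TestKernel( int *myArr, const int nVals ) {
    const int iMine = threadIdx.x + (blockDim.x*blockIdx.x);

    // Threads past the end of the array have nothing to do
    if( iMine >= nVals ) {
        return;
    }

    printf( "Hello!\n" );

    // myInt is the compile-time template parameter
    switch( myInt ) {
    case 0:
        myArr[iMine] -= 1;
        break;
    case 1:
        myArr[iMine] += 1;
        break;
    }
}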

void MyKernelTest( int *myArr, const int nVals ) {
    int *d_myArr;

    // cudaMalloc takes a size in bytes, so allocate nVals ints
    CUDA_SAFE_CALL( cudaMalloc( (void**)&d_myArr, nVals*sizeof(int) ) );
    CUDA_SAFE_CALL( cudaMemcpy( d_myArr, myArr, nVals*sizeof(int), cudaMemcpyHostToDevice ) );

    dim3 grid, threads;
    threads.x = 256;
    threads.y = threads.z = 1;
    // Round up so that every element is covered by a thread
    grid.x = (int)ceil( ((float)nVals ) / ( (float)threads.x ) );
    grid.y = grid.z = 1;

    TestKernel<1><<<grid,threads>>>( d_myArr, nVals );

    CUDA_SAFE_CALL( cudaMemcpy( myArr, d_myArr, nVals*sizeof(int), cudaMemcpyDeviceToHost ) );
    CUDA_SAFE_CALL( cudaFree( d_myArr ) );
}
[/codebox]

On compiling, I get the following whinge:

/usr/local/cuda/bin/nvcc -D_DEBUG -deviceemu -I. -I/usr/local/cuda/include -I/usr/local/NVIDIA_CUDA_SDK//common//…/common/inc -DUNIX -g -I/home/me/prog/src -I/home/me/proginclude -I/usr/local/include/wcslib -I/usr/include/atlas/ -I/usr/include/cfitsio/ -I/home/me/HP_2.10/include/ -o obj/emudebug/tileresponse_cuda.cu_o -c …/src/tileresponse_cuda.cu

…/src/tileresponse_cuda.cu: In function ‘void __globfunc__Z10TestKernelILi1EEvPii(int*, int)’:

…/src/tileresponse_cuda.cu:189: error: ‘_ZZ10TestKernelILi1EEvPiiEs’ was not declared in this scope


If I comment out the printf it compiles fine. Using cout doesn’t give me any joy either. This is with CUDA 2.0 (I don’t have a spare machine for the 2.1 beta).

Is that even allowed, calling printf() from device code? Where would you expect that code to execute?

In emulation mode, I’d sort of be expecting it to execute on the CPU…

The problem is not with printf, but with the character string argument.
Character strings in template device functions currently don’t compile: try a character array instead.

This will be fixed in the 2.2 release.
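
A minimal sketch of that workaround, for the emulation-mode build described above. It assumes that an element-wise initialised local array is enough to keep string literals out of the templated kernel; this has not been verified against CUDA 2.0. The names TestKernelArr and RunTestKernelArr are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same kernel as in the question, but the message is held in a local
// character array, so the template contains no string literal.
template<int myInt>
__global__ void TestKernelArr( int *myArr, const int nVals ) {
    const int iMine = threadIdx.x + (blockDim.x*blockIdx.x);

    if( iMine >= nVals ) {
        return;
    }

    // Assumption: initialising element by element avoids the literal entirely
    const char msg[] = { 'H', 'e', 'l', 'l', 'o', '!', '\n', '\0' };
    printf( msg );

    switch( myInt ) {
    case 0:
        myArr[iMine] -= 1;
        break;
    case 1:
        myArr[iMine] += 1;
        break;
    }
}

// Hypothetical host wrapper: copies the data over, runs the <1> variant
// (which adds 1 to every element) and copies the result back.
void RunTestKernelArr( int *myArr, const int nVals ) {
    int *d_myArr;
    cudaMalloc( (void**)&d_myArr, nVals*sizeof(int) );
    cudaMemcpy( d_myArr, myArr, nVals*sizeof(int), cudaMemcpyHostToDevice );

    const int nThreads = 256;
    const int nBlocks = (nVals + nThreads - 1) / nThreads;  // round up

    TestKernelArr<1><<<nBlocks, nThreads>>>( d_myArr, nVals );

    // This blocking copy also waits for the kernel to finish
    cudaMemcpy( myArr, d_myArr, nVals*sizeof(int), cudaMemcpyDeviceToHost );
    cudaFree( d_myArr );
}
```

Calling RunTestKernelArr( myArr, nVals ) from host code should leave every element of myArr one larger than before, with one "Hello!" line printed per active thread.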

Thanks - it’s good to know that I wasn’t missing something :)

I have gotten printf to work in my self-compiled project. Make sure that you are calling the function in an executable area, and make sure that your file includes iostream.

#include <iostream>

If adding iostream gives you a compile error, I was able to solve that by adding the path to the library’s location, similar to the way you do it for the CUDA bin and lib directories.

So, like this:

export C_PATH

As for the array of chars or strings, that is also not needed. Include the string library header and it will work just fine. However, I am stuck when it comes to concatenating variables to the string. If you can help with that it would be awesome. If you need to see my code, let me know.