I’m trying to debug a kernel, and I’d like to use printf in emulation mode. This works in ordinary (non-templated) kernels, but I’m hitting a problem with a templated one. A small test case:
[codebox]template<int myInt>
__global__ void TestKernel( int *myArr, const int nVals ) {
    // Global index of this thread across the whole grid
    const int iMine = threadIdx.x + (blockDim.x*blockIdx.x);
    if( iMine >= nVals ) {
        return;
    }
    printf( "Hello!\n" );
    // myInt is a compile-time template parameter
    switch( myInt ) {
    case 0:
        myArr[iMine] -= 1;
        break;
    case 1:
        myArr[iMine] += 1;
        break;
    default:
        break;
    }
}
void MyKernelTest( int *myArr, const int nVals ) {
    int *d_myArr;
    // cudaMalloc takes a size in bytes, so scale by sizeof(int)
    CUDA_SAFE_CALL( cudaMalloc( (void**)&d_myArr, nVals*sizeof(int) ) );
    CUDA_SAFE_CALL( cudaMemcpy( d_myArr, myArr, nVals*sizeof(int), cudaMemcpyHostToDevice ) );
    dim3 grid, threads;
    threads.x = 256;
    threads.y = threads.z = 1;
    // Enough blocks to cover all nVals elements
    grid.x = (int)ceil( ((float)nVals ) / ( (float)threads.x ) );
    grid.y = grid.z = 1;
    TestKernel<1><<<grid,threads>>>( d_myArr, nVals );
    CUDA_SAFE_CALL( cudaMemcpy( myArr, d_myArr, nVals*sizeof(int), cudaMemcpyDeviceToHost ) );
    CUDA_SAFE_CALL( cudaFree( d_myArr ) );
}[/codebox]
On compiling, I get the following whinge:
[font="Courier New"]/usr/local/cuda/bin/nvcc -D_DEBUG -deviceemu -I. -I/usr/local/cuda/include -I/usr/local/NVIDIA_CUDA_SDK//common//…/common/inc -DUNIX -g -I/home/me/prog/src -I/home/me/proginclude -I/usr/local/include/wcslib -I/usr/include/atlas/ -I/usr/include/cfitsio/ -I/home/me/HP_2.10/include/ -o obj/emudebug/tileresponse_cuda.cu_o -c …/src/tileresponse_cuda.cu
…/src/tileresponse_cuda.cu: In function ‘void __globfunc__Z10TestKernelILi1EEvPii(int*, int)’:
…/src/tileresponse_cuda.cu:189: error: ‘_ZZ10TestKernelILi1EEvPiiEs’ was not declared in this scope
[/font]
If I comment out the [font="Courier New"]printf[/font], it compiles fine. Using [font="Courier New"]cout[/font] doesn’t give me any joy either. This is with CUDA 2.0 (I don’t have a spare machine for the 2.1 beta).