Printf does not work in emulation mode /tmp/xxxxxxxx_stub.c: no such file or directory


I am having difficulty finding out what is wrong with my program.
I have a program with many functions, but only one of them needs to be ported to CUDA. It seems that both the function that calls the kernel and the kernel itself need to be enclosed in extern "C". Is this the right way to do it?
When I call the kernel with <<< >>>, is there a limit on the number of arguments that can be passed? I have 13 floats and arrays. Even though the printf call is at the very beginning of the kernel, stepping through with the debugger shows that execution never reaches it, and the error is the one in the title: a missing …stub.c file.
I hope someone can give me some hints.

This is a simplified version of the code:


#include <stdio.h>
#include <stdlib.h>

float ins[64][3];

__global__ void function_4(double dt, int npart)
{
    printf( " ============================ npart %d \n", npart);
}

int main ( )
{
    double dt = 1e-11;
    int npart = 600;
    int nbytes = npart;
    int device = 0;

    float *d_input = 0, *d_output = 0;
    cudaMalloc( (void**)&d_input, 3*nbytes );
    cudaMalloc( (void**)&d_output, 3*nbytes );

    float *h_data = ins[0];
    h_data = (float*)malloc(3*nbytes);
    cudaMemcpy( d_input, h_data, 3*nbytes, cudaMemcpyHostToDevice );

    dim3 block(16);
    dim3 grid( npart/block.x );

    /***************** execute kernel *****************/
    function_4<<<grid,block>>>(dt,npart);

    cudaMemcpy( h_data, d_output, 3*nbytes, cudaMemcpyDeviceToHost );

    if( d_input )
        cudaFree( d_input );
    if( d_output )
        cudaFree( d_output );
    if( h_data )
        free( h_data );

    return 0;
}



And the corresponding debugger output:


35 dim3 block(16);

(gdb) s

dim3 (this=0x7ffff5059cc0, x=16, y=1, z=1) at /SATA/500G/CUDA/cuda_30/cuda/bin/…/include/vector_types.h:479

479 __host__ __device__ dim3(unsigned int x = 1, unsigned int y = 1, unsigned int z = 1) : x(x), y(y), z(z) {}

(gdb) s

main () at

36 dim3 grid( npart/block.x );

(gdb) s

dim3 (this=0x7ffff5059cb0, x=37, y=1, z=1) at /SATA/500G/CUDA/cuda_30/cuda/bin/…/include/vector_types.h:479

479 __host__ __device__ dim3(unsigned int x = 1, unsigned int y = 1, unsigned int z = 1) : x(x), y(y), z(z) {}

(gdb) s

main () at

41 function_4<<<grid,block>>>(dt,npart);

(gdb) s

function_4__entry (__cuda_0=9.9999999999999994e-12, __cuda_1=600) at

7 {

(gdb) s

__device_stub__Z10function_4di (__par0=9.9999999999999994e-12, __par1=600) at /tmp/tmpxft_00000dd7_00000000-1_main_cuda_function.cudafe1.stub.c:10

10 /tmp/tmpxft_00000dd7_00000000-1_main_cuda_function.cudafe1.stub.c: No such file or directory.

    in /tmp/tmpxft_00000dd7_00000000-1_main_cuda_function.cudafe1.stub.c

(gdb) s

cudaLaunch (entry=0x40b9ec "UH\211åH\203ì\020ò\017\021Eø\211}ô\213}ôò\017\020EøèpÿÿÿÉÃUH\211åH\213=\n½ ")

at /SATA/500G/CUDA/cuda_30/cuda/bin/../include/cuda_runtime.h:714

714 return cudaLaunch((const char*)entry);

(gdb) s

715 }


There’s a small library called cuPrintf that you can download if you’re a registered developer. If not, I believe someone once posted their own implementation of it in the programming forum. You might be able to find it by searching for “printf”.

I’m not very handy with Linux, so someone else will have to take a look at the error messages.

This is in EMULATION mode, so printf should work! Or am I missing something?

This is likely unrelated, but you do have some code that’s not doing what you want it to:

float *h_data=ins[0];  /* This initialization is immediately overwritten by the next line */

h_data = (float*)malloc(3*nbytes);

cudaMemcpy( d_input, h_data, 3*nbytes, cudaMemcpyHostToDevice ); /* This memcopy is copying UNINITIALIZED host memory to the device */

These likely have nothing to do with your printf() question, but they stuck out in your code.
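To make the point concrete, here is a minimal host-side sketch in plain C (it should also be valid inside a .cu file). It sizes the buffer in bytes and fills it with known values before any copy to the device. The helper names particle_bytes and make_host_particles are hypothetical, and the layout of 3 floats per particle is an assumption based on the ins[64][3] declaration. Note that malloc, cudaMalloc and cudaMemcpy all take sizes in bytes, whereas nbytes in the posted code holds the particle count.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumption: each particle carries 3 floats, as suggested by ins[64][3]. */
#define FLOATS_PER_PARTICLE 3

/* Size in bytes of a buffer holding npart particles. */
size_t particle_bytes(int npart)
{
    return (size_t)npart * FLOATS_PER_PARTICLE * sizeof(float);
}

/* Allocate a host buffer and fill it with known values before any copy.
   Returns NULL if npart is not positive or the allocation fails. */
float *make_host_particles(int npart)
{
    float *h_data;
    int i;

    if (npart <= 0)
        return NULL;

    h_data = (float *)malloc(particle_bytes(npart));
    if (h_data == NULL)
        return NULL;

    /* Placeholder values; real code would copy the actual input data here. */
    for (i = 0; i < npart * FLOATS_PER_PARTICLE; ++i)
        h_data[i] = (float)i;

    return h_data;
}
```

With something like this, the same byte count from particle_bytes(npart) would be passed to cudaMalloc and cudaMemcpy, so the host and device buffers stay the same size.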

I just copied the Makefile from another project, modified it for my present task, and now everything is OK.

The only difference I can see so far is the order of the two following flags:

nvcc -D_DEBUG -deviceemu -> this one is OK

instead of

nvcc -deviceemu -D_DEBUG -> this one does not work

There might be something else, but right now, it works.

I have the same problem… I guess it’s a new CUDA 3.0 thing (I assume you are using CUDA 3.0 and the new drivers). Emulation mode is not supported anymore. See this thread.

So, if you were using emulation mode, you have to switch to Ocelot, cuPrintf, or cuda-gdb (on Linux only). Now that I have learned cuda-gdb it seems nice, and I bet cuPrintf will be great, as it allows direct output of device data.

I applied for registered developer status a week ago to obtain cuPrintf, but haven’t heard anything from Nvidia. It’s a little frustrating.

I just deleted my post from yesterday. It was NOT the solution.

The real problem was that the kernel was being launched with <<< 0, 0 >>>, that is, with zero blocks and zero threads. No wonder nothing was printed from the kernel.

Now everything is working as it should.
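For anyone who hits the same thing, a small host-side check before the launch would have caught it. This is only a sketch in plain C (it should also be valid in a .cu file); blocks_for and launch_config_ok are hypothetical helper names, not part of the CUDA API. On a real device, calling cudaGetLastError() right after the launch should report an invalid configuration as well; I have not checked how emulation mode behaves there.

```c
#include <stdio.h>

/* Number of blocks needed to cover n items with threads_per_block threads,
   rounding up so a partial last block is still launched.
   Returns 0 on bad input. */
unsigned int blocks_for(int n, unsigned int threads_per_block)
{
    if (n <= 0 || threads_per_block == 0)
        return 0;
    return ((unsigned int)n + threads_per_block - 1) / threads_per_block;
}

/* Returns 1 if the launch configuration is usable, 0 otherwise,
   and prints the offending values to stderr. */
int launch_config_ok(unsigned int grid_x, unsigned int block_x)
{
    if (grid_x == 0 || block_x == 0) {
        fprintf(stderr, "bad launch configuration <<<%u, %u>>>\n",
                grid_x, block_x);
        return 0;
    }
    return 1;
}
```

Usage would be along the lines of computing grid_x = blocks_for(npart, block.x) and only launching when launch_config_ok(grid_x, block.x) returns 1. Because this rounds up (600 particles with 16 threads per block gives 38 blocks rather than the 37 from plain integer division), the kernel then needs a guard such as if (i < npart) so the extra threads do nothing.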