Calling a host function from a __device__/__global__ function

I’m new to CUDA.
I added a printf() to a __global__ function to watch some variables.
When building, I get this error: calling a host function from a __device__/__global__ function is only allowed in device emulation mode

I am building under the EmuDebug configuration. How can I set device emulation mode?
How can I watch the variables in a __global__/__device__ function?

Thanks for all replies.

I also have the same query!

Compiling with make emu=1 is all you should have to do.

Alternatively, you can compile with nvcc's --device-emulation option.

Yeah, thank you.

Right-click the .cu file in the Solution Explorer and choose Properties. On the General page, set Emulation Mode to Yes.

Then it should build.

You cannot call a host function from inside a __global__/__device__ function in “Debug” and “Release” modes.

You can call a host function from a __global__/__device__ function in “EmuDebug” and “EmuRelease” modes.

If you want to see the variable values in “Debug” and “Release” modes, write them to a file on disk from the host side.

FILE* fp = fopen("C:\\ThreadIdxs.txt", "wt");
if (fp)
{
	// hSum is the host-side copy of the results
	for (long i = 0; i < W*H; ++i)
		fprintf(fp, "%d\n", hSum[i]);
	fclose(fp);
}

Hello Manjunath,

I also started with CUDA very recently. I have written a simple program that adds two vectors. The program is here:



// This is an example CUDA program from Chapter 2, page 7.
// It adds two vectors, A and B, and saves the result in vector C.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cutil.h>
#include <string.h>
#include <math.h>
#include <cutil_inline.h>
#include <cuda.h>
#include <conio.h>

// kernel declaration
__global__ void vecAdd(float* A, float* B, float* C)
{
	int i = threadIdx.x;
	C[i] = A[i] + B[i];
	// printf("%f", C[i]); // You cannot call a host function from inside a __global__/__device__ function in "Debug" and "Release" modes.
}

void main()
{
	int i;
	float A[3];

	float B[3];

	float C[3];

	// kernel invocation
	vecAdd<<<1, 3>>>(A, B, C);
}

I have the following questions:

1- When we run the program in emulation mode, do we actually use the NVIDIA graphics card (9600 GT, 512 MB) present in the PC?

2- How exactly can I see the values in C (which should be 2.0, 4.0, and 6.0)? You gave the hint that we can save the variable output to a file. In my case it would be:

FILE* fp = fopen("C:\\ThreadIdxs.txt", "wt");
if (fp)
{
	// C holds floats, so print with %f rather than %d
	for (long i = 0; i < 3; ++i)
		fprintf(fp, "%f\n", C[i]);
	fclose(fp);
}

Am I right? But the question is where to put it. When I put it in main, the answer is 0 0 0, which is wrong.

I would be thankful for your help in this regard.


You might want to keep reading the CUDA programming guide until you finish Chapter 4. Then you will understand why you are getting an incorrect answer (hint: you are getting zeros because your kernel is never launching). The output to file is working correctly.

Kiran_CUDA: You cannot call your kernel function with pointers to host memory; the pointers must point to device memory. You have to allocate memory on the device first (using cudaMalloc), then copy the A and B arrays to it (using cudaMemcpy), then run the kernel with the device pointers, and finally copy the result back to the host.
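A minimal sketch of that sequence, offered as an illustration rather than the poster's actual fix. The input values A = B = {1, 2, 3} are an assumption (the thread only states the expected result 2, 4, 6), and the CHECK macro and the hA/dA naming are hypothetical helpers introduced here:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): abort with a message on any CUDA error.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "CUDA error: %s (line %d)\n",         \
                    cudaGetErrorString(err_), __LINE__);          \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

__global__ void vecAdd(const float* A, const float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main(void)
{
    const int N = 3;
    const size_t bytes = N * sizeof(float);

    // Assumed input values; the original post does not show them.
    float hA[N] = {1.0f, 2.0f, 3.0f};
    float hB[N] = {1.0f, 2.0f, 3.0f};
    float hC[N] = {0.0f, 0.0f, 0.0f};

    // 1. Allocate device memory.
    float *dA = NULL, *dB = NULL, *dC = NULL;
    CHECK(cudaMalloc((void**)&dA, bytes));
    CHECK(cudaMalloc((void**)&dB, bytes));
    CHECK(cudaMalloc((void**)&dC, bytes));

    // 2. Copy the inputs from host to device.
    CHECK(cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice));

    // 3. Launch the kernel with device pointers and check the launch itself.
    vecAdd<<<1, N>>>(dA, dB, dC);
    CHECK(cudaGetLastError());

    // 4. Copy the result back; this cudaMemcpy waits for the kernel to finish.
    CHECK(cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost));

    for (int i = 0; i < N; ++i)
        printf("%f\n", hC[i]);

    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    CHECK(cudaFree(dC));
    return 0;
}
```

Checking the return value of every runtime call is a design choice here: a kernel that fails to launch otherwise leaves the output buffer untouched, which looks exactly like a result of all zeros.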

Thanks a lot wanderine and avidday!

Well, I have one more question. In emulation mode, what role does the graphics card play (if one is installed)? Or is it completely detached from the whole process?

In emulation, the GPU isn’t touched. Everything is done on the host CPU in host memory, with the emulation layer launching one CPU thread per GPU thread requested by the CUDA code. Threads are serviced sequentially and in order. This means that there is a range of potential problems that emulation cannot detect, like race conditions, coherency problems, and certain classes of improper device memory usage.
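As a hedged illustration of the kind of race condition meant here, the sketch below has every thread perform an unsynchronized read-modify-write on one counter. The kernel names racyCount and atomicCount and the launch size are hypothetical choices for this example. If threads really ran one after another, as described for emulation, the racy version would count every thread; on a real GPU it typically loses updates, while the atomic version does not.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Every thread reads, increments, and writes the same address with no synchronization.
__global__ void racyCount(int* counter)
{
    *counter = *counter + 1;   // data race across threads
}

// Same work using an atomic add, which is safe when threads run concurrently.
__global__ void atomicCount(int* counter)
{
    atomicAdd(counter, 1);
}

int main(void)
{
    const int blocks = 4, threads = 256;   // arbitrary launch size for the demo
    int* dCounter = NULL;
    int hRacy = 0, hAtomic = 0;

    if (cudaMalloc((void**)&dCounter, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Racy version: the final value depends on how the hardware interleaves threads.
    cudaMemset(dCounter, 0, sizeof(int));
    racyCount<<<blocks, threads>>>(dCounter);
    cudaMemcpy(&hRacy, dCounter, sizeof(int), cudaMemcpyDeviceToHost);

    // Atomic version: each increment is applied exactly once.
    cudaMemset(dCounter, 0, sizeof(int));
    atomicCount<<<blocks, threads>>>(dCounter);
    cudaMemcpy(&hAtomic, dCounter, sizeof(int), cudaMemcpyDeviceToHost);

    printf("threads launched: %d, racy count: %d, atomic count: %d\n",
           blocks * threads, hRacy, hAtomic);

    cudaFree(dCounter);
    return 0;
}
```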

It is not possible to call a host function from __global__ and __device__ functions. printf() is a host function, which is why you got the error.

To see the value of any variable or array inside a __global__ and/or __device__ function, you have to do the following:

  • Allocate device memory the size of the variable or array whose value you want to see.

  • Allocate host memory of the same size.

  • Pass the pointer that holds the address of the allocated device memory to the __global__ function.

  • Assign the value to this allocated memory inside the __global__ function.

  • After the __global__ function has finished executing, copy this device memory into the host memory using cudaMemcpy().

  • Then use printf() on the host memory to see the expected value.
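The steps above can be sketched roughly as follows. The kernel name kernelWithDebug, the buffer names dDebug and hDebug, and the watched expression i * i are all hypothetical stand-ins chosen for this example, not anything from the original posts:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 8   // number of threads (and debug slots) in this demo

// The kernel stores the value we want to inspect into a debug buffer in device memory.
__global__ void kernelWithDebug(int* debugOut)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int value = i * i;   // stand-in for the variable being watched
    if (i < N)
        debugOut[i] = value;
}

int main(void)
{
    const size_t bytes = N * sizeof(int);

    // Allocate host memory and device memory of the same size.
    int* hDebug = (int*)malloc(bytes);
    int* dDebug = NULL;
    if (hDebug == NULL)
        return 1;
    if (cudaMalloc((void**)&dDebug, bytes) != cudaSuccess) {
        free(hDebug);
        return 1;
    }

    // Pass the device pointer to the kernel, which fills the buffer.
    kernelWithDebug<<<1, N>>>(dDebug);

    // Copy the buffer back after the kernel; cudaMemcpy waits for it to finish.
    cudaError_t err = cudaMemcpy(hDebug, dDebug, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    } else {
        // Print from host code, where printf() is always available.
        for (int i = 0; i < N; ++i)
            printf("thread %d: value = %d\n", i, hDebug[i]);
    }

    cudaFree(dDebug);
    free(hDebug);
    return 0;
}
```

One slot per thread keeps threads from overwriting each other's values; for a single watched variable, a one-element buffer written by one chosen thread would work the same way.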