-deviceemu and C++ code in host functions

Hi everyone, I ran into a problem compiling some code with VS2005 and Cuda 2.2 if I run with the -deviceemu flag and there’s C++ code in the host functions. (I had a related issue which I posted about in: http://forums.nvidia.com/index.php?showtopic=94569&st=0 a few weeks ago. I have only just been able to upgrade to VS2005 and Cuda 2.2 to test). I have WinXP and a Quadro 570 if that is useful information.

I’ve included some code at the end that produces the error for me.

To Generate Problem:

a) there’s C++ code in the host functions in the .cu files

b) that C++ function is referenced from a .cpp file elsewhere

c) you are compiling with -deviceemu

Result: you get unresolved symbols for items in the .cu file. It looks like the .cu's associated object file gets different symbols in device-emulation mode than in normal mode, while the rest of the code stays the same.

Compiling normally, everything links and runs.

dumpbin on kernel.obj:

023 00000000 SECT8 notype () External | ?launch_kernel@@YA_NH@Z (bool __cdecl launch_kernel(int))

Files generated during compile:






Compiling with -deviceemu flag:

Linker error:

main.obj : error LNK2019: unresolved external symbol "bool __cdecl launch_kernel(int)" (?launch_kernel@@YA_NH@Z) referenced in function _main

test-cuda.exe : fatal error LNK1120: 1 unresolved externals

dumpbin on kernel.obj:

072 00000000 SECT1C notype () External | __Z13launch_kerneli

Files generated during compile:




Also, it looks like nvcc turns any C++ host code into C code during preprocessing in emulation mode? (It seems to turn classes into structs and bools into chars or ints.)

Other Notes:

  • If I wrap the function in extern "C" in the header file (and get rid of any C++-specific stuff), I can compile with -deviceemu and everything works


  • If all the code is contained in the single .cu file, rather than split between a .cu file and a .cpp file, everything works fine

  • If I run in normal mode, everything appears to work fine.

Is the solution to only use C code in files that nvcc will be asked to parse?

Any help would be appreciated,




main.cpp:

#include "kernels.h"
#include <stdio.h>

int main()
{
  if (launch_kernel(1024*1024))
	printf("Kernel launched successfully\n");
  else
	printf("Kernel failure\n");

  return 0;
}

kernels.h:

bool launch_kernel(int pixels);


kernel.cu:

#include <stdio.h>
#include <stdlib.h>
#include "kernels.h"

__global__ void do_something(float *dest, float *sourcea, float *sourceb, int size)
{
  int pixel = blockIdx.x * blockDim.x + threadIdx.x;
  if (pixel < size) {
	dest[pixel] = sourcea[pixel] * sourceb[pixel];
  }
}

bool launch_kernel(int pixels)
{
  int block_size = 128;
  int n_blocks = (pixels / block_size) + ((pixels % block_size) ? 1 : 0);
  int size = sizeof(float) * pixels;
  float *dest;
  float *a;
  float *b;
  float *tmp = (float *)malloc(size);

  // allocate device memory
  cudaMalloc((void **)&dest, size);
  cudaMalloc((void **)&a, size);
  cudaMalloc((void **)&b, size);

  // fill memory with random data for testing purposes only
  for (int i = 0; i < pixels; i++)
	tmp[i] = (float)i * .5f;
  cudaMemcpy(a, tmp, size, cudaMemcpyHostToDevice);

  for (int i = 0; i < pixels; i++)
	tmp[i] = (float)(i * i);
  cudaMemcpy(b, tmp, size, cudaMemcpyHostToDevice);

  do_something<<<n_blocks, block_size>>>(dest, a, b, pixels);

  cudaError_t lastError = cudaGetLastError();
  if (lastError == cudaSuccess)
	return true;
  else
	return false;
}


Works fine on Linux, time to boot into Windows and see what’s wrong with that…

argh, Windows is breaking something and compiling this as C for some reason. Don’t entirely know why yet, but I’ve asked the compiler guys to look into it.

I’m glad you were able to reproduce it.

Since you said that linux works, I’ll just use that for debugging on our end for the time being. Thanks!

There’s another thread currently discussing the same issue:


You seem to have described the problem in much greater detail than us though ;).

– sorry, ignore this post, I thought you referred to another thread –

So is there a workaround for this issue?

not in 2.2, no