Hi everyone, I ran into a problem compiling some code with VS2005 and CUDA 2.2: linking fails if I compile with the -deviceemu flag and there’s C++ code in the host functions. (I posted about a related issue a few weeks ago at http://forums.nvidia.com/index.php?showtopic=94569&st=0; I have only just been able to upgrade to VS2005 and CUDA 2.2 to test.) I’m on WinXP with a Quadro 570, in case that is useful information.
I’ve included some code at the end that produces the error for me.
To reproduce the problem:
a) there’s C++ code in the host functions in the .cu files
b) that C++ function is referenced from a .cpp file elsewhere
c) you are compiling with -deviceemu
Result: you get unresolved symbols for items defined in the .cu file. It looks like the .cu file’s object file gets different symbol names in device emulation mode than in normal mode, while the rest of the code stays the same.
Compiling normally: everything links and runs.
dumpbin on kernel.obj:
023 00000000 SECT8 notype () External | ?launch_kernel@@YA_NH@Z (bool __cdecl launch_kernel(int))
Files generated during compile:
kernels.cu
tmpxft_00000edc_00000000-3_kernels.cudafe1.gpu
tmpxft_00000edc_00000000-8_kernels.cudafe2.gpu
tmpxft_00000edc_00000000-3_kernels.cudafe1.cpp
tmpxft_00000edc_00000000-13_kernels.ii
Compiling with -deviceemu flag:
Linker error:
main.obj : error LNK2019: unresolved external symbol "bool __cdecl launch_kernel(int)" (?launch_kernel@@YA_NH@Z) referenced in function _main
test-cuda.exe : fatal error LNK1120: 1 unresolved externals
dumpbin on kernel.obj:
072 00000000 SECT1C notype () External | __Z13launch_kerneli
Files generated during compile:
kernels.cu
tmpxft_00000f6c_00000000-3_kernels.cudafe1.c
tmpxft_00000f6c_00000000-7_kernels.i
Also, it looks like nvcc converts any C++ host code into C code during preprocessing in emulation mode (turning classes into structs and bools into chars or ints). Is that right?
Other Notes:
- If I wrap the function declaration in the header file in extern "C" (and get rid of any C++-specific stuff), I can compile with -deviceemu and everything works.
- If all the code is contained in a single .cu file, rather than split between a .cu file and a .cpp file, everything works fine.
- If I run in normal mode, everything appears to work fine.
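For reference, the extern "C" workaround from the first note can be sketched roughly as below. This is a minimal sketch, not a confirmed fix: it assumes the return type is changed from bool to int so the declaration is also valid C, and the launch_kernel body shown here is a hypothetical stand-in (no CUDA calls) so the snippet compiles on its own. The real definition would stay in kernels.cu.

```cpp
// ---- kernels.h (sketch) ----
// Give launch_kernel C linkage so its symbol name is not C++-mangled,
// regardless of which front end compiles the definition.
#ifdef __cplusplus
extern "C" {
#endif

// Returns 1 on success and 0 on failure. int replaces bool (an assumption
// of this sketch) so the declaration is also valid when compiled as C.
int launch_kernel(int pixels);

#ifdef __cplusplus
}
#endif

// ---- stand-in for kernels.cu (hypothetical, no CUDA) ----
// The real definition would allocate buffers and launch do_something;
// this placeholder only reports success for a positive pixel count.
extern "C" int launch_kernel(int pixels)
{
    return pixels > 0 ? 1 : 0;
}
```

main.cpp can keep calling launch_kernel(1024*1024) inside the if, since a nonzero int converts to true.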
Is the solution to only use C code in files that nvcc will be asked to parse?
Any help would be appreciated,
Thanks,
Larissa
----main.cpp----
#include "kernels.h"
#include <stdio.h>
int main()
{
    if (launch_kernel(1024*1024))
        printf("Kernel launched successfully\n");
    else
        printf("Kernel failure\n");
}
----kernels.h----
bool launch_kernel(int pixels);
----kernels.cu----
#include <stdio.h>
#include <stdlib.h>
#include "kernels.h"

// element-wise multiply: dest[i] = sourcea[i] * sourceb[i]
__global__ void do_something(float *dest, float *sourcea, float *sourceb, int size)
{
    int pixel = blockIdx.x * blockDim.x + threadIdx.x;
    if (pixel < size) {
        dest[pixel] = sourcea[pixel] * sourceb[pixel];
    }
}

bool launch_kernel(int pixels)
{
    int block_size = 128;
    // round up so a partial final block is still launched
    // (the inner parentheses matter: + binds tighter than ?:)
    int n_blocks = (pixels / block_size) + ((pixels % block_size) ? 1 : 0);
    int size = sizeof(float) * pixels;
    float *dest;
    float *a;
    float *b;
    float *tmp = (float *)malloc(size);
    // allocate memory
    cudaMalloc((void **)&dest, size);
    cudaMalloc((void **)&a, size);
    cudaMalloc((void **)&b, size);
    // fill memory with test data (for testing purposes only)
    for (int i = 0; i < pixels; i++)
        tmp[i] = (float)i * .5f;
    cudaMemcpy(a, tmp, size, cudaMemcpyHostToDevice);
    // multiply as floats: i*i overflows int for large i
    for (int i = 0; i < pixels; i++)
        tmp[i] = (float)i * (float)i;
    cudaMemcpy(b, tmp, size, cudaMemcpyHostToDevice);
    do_something<<<n_blocks, block_size>>>(dest, a, b, pixels);
    cudaError_t lastError = cudaGetLastError();
    // release host and device buffers before returning
    free(tmp);
    cudaFree(dest);
    cudaFree(a);
    cudaFree(b);
    return lastError == cudaSuccess;
}