Linker errors for "__host__ __device__" functions

I am using the __host__ __device__ qualifiers to compile some code for both the device (kernel) and the host. However, when I call these functions from the host code, I get “unresolved symbol” linker errors.

See below for the code and my rationale for this setup. Is this a bug in the NVCC compiler, or an error on my side?

functions.h

#pragma once

extern "C" {
  void hello_cuda();
};

#ifndef __CUDACC__
#define __host__
#define __device__
#endif

class A {
public:
  __host__ __device__ int add_cuda(int b);
  __host__ __device__ void operator=(float);
  bool isequal(int b);
  bool add(int c);

private:
  int c;
};

main_cpp.cpp

#include "functions.h"

int main() {
  A a;
  a.add_cuda(5);
  a.isequal(45);
  a.add(123);
  a = 13.0;
  hello_cuda();
  return 0;
}

main_kernel.cu

#include <stdio.h>
#include "main.h"
#include "functions.cpp"

#pragma comment(lib, "cudart.lib")

__global__ void bob() {
  A a;
  a.add_cuda(14);
  a = 13.0;
}

void hello_cuda() {
  printf("Hello world!\n");
  bob<<<1, 1>>>();
}

functions.cpp

#include "main.h"

int A::add_cuda(int b) {return b + 3;}
bool A::add(int b) {return b + 3;}
bool A::isequal(int b) {return b == c;}

void A::operator=(float c) {
  for (int i = 0; i < 10; i++)
    c += i;
}

main_cpp.cpp is my host file and main_kernel.cu contains the kernel. The shared functions live in functions.cpp; two of them, add_cuda and operator=, are used in both host and device code. I have to #include “functions.cpp” in the kernel file because CUDA does not support external linkage for device code. functions.cpp is not compiled as a separate translation unit, because that would give me multiply defined symbols.

When I link this I get unresolved symbols for add_cuda() and operator=, i.e. the __host__ __device__ functions. No matter what I do, I cannot get rid of this error. The only “solution” I have found is to add a dummy host function to the kernel file that calls the __host__ __device__ functions. Then they are exported. Like this:

void dummy() {
  A a;
  a.add_cuda(123);
  a = 13.0;
}

Is there something that I am doing wrong, or does NVCC simply not export __host__ __device__ functions from an object file unless they are called from host code in that same file? Is this a bug?

We don’t support C++ from kernels yet. That’s on the roadmap, but at the moment it’s completely unsupported: some things might work, but some things might not.
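
In the meantime, one workaround that may avoid both the C++ class and the export problem is to keep the shared code as plain C-style functions defined directly in a header. The following is only a sketch under assumptions, not something tested against the toolchain in the question: the names Accum, accum_add and accum_assign are hypothetical, and accum_assign stores the accumulated value in the struct, which the original operator= does not do. Because the functions are static inline, each translation unit that includes the header gets its own copy, so the host file should not depend on anything being exported from the .cu object.

```cpp
// Hypothetical shared header: included by both the host .cpp and the .cu file.

// A plain host compiler does not know the CUDA qualifiers, so define them
// away, exactly as functions.h in the question does.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Plain C-style struct instead of a class with member functions.
struct Accum {
  int c;
};

// Defined in the header and marked static inline: every file that includes
// this header compiles its own private copy, so no symbol has to be
// exported from the kernel's object file.
__host__ __device__ static inline int accum_add(int b) {
  return b + 3;
}

// Illustrative stand-in for operator=(float): adds 0..9 to the value and
// stores the truncated result in the struct.
__host__ __device__ static inline void accum_assign(Accum* a, float value) {
  float sum = value;
  for (int i = 0; i < 10; i++)
    sum += i;
  a->c = static_cast<int>(sum);
}
```

Host code would call accum_add(5) or accum_assign(&a, 13.0f) directly after including the header, and a kernel in the .cu file could call the same functions. With this layout the dummy() function and the #include of functions.cpp should no longer be needed for these two functions.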