Do member functions of a user-defined class have to be inline if instances of the class are created in a kernel?

I have a user-defined class. In my __global__ function, I create instances of this class and call some of its member functions. I found that I have to make all the member functions inline. Otherwise the compilation fails, and no useful information is provided.

I simplified my code to better illustrate the problem.

My header file (header.h):

#include <device_launch_parameters.h>

class ATM
{
public:
    __host__ __device__ ATM() {}
    __host__ __device__ double rand();
};

My cpp file (source.cpp):

#include <device_launch_parameters.h>
#include "header.h"

__host__ __device__ double ATM::rand()
{
    return 0.5;
}

My cu file:

#include "header.h"
#include <device_launch_parameters.h>
#include <cuda_runtime.h>

__global__ void test()
{
    ATM* atm = new ATM();
    double d = atm->rand();
}

int main(void)
{
    int threadsPerBlock = 512;
    int blocksPerGrid = 100;
    test<<<blocksPerGrid, threadsPerBlock>>>();

    return 0;
}

However, it works if I get rid of the cpp file and modify the header file:

header.h

#include <device_launch_parameters.h>

class ATM
{
public:
    __host__ __device__ ATM() {}
    __host__ __device__ double rand()
    {
        return 0.5;
    }
};

Could someone help explain? Thank you very much.

I don’t see how one could expect to successfully compile a .cpp file that contains code with __host__ and __device__ attributes, as those are CUDA features that are not supported by the host C++ compiler that processes that .cpp file. __host__ and __device__ only make sense in a CUDA source file (with a .cu extension).
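As a side note, a header that has to be included from both .cu and .cpp files is often written with the qualifiers hidden behind a macro. The following is a minimal sketch of that idiom, not something taken from the original post: ATM_HOST_DEVICE and the include-guard name are hypothetical names picked for the example, and it relies on nvcc defining __CUDACC__ while it compiles CUDA source.

```cuda
// header.h, sketched so that both nvcc and a plain host C++ compiler accept it.
#ifndef ATM_HEADER_H   // hypothetical include-guard name
#define ATM_HEADER_H

// nvcc defines __CUDACC__ while compiling CUDA source files.
// ATM_HOST_DEVICE is a hypothetical macro name chosen for this example.
#ifdef __CUDACC__
#define ATM_HOST_DEVICE __host__ __device__
#else
#define ATM_HOST_DEVICE   // expands to nothing for the host compiler
#endif

class ATM
{
public:
    ATM_HOST_DEVICE ATM() {}

    // Defined inside the class, so every translation unit that includes
    // the header sees the body. When that translation unit is a .cu file,
    // nvcc can generate the device version it needs.
    ATM_HOST_DEVICE double rand() { return 0.5; }
};

#endif // ATM_HEADER_H
```

This only makes the header acceptable to both compilers. A member function whose body lives in a .cpp file processed by the host compiler is still host-only, whatever the macro expands to.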

I did try removing __host__ __device__ from the cpp file too, but it still fails to build. The only way to make it work is to make rand() an inline function.

However, if I don’t create any instances in the kernel function, then it’s OK to put the implementation of rand() in a cpp file.

A function callable from device code must be compiled by nvcc as CUDA code. The normal way to do that is to put such functions in a .cu file. If you work through all the other situations you are reporting, you will find that the ones that work ultimately have the device-callable function compiled by nvcc, if it is actually used in device code.
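To make that concrete, here is a minimal single-file sketch in which the out-of-line definition of ATM::rand() sits in a .cu file. It is an illustration under some assumptions rather than the original project: the file name atm_demo.cu, the out parameter on the kernel, and the read-back in main are additions for the example, and the kernel uses a stack object instead of new to keep it short.

```cuda
// atm_demo.cu (hypothetical file name). Build with: nvcc atm_demo.cu -o atm_demo
#include <cstdio>
#include <cuda_runtime.h>

// Declaration only, as it would appear in header.h.
class ATM
{
public:
    __host__ __device__ ATM() {}
    __host__ __device__ double rand();
};

// Out-of-line definition. Because this is a .cu file, nvcc generates
// both the host and the device version of the function.
__host__ __device__ double ATM::rand()
{
    return 0.5;
}

// Every thread constructs an ATM and calls rand() on the device.
// Thread 0 of block 0 stores its value so the host can read it back.
// The out parameter is an addition for this example.
__global__ void test(double* out)
{
    ATM atm;
    double d = atm.rand();
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *out = d;
}

int main()
{
    double* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, sizeof(double));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int threadsPerBlock = 512;
    int blocksPerGrid = 100;
    test<<<blocksPerGrid, threadsPerBlock>>>(d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    double h_out = 0.0;
    err = cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("rand() on device returned %f\n", h_out);
    return 0;
}
```

If the definition lives in a different .cu file from the kernel that calls it, nvcc also needs relocatable device code enabled (-rdc=true) so that the device linker can resolve the call across translation units. Keeping the .cpp extension is possible too, as long as the file is passed to nvcc with -x cu so that it is treated as CUDA source.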