Calling a class from a CUDA kernel

Hello,

I am taking my first steps with CUDA. My goal is to use both the CPU and GPU for computation in my project. My first kernel compiled successfully and runs perfectly. I 'improved' that kernel by using a simple class, and that is where the problem begins, because only inline methods can be called from the kernel. I used the following code:

[codebox]#include <stdio.h>
#include <cuda.h>
#include "math/TestClass.h"

__global__ void square_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
    {
        math::TestClass test(39.0f);
        a[idx] = test.doSomething();
    }
}[/codebox]

[codebox]#ifndef TESTCLASS_H
#define TESTCLASS_H

#include <cuda.h>

#ifdef GPU_COMPILATION
    #ifndef GPU_DEVICE
        #define GPU_DEVICE __device__ __host__
    #endif
#else
    #ifndef GPU_DEVICE
        #define GPU_DEVICE
    #endif
#endif

namespace math
{
class TestClass
{
private:
    float myVar;

public:
    TestClass(float var)
        : myVar(var)
    {
    }

    GPU_DEVICE float doSomething();

    GPU_DEVICE float doSomethingInline()
    {
        return myVar;
    }
};
}

#endif[/codebox]

[codebox]#include "TestClass.h"

namespace math
{
float TestClass::doSomething()
{
    return myVar;
}
}[/codebox]

I can use doSomethingInline() without any problems, but doSomething() produces the following error:

[codebox]warning: function "math::TestClass::doSomething" was referenced but not defined
Error: External calls are not supported (found non-inlined call to _ZN4math9TestClass11doSomethingEv)[/codebox]

Of course, I could simply write everything as inline methods. But my project already contains a lot of complete classes, split into *.cpp and *.h files.

Can anyone tell me how I can use classes (without inheritance, …) in the CUDA kernel? :-)

Ha, I’m surprised inline methods work :)

CUDA supports only the C subset of C++; classes are technically illegal in kernels. Templates are the only C++-ish feature that's allowed in device code.

Ok, substitute “struct” for “class” in the OP’s code.

Does anyone know how to go about doing this? I have the exact same question, as I have a number of classes that I can easily convert to structs, but they are defined in sets of .h files and .cpp files. I am unable to get it to work with the struct definition in a separate .cu file from the struct declaration (when only including the header file). Thanks to anyone who can shed some light on this,

-Jeff

My simple but effective trick: include the .cpp at the end of the header via the preprocessor.

[codebox]#ifndef TESTCLASS_H
#define TESTCLASS_H

#include "../gpu/GPUDefines.h"

namespace math
{
class TestClass
{
private:
    float myVar;

public:
    TestClass(float var)
        : myVar(var)
    {
    }

    GPU_HOST_AND_DEVICE float doSomething();

    GPU_HOST_AND_DEVICE float doSomethingInline()
    {
        return myVar;
    }
};
}

#ifdef COMPILE_FOR_GPU
    #include "TestClass.cpp"
#endif

#endif[/codebox]

[codebox]#ifndef COMPILE_FOR_GPU
#include "TestClass.h"
#endif

namespace math
{
float TestClass::doSomething()
{
    return myVar * 10;
}
}[/codebox]

[codebox]#ifndef GPUDEFINES_H
#define GPUDEFINES_H

#ifdef COMPILE_FOR_GPU
    #include <cuda.h>

    #ifndef GPU_DEVICE
        #define GPU_DEVICE __device__
    #endif
    #ifndef GPU_HOST
        #define GPU_HOST __host__
    #endif
    #ifndef GPU_HOST_AND_DEVICE
        #define GPU_HOST_AND_DEVICE __device__ __host__
    #endif
#else
    #ifndef GPU_DEVICE
        #define GPU_DEVICE
    #endif
    #ifndef GPU_HOST
        #define GPU_HOST
    #endif
    #ifndef GPU_HOST_AND_DEVICE
        #define GPU_HOST_AND_DEVICE
    #endif
#endif

#endif[/codebox]

This works perfectly with our classes. Anything that is not CUDA-compatible (functions using the STL, …) can be 'removed' for nvcc with #ifndef COMPILE_FOR_GPU, so all code can be used on the CPU and in CUDA.

Unfortunately, like most “features” of CUDA, this is not the answer I was hoping for. Including the *.cu files using the pre-processor is what I have currently implemented. This technique has the unfortunate side-effect of preventing incremental builds, but at least it works. Thanks,

-Jeff

Hi Jeff and others,

Recently, I also ran into the same problem. I have many structs and standalone functions that I'd like to compile separately and then link against the kernel, but I was getting the "External calls are not supported" error while compiling (not linking) the kernel. It's very frustrating! If you have figured out a way to achieve incremental compilation, please share.

Thanks

Nope, no other solution from my end… although it sounds like the new Fermi architecture being released later this month may solve many of the coding problems developers faced in the past when using CUDA, since it allegedly supports the full C++ standard (from what I have heard… I leave it to you to find a reliable source for this information.)