Unresolved externals in CUDA expression template library under Visual Studio 2010

I’m trying to implement an expression template library in CUDA.

I have a CudaMatrix.h file, included in the main program, that contains:

template <typename OutType>
class CudaMatrix
{
    // some stuff

    template <class A, class T>
    const CudaMatrix<OutType> & operator=(CudaExpr<A,T> e)
    {
        evaluation_function(data_, e, GetNumElements());
        return *this;
    }
};

I also have a CudaMatrix.cu file that contains:

#include "CudaExpression.cuh" // Contains the unary and binary operation classes

template <class A, class T1, class T2>
__global__ inline void evaluation(T1 *data_, const CudaExpr<A,T2> e, int NumElements)
{
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if(i < NumElements) data_[i] = e[i];
}

template<class A, class T1, class T2>
inline void eval_wrap(T1 *data_, const CudaExpr<A,T2> e, int NumElements)
{
    // dimBlock is not declared in this snippet; presumably a dim3 defined elsewhere
    dim3 dimGrid(iDivUp(NumElements,dimBlock.x));
    evaluation<<<dimGrid,dimBlock>>>(data_,e,NumElements);
}

template<class A, class T1, class T2> 
void evaluation_function(T1* data, const CudaExpr<A,T2> e, int NumElements) 
{
    eval_wrap(data, e, NumElements); 
}

I chose this organization so that all the device code, which must be compiled by nvcc, lives in the .cu file, while the host code, which must be compiled by the Visual Studio 2010 compiler, lives elsewhere. Unfortunately, when I build I get the following linker error:

Error   35  error LNK2019: unresolved external symbol "void __cdecl
evaluation_function<class CudaBinExpr<struct double2 const *,struct double2 const 
*,class CudaOpSum,struct double2>,struct double2,struct double2>(struct double2 *,class
CudaExpr<class CudaBinExpr<struct double2 const *,struct double2 const *,class
CudaOpSum,struct double2>,struct double2>,int)" (??$evaluation_function@V
$CudaBinExpr@PBUdouble2@@PBU1@VCudaOpSum@@U1@@@Udouble2@@U2@@@YAXPAUdouble2@@V
$CudaExpr@V?$CudaBinExpr@PBUdouble2@@PBU1@VCudaOpSum@@U1@@@Udouble2@@@@H@Z) referenced
in function "public: class CudaMatrix<struct double2> const & __thiscall
CudaMatrix<struct double2>::operator=<class CudaBinExpr<struct double2 const *,struct
double2 const *,class CudaOpSum,struct double2>,struct double2>(class CudaExpr<class
CudaBinExpr<struct double2 const *,struct double2 const *,class CudaOpSum,struct
double2>,struct double2>)" (??$?4V
$CudaBinExpr@PBUdouble2@@PBU1@VCudaOpSum@@U1@@@Udouble2@@@
$CudaMatrix@Udouble2@@@@QAEABV0@V?$CudaExpr@V
$CudaBinExpr@PBUdouble2@@PBU1@VCudaOpSum@@U1@@@Udouble2@@@@@Z)

Could anyone help me? Thank you very much in advance.

I haven’t looked at your code closely, but it seems your CudaMatrix class is missing __host__ __device__ qualifiers on its member functions, so it would only be usable on the host, not the device.

Thanks, Tera, for your answer. Unfortunately, it does not solve the problem.
The idea is indeed to have a matrix class declared on the host, but with methods that launch kernel functions.

Some more pieces of information

I currently have a working expression template library on CPU. The overloaded = operator is implemented, on the CPU, as follows:

template <typename A, typename B>
const Matrix<OutType> & operator = (Expr<A,B> e)
{   
    for (int i=0; i<GetNumElements(); i++) data_[i] = e[i];
    return *this;
}

When ported to the GPU, the for loop should be replaced by a function that launches a kernel with the <<<…>>> syntax (eval_wrap in my case). That function has to live in a .cu file so that it is picked up and compiled by nvcc. My problem is how to “templatize” such a function without having to explicitly instantiate it for every expression type.

I solved the problem. I had to rename the main .cpp file to .cu and include all the templated methods and classes in that same .cu file (see also the simpleTemplate example of the SDK). I think the renaming is what lets nvcc compile everything, while having to include everything in the same .cu file is an issue specific to the expression templates technique.
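
For anyone landing here with the same LNK2019, this is roughly what the single-file layout can look like. It is only a minimal sketch, not the actual library: the thread never shows CudaExpr or CudaBinExpr, so the CudaExpr wrapper, the SumExpr type, the operator+ overload, the download() helper, the use of float instead of double2, and the block size of 256 are all assumptions made for illustration. The point it shows is that the kernel template, its launching wrapper, and the operator= that calls them are all visible in the one translation unit that nvcc compiles, so the templates are instantiated where they are used.

```cuda
// sketch.cu : everything in one translation unit, e.g. `nvcc sketch.cu -o sketch`
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical wrapper: forwards operator[] to the wrapped expression A.
template <class A, class T>
class CudaExpr {
    A a_;
public:
    explicit CudaExpr(const A& a) : a_(a) {}
    __host__ __device__ T operator[](int i) const { return a_[i]; }
};

// Hypothetical element-wise sum of two device arrays.
template <class T>
struct SumExpr {
    const T* l;
    const T* r;
    __host__ __device__ T operator[](int i) const { return l[i] + r[i]; }
};

// Kernel template, defined in the same file as its callers.
template <class A, class T1, class T2>
__global__ void evaluation(T1* data, const CudaExpr<A, T2> e, int n) {
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) data[i] = e[i];
}

// Host wrapper that launches the kernel.
template <class A, class T1, class T2>
void evaluation_function(T1* data, const CudaExpr<A, T2> e, int n) {
    const int block = 256;                     // assumed block size
    const int grid = (n + block - 1) / block;  // round up
    evaluation<<<grid, block>>>(data, e, n);
}

template <typename T>
class CudaMatrix {
    T* data_;
    int n_;
public:
    // Allocates device memory and uploads the host data.
    explicit CudaMatrix(const std::vector<T>& h) : data_(nullptr), n_((int)h.size()) {
        cudaMalloc((void**)&data_, n_ * sizeof(T));
        cudaMemcpy(data_, h.data(), n_ * sizeof(T), cudaMemcpyHostToDevice);
    }
    ~CudaMatrix() { cudaFree(data_); }
    CudaMatrix(const CudaMatrix&) = delete;
    CudaMatrix& operator=(const CudaMatrix&) = delete;

    const T* data() const { return data_; }

    // Assigning an expression launches the evaluation kernel.
    template <class A, class T2>
    const CudaMatrix& operator=(CudaExpr<A, T2> e) {
        evaluation_function(data_, e, n_);
        return *this;
    }

    // Copies the device data back to the host.
    std::vector<T> download() const {
        std::vector<T> h(n_);
        cudaMemcpy(h.data(), data_, n_ * sizeof(T), cudaMemcpyDeviceToHost);
        return h;
    }
};

// Builds the expression object; nothing is computed until operator= runs.
template <class T>
CudaExpr<SumExpr<T>, T> operator+(const CudaMatrix<T>& a, const CudaMatrix<T>& b) {
    SumExpr<T> s = { a.data(), b.data() };
    return CudaExpr<SumExpr<T>, T>(s);
}

int main() {
    std::vector<float> ha(1000, 1.0f), hb(1000, 2.0f);
    CudaMatrix<float> a(ha), b(hb), c(ha);
    c = a + b;                                 // instantiates and launches the kernel
    std::vector<float> hc = c.download();
    printf("c[0] = %f\n", hc[0]);
    return 0;
}
```

In a real project the template definitions would more likely sit in a header (a .cuh, say) that the main .cu includes, which amounts to the same thing for the compiler. Error checking on the CUDA calls is left out to keep the sketch short.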