kernel template

Essviv · May 9, 2013, 1:30am

I’m new to CUDA programming ^-^. I wrote a template kernel function, and when I called it from my main.cpp I got an "undefined" error. My code is quite simple and is as follows:

////////////////////////////////////////////////////////////////////////
//kernel.cu
#include <iostream>
#include <cuda_runtime.h>
using namespace std;

template <typename T>
__global__ void addKernel(const T* a,const T* b,int n,T* result)
{
int nIndexX=threadIdx.x+blockIdx.x*blockDim.x;
int nIndexY=threadIdx.y+blockIdx.y*blockDim.y;
// flatten the 2D thread position into a 1D index (row width = blockDim.x*gridDim.x)
int nIndex=nIndexY*blockDim.x*gridDim.x+nIndexX;
/*int nIndex=threadIdx.x;*/
if(nIndex<n)
result[nIndex]=a[nIndex]+b[nIndex];
}

template <typename T>
void addByGPU(const T* a,const T* b,int n,T* result)
{
if(a==NULL || b==NULL || result==NULL)
return;

cout<<"GPU execution starts."<<endl;

size_t nSize=sizeof(T)*n;
cudaError_t status;

T* nA_d,*nB_d,*nC_d;
status=cudaMalloc(&nA_d,nSize);
status=cudaMalloc(&nB_d,nSize);
status=cudaMalloc(&nC_d,nSize);

status=cudaMemcpy(nA_d,a,nSize,cudaMemcpyHostToDevice);
status=cudaMemcpy(nB_d,b,nSize,cudaMemcpyHostToDevice);

// fixed 64x64 grid of 16x16 blocks = 1024x1024 threads: covers n only if n <= 1024*1024
dim3 blocksPerGrid(1024/16,1024/16,1);
dim3 threadsPerBlock(16,16,1);
addKernel<T><<<blocksPerGrid,threadsPerBlock>>>(nA_d,nB_d,n,nC_d);
status=cudaMemcpy(result,nC_d,nSize,cudaMemcpyDeviceToHost);
status=cudaDeviceSynchronize();
if(status!=cudaSuccess)
cout<<"CUDA error: "<<cudaGetErrorString(status)<<endl;

status=cudaFree(nA_d);
status=cudaFree(nB_d);
status=cudaFree(nC_d);

status = cudaDeviceReset();
return;

}

//main.cpp
#include <stdlib.h>
#include <time.h>
#include <iostream>
using namespace std;

//template <typename T> extern void addByGPU(const T* a,const T* b,int n,T* result);
//extern void addByGPU(const int* a,const int* b,int n,int* result);

int main(void)
{
// MathOp mathOp;

int nCount=4*4*256*256;
int* a=new int[nCount];
int* b=new int[nCount];

srand(time(0));
for(int i=0;i<nCount;i++)
{
	a[i]=rand()%10000; // small values: a[i]+b[i] cannot overflow int even if RAND_MAX is 2^31-1
	b[i]=rand()%10000;
}

int*c=new int[nCount];
addByGPU<int>(a,b,nCount,c);
//int* c=mathOp.add(a,b,nCount);
for(int i=0;i<nCount;i++)
{
	if(a[i]+b[i]!=c[i])
	{
		cout<<"Error!"<<endl;
		break;
	}
}

cout<<"Completed..."<<endl;

delete [] a;
delete [] b;
delete [] c;
return 0;

}
////////////////////////////////////////////////////////////////

I tried adding the “extern ****” line at the beginning of main.cpp, but it still doesn’t work.
It turns out that in C++ the definition of a template function must be visible in the file where it is used, so the compiler can instantiate it for the exact type while compiling. I figured out that if I move the main function into the .cu file, everything works. But I want to know whether an alternative exists that avoids doing that, since it makes the code structure weird. Any suggestion would be appreciated.
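One alternative not raised in this thread is explicit instantiation. Even with a declaration of addByGPU visible in main.cpp, nothing in kernel.cu itself uses addByGPU<int>, so the compiler never instantiates it there and the linker has nothing to resolve the call to. Adding one explicit instantiation line per type to kernel.cu forces that code to be emitted, and main.cpp then needs only the plain template declaration. A minimal sketch, assuming a simple 1D launch in place of the fixed 1024x1024 grid and with error checking omitted; any header name you put the declaration in is your choice:

```cuda
// kernel.cu: template definitions stay here.
// main.cpp only needs the declaration:
//   template <typename T> void addByGPU(const T* a, const T* b, int n, T* result);
#include <cuda_runtime.h>

template <typename T>
__global__ void addKernel(const T* a, const T* b, int n, T* result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // 1D index (simplified from the 2D version)
    if (i < n)
        result[i] = a[i] + b[i];
}

template <typename T>
void addByGPU(const T* a, const T* b, int n, T* result)
{
    if (a == nullptr || b == nullptr || result == nullptr || n <= 0)
        return;                                     // a 0-block launch would be invalid

    size_t bytes = sizeof(T) * n;
    T *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);                         // error checking omitted for brevity
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads; // enough blocks to cover all n elements
    addKernel<T><<<blocks, threads>>>(dA, dB, n, dC);
    cudaMemcpy(result, dC, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel to finish

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}

// Explicit instantiations: these make nvcc emit real host functions for
// int and float (and the matching addKernel<int>/addKernel<float>),
// so calls from main.cpp can be resolved at link time.
template void addByGPU<int>(const int*, const int*, int, int*);
template void addByGPU<float>(const float*, const float*, int, float*);
```

main.cpp keeps calling `addByGPU<int>(a,b,nCount,c);` as before, and something like `nvcc kernel.cu main.cpp -o add` builds and links both files. The trade-off is that only the listed types are available: calling addByGPU<double> from main.cpp would fail to link again until a matching instantiation line is added to kernel.cu.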

Tiomat · May 9, 2013, 3:06pm

It is a general C++ issue: templates need to be completely visible to the compiler when it hits any call to a template function. The way I get around it is to put each of my templated kernels in a .cuh file and simply #include that file at the top of each .cpp file that requires it (or .cu files too). This keeps kernels out of ‘normal’ C++ code and also lets each individual file stay relatively small.

So in your example you would move your kernel into a separate file, perhaps called “AddKernel.cuh” (fully include-guarded, of course), and then add #include “AddKernel.cuh” at the top of your .cpp file.
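A sketch of what such an AddKernel.cuh might contain, assuming a hypothetical host-side wrapper named launchAdd that works on device pointers. One caveat worth knowing: the file that includes it must itself be compiled by nvcc (renamed to .cu, or built with something like `nvcc -x cu main.cpp`), because a plain host compiler such as g++ or cl.exe does not understand `__global__` or the `<<<...>>>` launch syntax.

```cuda
// AddKernel.cuh: hypothetical header following the suggestion above.
// Everything here is a template, so including it from several files does not
// cause multiple-definition errors; the guard prevents double inclusion per file.
#ifndef ADD_KERNEL_CUH
#define ADD_KERNEL_CUH

#include <cuda_runtime.h>

template <typename T>
__global__ void addKernel(const T* a, const T* b, int n, T* result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        result[i] = a[i] + b[i];
}

// Hypothetical wrapper: dA, dB and dC must already be device pointers.
template <typename T>
cudaError_t launchAdd(const T* dA, const T* dB, int n, T* dC)
{
    if (n <= 0)
        return cudaSuccess;                          // nothing to do; avoids a 0-block launch
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    addKernel<T><<<blocks, threads>>>(dA, dB, n, dC);
    return cudaGetLastError();                       // reports launch-configuration errors
}

#endif // ADD_KERNEL_CUH
```

Because every file that includes the header sees the full template definitions, each one instantiates whatever types it uses, which is exactly why the "undefined" problem goes away with this layout.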

Mati86 · July 6, 2013, 3:52pm

In the deviceQuery output for a GeForce GT 640:

Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535

Does it mean I can launch a grid of size 2147483647 x 65535 x 65535?

Can anyone answer?
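Those figures are the maximum number of blocks (not threads) in the x, y and z grid dimensions, the same values the runtime exposes as `cudaDeviceProp::maxGridSize`. Each dimension is limited separately, so a grid within those bounds should be accepted as far as grid size is concerned; the launch is still subject to the per-block limits such as maxThreadsPerBlock. Note that the total thread count of such a grid is far beyond the range of int, so an `int` index like nIndex in the kernel above would overflow. A minimal sketch, assuming hypothetical names addKernelStride and blocksFor, that reads the x limit at runtime, clamps the block count to it, and uses a grid-stride loop with 64-bit indices so any n is still covered:

```cuda
#include <cuda_runtime.h>

// Grid-stride loop: thread t handles elements t, t+stride, t+2*stride, ...
// so every element is covered even when the grid is smaller than n.
__global__ void addKernelStride(const int* a, const int* b, long long n, int* result)
{
    // 64-bit arithmetic: blocks * threads can exceed the range of int
    long long stride = (long long)blockDim.x * gridDim.x;
    for (long long i = (long long)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        result[i] = a[i] + b[i];
}

// Hypothetical helper: blocks needed for n elements, clamped to the
// device's x-dimension grid limit (maxGridSize[0], as listed by deviceQuery).
unsigned int blocksFor(long long n, int threadsPerBlock, int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return 1;                                    // fall back to a single block
    long long blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    if (blocks > prop.maxGridSize[0])
        blocks = prop.maxGridSize[0];                // the grid-stride loop covers the rest
    if (blocks < 1)
        blocks = 1;
    return (unsigned int)blocks;
}
```

A launch would then look like `addKernelStride<<<blocksFor(n, 256, 0), 256>>>(dA, dB, n, dC);`. Clamping plus the stride loop keeps the kernel correct whether the device reports 65535 or 2147483647 for the x dimension.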
