nvcc vs. gcc compilation

Hi all-

Quick question, but first a little background. I've been using CUDA to speed up some data reduction codes for my institution. When I first started, to get things up and running quickly, I had all my source in one large .cu file. As the project grew, I split it into several .cu files belonging to the same project so that everything could still be compiled with nvcc in one go. The project is now fairly sizable, so I was wondering whether there is any performance benefit to splitting the code into .cu files for the kernels and their host-side launch code, and .c files for everything else (i.e. the proper way to do things, apparently). Since my all-.cu code is running nicely, is there any reason to worry about this? Thanks in advance for any advice!

There are two methods.

Method 1: use a wrapper to encapsulate the kernel in a .cu file, like

__global__ void foo()
{
  ....
}

// Host-callable wrapper. It lives in the .cu file, so nvcc handles the <<< >>> launch.
void foo_wrapper()
{
	foo<<< grid, block >>>();
}

Then put the host code in a .cpp file and call the wrapper function as usual:

void foo_wrapper();  // defined in the .cu file

int main()
{
	// CUDA initialization
	foo_wrapper();
}
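A slightly fuller sketch of method 1 might look like the following. The kernel `scale`, the wrapper `scale_wrapper`, the file names, and the library path in the build comments are all made up for illustration, so adjust them to your project. The optional `DEMO_MAIN` block only stands in for what would normally live in a separate .cpp file.

```cuda
// kernels.cu (hypothetical file name): compiled by nvcc.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies each element of data by factor.
__global__ void scale(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Plain C++ signature, so a .cpp file built by g++ can declare and call it.
// Returns 0 on success, 1 on any CUDA error.
int scale_wrapper(float* host_data, int n, float factor)
{
    if (n <= 0) return 0;  // nothing to do; a zero-sized grid would be invalid

    float* dev = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return 1;

    cudaError_t err = cudaMemcpy(dev, host_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int block = 256;
        int grid = (n + block - 1) / block;  // round up to cover all n elements
        scale<<<grid, block>>>(dev, n, factor);
        err = cudaGetLastError();            // catches launch configuration errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host_data, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);

    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

#ifdef DEMO_MAIN
// In the split layout this would sit in main.cpp, which only needs the
// declaration: int scale_wrapper(float* host_data, int n, float factor);
int main()
{
    float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    if (scale_wrapper(v, 4, 2.0f) != 0) return 1;
    std::printf("%g %g %g %g\n", v[0], v[1], v[2], v[3]);
    return 0;
}
#endif

// Possible build (paths are assumptions for a typical Linux install):
//   nvcc -c kernels.cu -o kernels.o
//   g++  -c main.cpp   -o main.o
//   g++  main.o kernels.o -o app -L/usr/local/cuda/lib64 -lcudart
```

The point of the design is that only the wrapper's plain C++ signature crosses the file boundary, so the .cpp side never sees `__global__` or `<<< >>>`.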

Method 2: put only the kernel in the .cu file, compile it to .ptx or .cubin, and then load the .ptx or .cubin from your host code via the driver API. This method takes two steps.

Step 1: use nvcc to compile all .cu files into .ptx or .cubin.

Step 2: compile all host code (.c, .cpp) with gcc or another C/C++ compiler; you don't need nvcc in this step.

How to use the driver API: please read the example in SDK/matrixMulDrv.
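For orientation, the host side of method 2 could be sketched roughly as below. This is not taken from the SDK sample: the file names, the kernel name `scale`, and its parameters are hypothetical, and the sketch uses current driver API entry points, which may differ from the ones in the sample shipped with your SDK version.

```cuda
// host_drv.cpp (hypothetical): uses only the driver API, so g++ can compile it.
// Assumes scale.ptx was produced beforehand, e.g. with: nvcc -ptx scale.cu
// where scale.cu contains (extern "C" keeps the kernel name unmangled):
//   extern "C" __global__ void scale(float* data, int n, float factor)
//   { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) data[i] *= factor; }
#include <cstdio>
#include <cstdlib>
#include <cuda.h>

// Abort with a message if a driver API call fails.
static void check(CUresult r, const char* what)
{
    if (r != CUDA_SUCCESS) {
        const char* msg = nullptr;
        cuGetErrorString(r, &msg);
        std::fprintf(stderr, "%s failed: %s\n", what, msg ? msg : "unknown error");
        std::exit(1);
    }
}

int main()
{
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;

    // Initialize the driver and make the device's primary context current.
    check(cuInit(0), "cuInit");
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");
    check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
    check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");

    // Load the precompiled module and look the kernel up by name.
    check(cuModuleLoad(&mod, "scale.ptx"), "cuModuleLoad");
    check(cuModuleGetFunction(&fn, mod, "scale"), "cuModuleGetFunction");

    float host[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    int n = 4;
    float factor = 2.0f;
    CUdeviceptr d_data;
    check(cuMemAlloc(&d_data, sizeof(host)), "cuMemAlloc");
    check(cuMemcpyHtoD(d_data, host, sizeof(host)), "cuMemcpyHtoD");

    // Kernel arguments are passed as an array of pointers to each argument.
    void* args[] = { &d_data, &n, &factor };
    check(cuLaunchKernel(fn, 1, 1, 1,    // grid dimensions
                         256, 1, 1,      // block dimensions
                         0, nullptr,     // shared memory bytes, stream
                         args, nullptr), "cuLaunchKernel");
    check(cuMemcpyDtoH(host, d_data, sizeof(host)), "cuMemcpyDtoH");

    std::printf("%g %g %g %g\n", host[0], host[1], host[2], host[3]);

    check(cuMemFree(d_data), "cuMemFree");
    check(cuModuleUnload(mod), "cuModuleUnload");
    check(cuDevicePrimaryCtxRelease(dev), "cuDevicePrimaryCtxRelease");
    return 0;
}

// Possible build (include path is an assumption for a typical Linux install):
//   g++ host_drv.cpp -o host_drv -I/usr/local/cuda/include -lcuda
```

Note that the host program links against the driver library (`-lcuda`) rather than the runtime library, and that the kernel is found by its string name at run time, which is why the `extern "C"` on the kernel matters.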

Why do we need a wrapper function at all? Can't we simply call a kernel function defined in myKernel.cu from main in myCpp.cpp using extern?

Content of myKernel.cu

extern __global__ void foo()
{
  ....
}

Content of myCpp.cpp

int main()
{
	// CUDA initialization
	foo<<<  ,,, >>>();
}

What is wrong with this?

Calling kernels with the triple angle bracket notation <<< … >>> is not valid in C++; it is a CUDA extension that a plain C++ compiler such as g++ does not understand. The C++ compiler will probably think that you are trying to do something funky with templates…

Also, none of the CUDA function specifiers (__global__, __device__, etc.) are valid in C++.
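If you want one header that both nvcc and a plain C++ compiler can include, a common pattern is to hide the CUDA-only parts behind the `__CUDACC__` macro, which nvcc defines while compiling CUDA source. A rough sketch, where the header name, the `HOST_DEVICE` macro, and all function names are hypothetical:

```cuda
// shared_math.h (hypothetical): included from both .cu and .cpp files.
#ifndef SHARED_MATH_H
#define SHARED_MATH_H

// nvcc defines __CUDACC__ while compiling CUDA source; g++ does not,
// so the specifiers vanish when a plain C++ compiler reads this header.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Callable from kernels when built by nvcc, and from ordinary host code under g++.
HOST_DEVICE inline float square(float x) { return x * x; }

// Kernel declarations stay hidden from the plain C++ compiler.
#ifdef __CUDACC__
__global__ void square_all(float* data, int n);
#endif

// Host-callable wrapper, declared in plain C++ so every file can see it.
// Its definition (with the <<< >>> launch) would live in a .cu file.
void square_all_wrapper(float* host_data, int n);

#endif // SHARED_MATH_H
```

With this layout the .cpp files only ever see `square` and `square_all_wrapper`, both of which are ordinary C++ declarations.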