nvcc vs. gcc compilation

sundog314 · March 22, 2010, 7:24pm

Hi all-

Quick question, but first a little background. I’ve been using CUDA to speed up some data reduction codes for my institution. When I first started, to get things up and running quickly, I had all my source in one large .cu file. As the project grew a little larger, I saved each additional source file in the project as a .cu file too, so everything could be compiled with nvcc in one go. The project is now fairly sizable, so I was wondering whether there is any performance benefit to splitting the code into .cu files for the kernels and the host code that launches them, and .c files for everything else (i.e. the proper way to do things, apparently). Since my all-.cu code is running nicely, is there any reason to worry about this? Thanks in advance for any advice!

LSChien · March 23, 2010, 12:41am

Two methods.

Method 1: use a wrapper to encapsulate the kernel launch in the .cu file, like

__global__ void foo()
{
  ....
}

void foo_wrapper()
{
	// grid and block are the launch dimensions
	foo<<< grid, block >>>();
}

then put the host code in a .cpp file and call the wrapper function as usual

void foo_wrapper();	// defined in the .cu file

int main()
{
	// CUDA initialization
	foo_wrapper();
}
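A fuller sketch of method 1 might look like the following. The kernel `scale`, the wrapper `scale_wrapper`, and the file names are hypothetical and only serve the illustration; the build commands in the comments assume a Linux install where the linker can find libcudart. Both files are shown in one block, in the order they would be written.

```cuda
// ---- scale.cu (compiled by nvcc: nvcc -c scale.cu -o scale.o) ----
#include <cuda_runtime.h>

// Hypothetical kernel: multiply each element by a factor.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Plain C++ function; the only symbol the host code needs to see.
// Returns 0 on success, 1 on any CUDA error.
int scale_wrapper(float *host_data, float factor, int n)
{
    float *dev = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess)
        return 1;
    cudaError_t err = cudaMemcpy(dev, host_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int block = 256;
        int grid = (n + block - 1) / block;
        scale<<<grid, block>>>(dev, factor, n);
        err = cudaGetLastError();  // catches launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host_data, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}

// ---- main.cpp (compiled by g++: g++ -c main.cpp -o main.o) ----
// Link step (assumed): g++ main.o scale.o -o app -lcudart
#include <cstdio>

int scale_wrapper(float *host_data, float factor, int n);  // defined in scale.cu

int main()
{
    float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    if (scale_wrapper(v, 2.0f, 4) != 0) {
        std::printf("CUDA call failed\n");
        return 1;
    }
    std::printf("%g %g %g %g\n", v[0], v[1], v[2], v[3]);  // 2 4 6 8 on success
    return 0;
}
```

If the host code is C rather than C++, the wrapper would need to be declared `extern "C"` in the .cu file so that its name is not mangled.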

Method 2: put only the kernels into .cu files, compile them to .ptx or .cubin, and then load the .ptx or .cubin from your host code via the driver API. This method takes two steps.

Step 1: use nvcc to compile all .cu files into .ptx or .cubin.

Step 2: compile all host code (.c, .cpp) with gcc or another C/C++ compiler; you don’t need nvcc in this step.

For how to use the driver API, please read the example SDK/matrixMulDrv.
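As a rough sketch of method 2, the host side could load a PTX file and launch a kernel from it as shown below. The kernel `scale`, the helper `run_scale_ptx`, and the file names are hypothetical. The sketch uses `cuLaunchKernel`, which comes from toolkits newer than this thread, so the matrixMulDrv sample of that era may use different launch calls. The build commands in the comments are assumptions about a typical Linux setup.

```cuda
// ---- kernels.cu (step 1: nvcc -ptx kernels.cu -o kernels.ptx) ----
// extern "C" keeps the name unmangled so the host can look it up as "scale".
extern "C" __global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// ---- host.cpp (step 2, assumed: g++ host.cpp -o host -lcuda) ----
#include <cuda.h>
#include <cstdio>

// Return-on-error helper for this sketch only (skips cleanup on failure).
#define CHECK(call)                                                   \
    do {                                                              \
        CUresult r_ = (call);                                         \
        if (r_ != CUDA_SUCCESS) {                                     \
            std::printf("%s failed with code %d\n", #call, (int)r_);  \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Loads the PTX at ptx_path and scales v[0..n) on the GPU. Returns 0 on success.
int run_scale_ptx(const char *ptx_path, float *v, int n, float factor)
{
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;
    CUdeviceptr d_data;
    size_t bytes = n * sizeof(float);

    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    CHECK(cuModuleLoad(&mod, ptx_path));            // PTX is compiled at load time
    CHECK(cuModuleGetFunction(&fn, mod, "scale"));

    CHECK(cuMemAlloc(&d_data, bytes));
    CHECK(cuMemcpyHtoD(d_data, v, bytes));

    // One pointer per kernel parameter, in declaration order.
    void *args[] = {&d_data, &factor, &n};
    unsigned block = 256;
    unsigned grid = (n + block - 1) / block;
    CHECK(cuLaunchKernel(fn, grid, 1, 1, block, 1, 1, 0, NULL, args, NULL));
    CHECK(cuMemcpyDtoH(v, d_data, bytes));

    CHECK(cuMemFree(d_data));
    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}

int main()
{
    float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    if (run_scale_ptx("kernels.ptx", v, 4, 2.0f) != 0)
        return 1;
    std::printf("%g %g %g %g\n", v[0], v[1], v[2], v[3]);
    return 0;
}
```

The host file contains no CUDA syntax at all, only calls into the driver library, which is why an ordinary C/C++ compiler is enough for step 2.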

gpuguy · March 23, 2010, 2:28am

Why do we need a wrapper function at all? Can’t we simply use extern to call a kernel function defined in myKernel.cu from main in myCpp.cpp?

Content of myKernel.cu

extern __global__ void foo()
{
  ....
}

Content of myCpp.cpp

int main()
{
	// CUDA initialization
	foo<<<  ,,, >>>();
}

What is wrong with this?

Gregory_Diamos · March 23, 2010, 2:51am

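Calling kernels with the triple angle bracket notation <<< … >>> is not valid in C++. The C++ compiler will probably think that you are trying to do something funky with templates…

Also, none of the CUDA function specifiers (__global__, __device__, etc.) are valid in C++. That is why the launch has to sit in a .cu file compiled by nvcc, behind an ordinary function that the .cpp file can call.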
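One way to keep CUDA-only syntax away from the C++ compiler is to guard it in a shared header, sketched below. The file names and the kernel body are hypothetical. The sketch relies on nvcc defining the `__CUDACC__` macro when it compiles CUDA source, so g++ never sees the `__global__` declaration.

```cuda
// ---- foo.h (hypothetical; included by both foo.cu and main.cpp) ----
#ifndef FOO_H
#define FOO_H

#ifdef __CUDACC__
// nvcc defines __CUDACC__ when compiling CUDA source, so g++ skips this.
__global__ void foo(int *out);
#endif

// Ordinary declaration, valid for any C++ compiler.
// Returns the value written by the kernel, or -1 on a CUDA error.
int foo_wrapper();

#endif

// ---- foo.cu (compiled by nvcc) ----
#include <cuda_runtime.h>
#include "foo.h"

__global__ void foo(int *out) { *out = 42; }

int foo_wrapper()
{
    int *dev = nullptr;
    int result = -1;
    if (cudaMalloc((void **)&dev, sizeof(int)) != cudaSuccess)
        return -1;
    foo<<<1, 1>>>(dev);  // launch syntax stays inside the .cu file
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(&result, dev, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    return ok ? result : -1;
}

// ---- main.cpp (compiled by g++) ----
#include <cstdio>
#include "foo.h"

int main()
{
    std::printf("%d\n", foo_wrapper());
    return 0;
}
```

With this layout main.cpp only ever sees `int foo_wrapper();`, which is plain C++.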
