Nesting kernels: can I do this in CUDA?

Hi,

I want to write CUDA code along the following lines:

#include <xyz.h>

Device_function1(j, o, p)
{
    // ...some work...
}

Device_function2(v, j, k)
{
    // ...some work...
}

Device_function3(a, d, g)
{
    // ...some work...
}

Device_function4()
{
    // Calling Device_function1, Device_function2, Device_function3
    Device_function1<<<w, r, j>>>(a, b, g);
    Device_function2<<<n, g, j>>>(a, d, e);
    Device_function3<<<n, m, j>>>(a, d, a);
}

int main()
{
    // Calling the kernel Device_function4
    Device_function4<<<n, m, j>>>(a, d, g);
}

Note that Device_function4 calls Device_function1, Device_function2, Device_function3.

Can we do this in CUDA? If not, is there any other way to nest functions?

Thanks for your time guys!!

No, you can’t.

Functions 1, 2, and 3 are never called from the host, so they are not really kernels. If you just drop the kernel launch syntax, they will work just fine as normal function calls from kernel/function 4.
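
For example, a minimal sketch of that approach (the names scale_elem and apply_scale, and the element-wise body, are placeholders for illustration, not code from this thread):

__device__ float scale_elem(float x, float s)
{
    // Ordinary device function: callable only from device code,
    // invoked without any execution configuration.
    return x * s;
}

__global__ void apply_scale(float *data, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = scale_elem(data[i], s);   // plain call, no <<< >>>
}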

Alternatively, make function4 a normal function on the host, calling three kernels in a row.
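
A rough sketch of that host-side alternative; step1/step2/step3 and run_pipeline are made-up names standing in for the poster's Device_function1..4:

__global__ void step1(float *a, int n) { /* ...some work... */ }
__global__ void step2(float *a, int n) { /* ...some work... */ }
__global__ void step3(float *a, int n) { /* ...some work... */ }

void run_pipeline(float *d_a, int n)
{
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);

    // Three ordinary kernel launches in a row; launches issued to the
    // same (default) stream execute in order on the device.
    step1<<<grid, block>>>(d_a, n);
    step2<<<grid, block>>>(d_a, n);
    step3<<<grid, block>>>(d_a, n);
}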

Thanks for your answers guys!

Well, I have seen code like this:

__device__ void Add()
{
    // do some calculation
}

__global__ void scanModified()
{
    // do some calculations and call the device function Add()
    Add();   // NOTICE THAT WE DO NOT MENTION THE EXECUTION CONFIGURATION <<< >>> etc.
}

Notice that we do not mention the execution configuration. Can we do this in CUDA?

Yes. But notice that there is a big difference between nesting kernel launches (your original question, which cannot currently be done) and nesting device functions (your last post). You should also be aware that all device functions are presently expanded inline by the compiler, so deeply nested device functions can have a pretty profound negative effect on register usage and compilation speed. Also, because there is no stack, recursive device function calls probably won’t work (although that is a guess; I don’t recall having tried it).
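
To illustrate the nesting point, here is a small sketch (all names are placeholders). On the hardware being discussed in this thread, both levels of device calls below end up expanded inline into the calling kernel:

__device__ float square(float x)
{
    return x * x;
}

__device__ float sum_of_squares(float a, float b)
{
    // A device function calling another device function:
    // both calls get flattened into the kernel by the compiler.
    return square(a) + square(b);
}

__global__ void norm_kernel(const float *v, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = sum_of_squares(v[i], v[i]);
}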

Do I need to explicitly mark the function as inline, or is it inlined by default by the compiler? Please also explain “Also, because there is no stack…”: which stack are you talking about, and where is it absent?

I want to ask one more thing: why don’t we need an execution configuration (number of threads, number of blocks, etc.) for the function Add()?

Read this.

Did you actually read any of the replies in this thread? Device functions aren’t kernels. They aren’t even real subroutines. They are expanded inline (like macros) by the compiler inside kernel code.

Thanks a lot once again, Avidday. I read the replies above. Could you please point me to a document or previous forum post where I can read more about this (“Device functions aren’t kernels”, etc.)?

You could start with the programming guide. Appendix B describes the properties of device functions pretty thoroughly.

Thanks Avidday, I am going to read Appendix B.

No, you only mention the configuration when launching the kernel function. Any subsequent function calls from within the kernel run with that configuration.
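
A short sketch of that point (the names my_global_index and add_offset are illustrative, not from this thread):

__device__ int my_global_index()
{
    // blockDim, blockIdx and threadIdx inside a device function reflect
    // the configuration given at the <<< >>> launch of the enclosing kernel.
    return blockIdx.x * blockDim.x + threadIdx.x;
}

__global__ void add_offset(float *data, float offset, int n)
{
    int i = my_global_index();   // no execution configuration here
    if (i < n)
        data[i] += offset;
}

int main()
{
    int n = 1024;
    float *d_data;
    cudaMalloc((void **)&d_data, n * sizeof(float));

    // The execution configuration is given once, at the kernel launch;
    // everything called from inside the kernel runs under it.
    add_offset<<<(n + 255) / 256, 256>>>(d_data, 1.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}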