A kernel call within another kernel

Hi,
I am working through the CUDA Programming Guide and have a doubt.

Functions that need to run on the GPU device have to be declared with __global__ or __device__.

Suppose I have a __global__ function. Can I call another function from within it? And should that inner function be declared __global__ or __device__?

Can you help me clear up this doubt?

All function calls from CUDA functions are inlined, so recursion is not possible. Also, you cannot start parallel kernels from within a kernel, because each thread executes its code serially.

Suppose I run a kernel that executes correctly and produces some result data. Can a second kernel launch access that result without first copying it to the host?

Or do I have to memcpy it to the host and then back to the device?

Can you help me out?

Oops, sorry. There seems to be a cudaMemcpyDeviceToDevice option for cudaMemcpy, which should do the above.
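For reference, cudaMemcpyDeviceToDevice is not a function of its own but a copy kind passed as the last argument of cudaMemcpy. A minimal sketch of a device-to-device copy, where the buffer names (d_a, d_b) and sizes are made up for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const int n = 4;
    const size_t bytes = n * sizeof(float);
    const float h_in[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float h_out[n] = {0.0f, 0.0f, 0.0f, 0.0f};

    // Two separate buffers in device (global) memory; names are hypothetical.
    float *d_a = nullptr, *d_b = nullptr;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);

    // Host -> device, then device -> device without touching host memory.
    cudaMemcpy(d_a, h_in, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, d_a, bytes, cudaMemcpyDeviceToDevice);

    // Bring the copy back only to check it.
    cudaMemcpy(h_out, d_b, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%f\n", h_out[i]);

    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```

Note that such a copy is only needed if the data really has to live in a second buffer; a second kernel can read the first buffer directly.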

If I get your question right, the answer is yes. If both kernels share the same context, you can write data to memory from the first kernel and then read it from the second kernel. This holds for global (device) memory, not for shared or local memory.

You will probably need to cudaMalloc() storage on the GPU from host code and pass pointers to the allocated device memory to both kernels.
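A minimal sketch of what is described above, assuming two made-up kernels (fill and doubleIt) launched one after the other in the default stream. The buffer is allocated once with cudaMalloc() and the same pointer is passed to both launches, so nothing is copied to the host in between:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// First kernel: writes its result into global memory.
__global__ void fill(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = (float)i;
}

// Second kernel: reads what the first kernel left in the same buffer.
__global__ void doubleIt(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 8;
    float h_out[n];
    float *d_data = nullptr;
    cudaMalloc((void **)&d_data, n * sizeof(float));

    // Launches in the same stream run in order, so doubleIt sees fill's output.
    fill<<<1, n>>>(d_data, n);
    doubleIt<<<1, n>>>(d_data, n);

    // Single copy back to the host at the very end.
    cudaMemcpy(h_out, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%f\n", h_out[i]);

    cudaFree(d_data);
    return 0;
}
```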

I was wondering whether I could simply run my C program in a barbaric way on the GPU.

Is this possible?

__device__
{
    __device__
    {
        __device__
        {
        }
    }
}

No…again, recursion is not possible. You cannot call another kernel from within a kernel.

Uh, why do you think this is recursion? As long as it can be completely inlined (e.g. your functions are just “syntactic sugar”) you can call another function, if you could not, device functions would be completely useless.

Sorry, I could have made that clearer. In my case they are not recursive functions; each one is a separate __device__ function:

__device__ funA()
{
    __device__ funB()
    {
        __device__ funC()
        {
        }
    }
}

I don't have anything to say except that the code really looks barbaric… :-)

Jokes apart, I don't think you can do all that… but I really don't know.

You can call __device__ functions from __device__ functions, which in turn get called from __global__ functions.
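A minimal sketch of that call chain. The function names (square, sumOfSquares, sumOfSquaresKernel) are invented for illustration, and the functions are defined side by side at file scope rather than nested inside one another:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Helper that can only be called from device code.
__device__ float square(float x) { return x * x; }

// A __device__ function calling another __device__ function.
__device__ float sumOfSquares(float a, float b) { return square(a) + square(b); }

// The kernel is the entry point launched from the host; it calls the helper.
__global__ void sumOfSquaresKernel(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = sumOfSquares(a[i], b[i]);
}

int main()
{
    const int n = 4;
    const size_t bytes = n * sizeof(float);
    const float h_a[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float h_b[n] = {4.0f, 3.0f, 2.0f, 1.0f};
    float h_out[n];

    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    sumOfSquaresKernel<<<1, n>>>(d_a, d_b, d_out, n);

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%f\n", h_out[i]);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return 0;
}
```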

I love nested functions. Why’d C++ ever take them out?

Maybe I’m missing something here, but what is the point of nesting the functions? Anything that can be written as nested functions should be able to be written serially as one device function as well… or no?

I don't know for sure, but my guess is "scoping". A nested function can be called only within the scope of the function that nests it… Maybe…

Did you notice the OceanFFT kernel example? There, device functions are passed as arguments to another device function…

Nested functions are nice. They help me tidy my code.

You mean function pointers? That’s impossible… ?
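For what it is worth, one way to hand a device function to another device function without any function pointer is a template parameter, which the compiler resolves and can inline at compile time. The sketch below only illustrates that general technique with made-up names (Square, applyTwice, transform); it is not taken from the OceanFFT sample and makes no claim about how that sample is written:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A small function object whose call operator runs on the device.
struct Square
{
    __device__ float operator()(float x) const { return x * x; }
};

// Receives the operation as a template type, not as a function pointer.
template <typename Op>
__device__ float applyTwice(Op op, float x)
{
    return op(op(x));
}

// The kernel forwards the operation to the device function.
template <typename Op>
__global__ void transform(float *data, int n, Op op)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = applyTwice(op, data[i]);
}

int main()
{
    const int n = 4;
    const size_t bytes = n * sizeof(float);
    float h_data[n] = {1.0f, 2.0f, 3.0f, 4.0f};

    float *d_data = nullptr;
    cudaMalloc((void **)&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    // Each element x becomes (x * x) * (x * x).
    transform<<<1, n>>>(d_data, n, Square());

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%f\n", h_data[i]);

    cudaFree(d_data);
    return 0;
}
```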

I believe this may answer your questions.

http://developer.download.nvidia.com/assets/cuda/files/CUDADownloads/TechBrief_Dynamic_Parallelism_in_CUDA.pdf

/Mike
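A minimal sketch of the dynamic parallelism that the linked tech brief covers, where a kernel launches another kernel from device code. The kernel names (parent, child) are invented here, and the sketch assumes a GPU of compute capability 3.5 or higher, compilation with relocatable device code (nvcc -rdc=true) and linking against cudadevrt:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: launched from the device, not from the host.
__global__ void child(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Parent kernel: a single thread launches the child grid.
__global__ void parent(int *data, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        child<<<(n + 127) / 128, 128>>>(data, n);
    }
}

int main()
{
    const int n = 256;
    int h_data[n];
    int *d_data = nullptr;
    cudaMalloc((void **)&d_data, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));

    parent<<<1, 1>>>(d_data, n);

    // The parent grid is not complete until its child grid has finished,
    // so synchronizing on the host is enough here.
    cudaDeviceSynchronize();

    cudaMemcpy(h_data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("first = %d, last = %d\n", h_data[0], h_data[n - 1]);

    cudaFree(d_data);
    return 0;
}
```

On hardware or toolkits without this feature, the earlier answers in this thread still apply: a kernel can only call __device__ functions, and new kernels have to be launched from the host.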