QUICK QUESTION: Launching a kernel inside a kernel


I would like to know whether it is possible to launch a kernel from inside another kernel. I guess it is not, but maybe there is a workaround.

My application exposes parallelism at different granularities. I have a main task, say A, that can be subdivided into N independent sub-tasks, each of which can in turn be subdivided into M independent sub-tasks. Hence, I could launch a first kernel A computed by, for instance, N threads, and each thread could then launch a new kernel computed by M threads. Is this feasible?

Please help.

Thank you,


No, kernels cannot launch other kernels. Can you map this onto a launch of N blocks, each with M threads?
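A minimal sketch of that mapping, assuming the inner work reduces to a sum: `blockIdx.x` selects one of the N sub-tasks, `threadIdx.x` selects one of its M sub-sub-tasks, and each block reduces its M results in shared memory. The names `nestedSumKernel`, `nestedSum`, and the placeholder term `term(i, j) = i + j` are hypothetical; the reduction also assumes M is a power of two no larger than the maximum threads per block (1024 on most CUDA GPUs).

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical inner term t(i, j); replace with the real per-element computation.
__device__ float term(int i, int j) {
    return static_cast<float>(i + j);
}

// One block per outer sub-task i (blockIdx.x), one thread per inner sub-task j (threadIdx.x).
// Each block sums its M inner terms in shared memory and writes one partial sum.
// Assumes blockDim.x == M, M is a power of two, and M <= 1024.
__global__ void nestedSumKernel(float* partial) {
    extern __shared__ float buf[];
    const int i = blockIdx.x;
    const int j = threadIdx.x;
    buf[j] = term(i, j);
    __syncthreads();
    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (j < stride) buf[j] += buf[j + stride];
        __syncthreads();
    }
    if (j == 0) partial[i] = buf[0];
}

// Host wrapper: N blocks of M threads, then the N partial sums are added on the CPU.
// Error checking is omitted for brevity.
float nestedSum(int N, int M) {
    float* dPartial = nullptr;
    cudaMalloc(&dPartial, N * sizeof(float));
    nestedSumKernel<<<N, M, M * sizeof(float)>>>(dPartial);
    std::vector<float> hPartial(N);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(hPartial.data(), dPartial, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dPartial);
    float total = 0.0f;
    for (float p : hPartial) total += p;
    return total;
}
```

For example, `nestedSum(4, 8)` launches 4 blocks of 8 threads. Adding the N partial sums on the CPU keeps the kernel simple; a second reduction kernel could do that step on the device instead.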

No, you cannot invoke a kernel from inside another kernel.

You must schedule your tree-like parallel structure yourself. For example, you can use the CPU to maintain the tree and dispatch the tasks to the GPU (or to several GPUs).
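A minimal sketch of CPU-side dispatch, assuming the tree has already been flattened on the host into a list of independent leaf tasks: the CPU launches one kernel per task and spreads the launches over a few CUDA streams so that independent kernels may run concurrently. `leafTaskKernel` (which here just sums 0..len-1) and `dispatchTasks` are hypothetical stand-ins for the real tasks; error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical leaf task: adds k for every k in [0, len) into out[taskId].
__global__ void leafTaskKernel(float* out, int taskId, int len) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < len) atomicAdd(&out[taskId], static_cast<float>(k));
}

// Host-side scheduler: one kernel launch per task, assigned round-robin to
// numStreams streams (numStreams must be >= 1). Returns one result per task.
std::vector<float> dispatchTasks(const std::vector<int>& taskLens, int numStreams) {
    const int numTasks = static_cast<int>(taskLens.size());
    float* dOut = nullptr;
    cudaMalloc(&dOut, numTasks * sizeof(float));
    cudaMemset(dOut, 0, numTasks * sizeof(float));

    std::vector<cudaStream_t> streams(numStreams);
    for (auto& s : streams) cudaStreamCreate(&s);

    const int threads = 256;
    for (int t = 0; t < numTasks; ++t) {
        const int blocks = (taskLens[t] + threads - 1) / threads;
        if (blocks > 0)  // a zero-block launch is invalid, so skip empty tasks
            leafTaskKernel<<<blocks, threads, 0, streams[t % numStreams]>>>(dOut, t, taskLens[t]);
    }
    cudaDeviceSynchronize();  // wait for every stream to finish

    std::vector<float> hOut(numTasks);
    cudaMemcpy(hOut.data(), dOut, numTasks * sizeof(float), cudaMemcpyDeviceToHost);
    for (auto& s : streams) cudaStreamDestroy(s);
    cudaFree(dOut);
    return hOut;
}
```

Extending this to several GPUs would mean calling `cudaSetDevice` before allocating memory and launching each device's share of the tasks; that is left out to keep the sketch short.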

Thanks for your reply.

Unfortunately, no, I cannot. The problem is that my computation has a tree structure. For instance (the real problem is more complicated), I have an outer summation of M terms, each of which can be evaluated independently. In turn, each term of the outer summation is itself a sum of N terms that can also be evaluated independently.
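In symbols, writing t_{ij} for the j-th inner term of the i-th outer term (notation introduced here for illustration), the computation is a double sum. Because every t_{ij} is independent, it can equally be viewed as one flat sum over all MN elements, which is what makes a single N×M launch possible:

```latex
S = \sum_{i=1}^{M} \sum_{j=1}^{N} t_{ij}
  = \sum_{k=0}^{MN-1} t_{\lfloor k/N \rfloor + 1,\; (k \bmod N) + 1}
```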

Now, you would suggest evaluating the N×M basic elements of the double summation in parallel, but I already have a black-box CUDA code that evaluates the N inner terms in parallel, and I want to reuse it.

I do not think there is a workaround that lets me both reuse it and parallelize the whole computation.

Since I cannot exploit nested levels of parallelism, I would have to rewrite the code to handle the entire computation, but that would take a lot of time.

Alternatively, of course, I can write a serial host loop of M iterations that invokes the parallel code once per iteration.
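The serial-outer-loop option can be sketched as follows, assuming the existing kernel can be told which outer term i to work on and accumulates into a device-side result. `blackBoxInnerTerms` is a hypothetical stand-in for the real black-box kernel (its placeholder term is t(i, j) = i * j), and `outerLoopSum` is likewise illustrative; error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the existing black-box kernel: evaluates the N inner
// terms of outer term i in parallel and accumulates them into *acc.
__global__ void blackBoxInnerTerms(float* acc, int i, int N) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < N) atomicAdd(acc, static_cast<float>(i * j));  // placeholder t(i, j) = i * j
}

// Serial outer loop on the CPU: one launch of the black-box kernel per outer term.
// All launches go to the default stream, so they run in order, and the running
// total stays on the device until a single copy at the end.
float outerLoopSum(int M, int N) {
    float* dAcc = nullptr;
    cudaMalloc(&dAcc, sizeof(float));
    cudaMemset(dAcc, 0, sizeof(float));
    const int threads = 256;
    const int blocks = (N + threads - 1) / threads;
    for (int i = 0; i < M && blocks > 0; ++i) {
        blackBoxInnerTerms<<<blocks, threads>>>(dAcc, i, N);
    }
    float hAcc = 0.0f;
    // cudaMemcpy on the default stream waits for all queued launches.
    cudaMemcpy(&hAcc, dAcc, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dAcc);
    return hAcc;
}
```

Because kernel launches are asynchronous with respect to the host, the loop mostly just queues M launches. The cost of this approach is per-launch overhead plus the fact that each launch only exposes N-way parallelism, which may leave the GPU underused when N is small.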

Any other suggestions?