How bad is kernel call overhead? I have a set of kernels that need to run in a specific order, since each one feeds into the next. Some are major and take a fair amount of time to execute; others are minor, such as searches that prepare data for the larger kernels.
Would I lose much performance by launching these kernels one by one from a host function? Or would it be better to refactor them all into one kernel to rule them all?