Should I combine multiple kernels? or: How bad is kernel call overhead?

How bad is kernel call overhead? I have a set of kernels that must run in a specific order, since each one feeds into the next. Some are major and take a fair amount of time to execute; others are minor, such as searches that prepare data for the larger kernels.

Would I be suffering a great deal by calling these kernels one by one from a host function? Would it be a better approach to refactor all these kernels into one kernel to rule them all?

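Kernel call overhead is approximately 10 microseconds per call. That only matters when a kernel's own run time is of the same order; next to kernels that run for milliseconds it is negligible. Generally, it is not worth the headaches to create one uber kernel.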
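
To put that number in perspective, here is a rough back-of-the-envelope sketch. It assumes the 10 microsecond figure is a fixed cost per call and that the kernels run strictly one after another with no overlap. The run times in `pipeline_us` are made-up placeholders, not measurements; substitute your own profiler numbers.

```python
# Assumed fixed cost per kernel call, in microseconds (the figure quoted above).
LAUNCH_OVERHEAD_US = 10.0


def launch_overhead_fraction(kernel_times_us, overhead_us=LAUNCH_OVERHEAD_US):
    """Return the share of total time spent on call overhead when every
    kernel is launched separately and the kernels run back to back."""
    if not kernel_times_us:
        raise ValueError("need at least one kernel time")
    total_overhead = overhead_us * len(kernel_times_us)
    total_compute = sum(kernel_times_us)
    return total_overhead / (total_overhead + total_compute)


# Hypothetical pipeline: two small "search" kernels feeding two large ones.
pipeline_us = [50.0, 5000.0, 50.0, 8000.0]

fraction = launch_overhead_fraction(pipeline_us)
print(f"call overhead: {fraction:.2%} of total time")  # call overhead: 0.30% of total time
```

With numbers like these, merging everything into one kernel would recover well under 1% of the run time. The picture changes only if most of the kernels finish in a few microseconds each, so it is worth profiling before refactoring.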