Controller-based kernels

Hi,

I am trying to do the following: run two kernels, A and B, at the same time while a third kernel, C, controls their progress. This is part of a search algorithm.

Dynamic parallelism, introduced in CUDA 5.0, looks appropriate, but it requires compute capability 3.5 and my GPU only supports 3.0.
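
To make the question concrete, here is a minimal sketch of the closest thing I can picture without dynamic parallelism. It assumes the control decision can be taken between steps rather than while A and B are still running. The names workerStep, controller and runControlledSearch, the counter used as "search state", and the stopping rule are all placeholders for illustration, not my actual algorithm. A and B each advance one step in their own stream, and C runs only after both steps have finished.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Abort-on-error helper (hypothetical convenience macro).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s\n",                       \
                    cudaGetErrorString(err_));                        \
            return -1;                                                \
        }                                                             \
    } while (0)

// Placeholder for kernels A and B: one "search step" just advances a counter.
__global__ void workerStep(int *state, int stride) {
    if (blockIdx.x == 0 && threadIdx.x == 0) *state += stride;
}

// Placeholder for kernel C: sets *done once either worker reaches the target.
__global__ void controller(const int *a, const int *b, int target, int *done) {
    if (*a >= target || *b >= target) *done = 1;
}

// Runs A and B step by step under control of C; returns the number of steps,
// or -1 on a CUDA error.
int runControlledSearch(int target, int *outA, int *outB) {
    int *a, *b, *done;
    CHECK(cudaMalloc(&a, sizeof(int)));
    CHECK(cudaMalloc(&b, sizeof(int)));
    CHECK(cudaMalloc(&done, sizeof(int)));
    CHECK(cudaMemset(a, 0, sizeof(int)));
    CHECK(cudaMemset(b, 0, sizeof(int)));
    CHECK(cudaMemset(done, 0, sizeof(int)));

    cudaStream_t sA, sB, sC;
    CHECK(cudaStreamCreate(&sA));
    CHECK(cudaStreamCreate(&sB));
    CHECK(cudaStreamCreate(&sC));
    cudaEvent_t eA, eB;
    CHECK(cudaEventCreateWithFlags(&eA, cudaEventDisableTiming));
    CHECK(cudaEventCreateWithFlags(&eB, cudaEventDisableTiming));

    int hDone = 0, steps = 0;
    while (!hDone) {
        // A and B are in different streams, so they are allowed to overlap.
        workerStep<<<1, 32, 0, sA>>>(a, 1);
        workerStep<<<1, 32, 0, sB>>>(b, 3);
        CHECK(cudaEventRecord(eA, sA));
        CHECK(cudaEventRecord(eB, sB));
        // C must not start before both steps have completed.
        CHECK(cudaStreamWaitEvent(sC, eA, 0));
        CHECK(cudaStreamWaitEvent(sC, eB, 0));
        controller<<<1, 1, 0, sC>>>(a, b, target, done);
        CHECK(cudaMemcpyAsync(&hDone, done, sizeof(int),
                              cudaMemcpyDeviceToHost, sC));
        CHECK(cudaStreamSynchronize(sC));
        ++steps;
    }

    CHECK(cudaMemcpy(outA, a, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaMemcpy(outB, b, sizeof(int), cudaMemcpyDeviceToHost));
    cudaEventDestroy(eA); cudaEventDestroy(eB);
    cudaStreamDestroy(sA); cudaStreamDestroy(sB); cudaStreamDestroy(sC);
    cudaFree(a); cudaFree(b); cudaFree(done);
    return steps;
}

int main() {
    int a = 0, b = 0;
    int steps = runControlledSearch(10, &a, &b);
    printf("steps=%d a=%d b=%d\n", steps, a, b);
    return steps < 0;
}
```

In this sketch the host drives the loop, so C only controls A and B at step boundaries. Separate streams let A and B overlap, but the runtime does not guarantee that they actually run concurrently. What I am really after is a way for C to steer A and B while they are still running.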

I would appreciate any ideas, or a piece of example code that I could build on to solve this problem.

Thank you in advance.