Measuring individual kernel execution time for concurrent kernels on Fermi cards

I am trying to run concurrent kernels on a Fermi card (a new feature in compute capability 2.0 and above). I would like to measure the time taken by each of those concurrent kernels. Is there a reliable way to do that?
