Reading performance counters for concurrent kernels

Hi,

I’m trying to read the performance counters of two kernels running concurrently.
I read in the documentation that performance counters can only be read as aggregated values. Is there any plan to provide a mechanism to read the performance counters of individual kernels?

Hi,

Yes, that is a limitation of performance counter profiling. To get per-kernel counters you have to serialize the kernels; when multiple kernels run concurrently, only aggregated values are reported. There is no plan to provide isolated values for individual kernels.
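
As a rough illustration of what "serialize the kernels" can look like on the application side, here is a minimal sketch. It assumes a CUDA-style runtime, which this thread does not actually name. The kernels `kernelA` and `kernelB` and the helper `run_serialized` are hypothetical placeholders. Reading the counters is profiler-specific and is not shown; the sketch only ensures the two kernels never overlap, so counters collected around each launch would cover a single kernel.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-ins for the two kernels being profiled.
__global__ void kernelA(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f;
}

__global__ void kernelB(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] + 1.0f;
}

// Launches both kernels back to back so that they cannot run concurrently.
// Returns true if every runtime call succeeded.
bool run_serialized(int n) {
    float *a = nullptr;
    float *b = nullptr;
    cudaStream_t stream = nullptr;
    bool ok = false;

    if (cudaMalloc(&a, n * sizeof(float)) != cudaSuccess) goto cleanup;
    if (cudaMalloc(&b, n * sizeof(float)) != cudaSuccess) goto cleanup;
    if (cudaMemset(a, 0, n * sizeof(float)) != cudaSuccess) goto cleanup;
    if (cudaMemset(b, 0, n * sizeof(float)) != cudaSuccess) goto cleanup;
    if (cudaStreamCreate(&stream) != cudaSuccess) goto cleanup;

    {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;

        // Both launches go to the same stream, so they already execute in
        // order. The explicit synchronization marks a point where kernelA
        // has fully finished before kernelB is submitted.
        kernelA<<<blocks, threads, 0, stream>>>(a, n);
        if (cudaStreamSynchronize(stream) != cudaSuccess) goto cleanup;

        kernelB<<<blocks, threads, 0, stream>>>(b, n);
        if (cudaStreamSynchronize(stream) != cudaSuccess) goto cleanup;
    }

    ok = (cudaGetLastError() == cudaSuccess);

cleanup:
    if (stream) cudaStreamDestroy(stream);
    cudaFree(b);  // cudaFree(nullptr) is a no-op
    cudaFree(a);
    return ok;
}
```

Calling `run_serialized(1 << 20)` from the host runs the two kernels one after the other. The trade-off is that the measured run no longer reflects the concurrent behavior, so the per-kernel counters describe each kernel in isolation rather than under contention.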