It seems that when I collect gpu__time_active.avg together with other metrics, the reported kernel runtime differs slightly from when I collect it alone. Is that normal? If the kernel runtime really does increase as more metrics are added, then I will have to collect this metric by itself whenever I want to measure kernel run time. Any comments on that?
Kernel runtime should not change, but process runtime will increase as you add more counters. Pre-launch overhead grows with the number of GPU performance monitor units that need to be enabled. Post-completion overhead grows with the number of metrics that need to be calculated. If the number of counters exceeds what can be collected in one pass, additional replay passes have to be performed, which can significantly increase the process runtime.
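
To make the three cost terms above concrete, here is a rough toy model in Python. It is only a sketch: the function names, the per-pass counter capacity, and the per-counter and per-metric costs are all made-up assumptions for illustration, not values or APIs from the profiler. It simply shows that the kernel time stays fixed while the process time grows with the number of counters and passes.

```python
import math


def replay_passes(num_counters: int, counters_per_pass: int) -> int:
    """Passes needed if at most `counters_per_pass` counters fit in one pass.

    `counters_per_pass` is a hypothetical capacity; the real limit depends
    on the GPU and on which counters are requested.
    """
    if counters_per_pass <= 0:
        raise ValueError("counters_per_pass must be positive")
    # At least one pass is needed to run the kernel at all.
    return max(1, math.ceil(num_counters / counters_per_pass))


def profiled_process_time_us(
    kernel_us: float,
    num_counters: int,
    num_metrics: int,
    counters_per_pass: int = 4,           # assumed, illustrative only
    enable_us_per_counter: float = 50.0,  # assumed pre-launch cost
    calc_us_per_metric: float = 20.0,     # assumed post-completion cost
) -> float:
    """Toy estimate of process time spent on one profiled kernel."""
    passes = replay_passes(num_counters, counters_per_pass)
    pre_launch = enable_us_per_counter * num_counters
    post_complete = calc_us_per_metric * num_metrics
    # The kernel itself takes kernel_us on every pass; only the number of
    # passes and the surrounding overhead change with the metric count.
    return passes * kernel_us + pre_launch + post_complete


if __name__ == "__main__":
    for n in (1, 4, 9):
        total = profiled_process_time_us(100.0, num_counters=n, num_metrics=n)
        print(n, replay_passes(n, 4), total)
```

In this model the per-pass kernel time (100 us here) never changes; what grows is the number of passes and the overhead around them, which matches the distinction between kernel runtime and process runtime made above.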