Accessing CUPTI Counters through a daemon process

I am working on a project that monitors multiple GPUs using CUPTI (CUDA Profiling Tools Interface). I started from the example in the CUPTI documentation, in which each GPU's context is pushed before a kernel launch and popped after it. This lets me monitor the kernels from a pthread running in the same process.

However, I now want to replace that pthread with a separate process (a daemon) that monitors the GPUs. The problem is that I cannot modify the code of the program that launches the kernels, so I cannot push and pop its contexts manually. As a result, CUPTI does not return counter values to the monitoring process, because that process runs outside the contexts of the monitored kernels.

Is there a way to read CUPTI counters for all programs running on a GPU, rather than only for a specific context? I am looking for a way to collect counters across all kernels on each GPU without needing to know which kernels are running there.

