Accessing CUPTI Counters through a daemon process

Dear NVIDIA Community,

I hope this message finds you well.

I am working on a project that monitors multiple GPUs using CUPTI (CUDA Profiling Tools Interface). I am following the example provided in the CUPTI documentation, in which each GPU's context is pushed before a kernel call and popped after it. This lets me monitor the kernels from a pthread.

However, I am having trouble replacing the pthread with a separate process that monitors the GPUs. The main issue is that I cannot modify the code of the kernel program to manually push and pop the contexts. As a result, CUPTI does not read counter values outside the context of the monitored kernels.

My question is whether there is a way to access the CUPTI counters for all programs running on a GPU, rather than only the counters tied to a specific context. I am looking for a solution that lets me read CUPTI counters for every kernel running on each GPU without knowing in advance which kernels those are.

Any insights, suggestions, or alternative approaches to achieve this goal would be highly appreciated.

Thank you very much for your assistance.

Best regards,
Varun Parashar