Profiling multiple GPU executables simultaneously

Hello.

I am trying to measure the L2 cache hit rate on an NVIDIA AGX Xavier in two cases: when a single GPU executable runs alone, and when multiple GPU executables run simultaneously.

When profiling a single GPU executable, I was able to do it without problems using nvprof.

However, nvprof accepts only one executable at a time. When I profile the target executable while other GPU executables are running, the performance of the other executables deteriorates because of the profiling overhead.

Is there a way to profile multiple GPU executables at the same time?

Thank you.

I suppose you can run each of the GPU executables under its own nvprof instance.
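As a rough sketch of that suggestion, a small Python launcher could start one nvprof instance per executable so that they all run at the same time. The executable paths, the helper names, and the metric name `l2_tex_hit_rate` are assumptions for illustration; `nvprof --query-metrics` should list what the Xavier actually supports. I have not verified how metric collection behaves when several profiled processes share the GPU, so treat the numbers with care.

```python
import subprocess
from pathlib import Path

# Assumption: this metric name may differ on your device;
# check the output of `nvprof --query-metrics` first.
DEFAULT_METRIC = "l2_tex_hit_rate"


def build_nvprof_command(executable, metric=DEFAULT_METRIC, log_file=None):
    """Build the nvprof command line for one executable (hypothetical helper)."""
    if log_file is None:
        # One log per executable, named after the binary.
        # Binaries with the same file name would overwrite each other's log.
        log_file = Path(executable).name + ".nvprof.log"
    return ["nvprof", "--metrics", metric, "--log-file", str(log_file), executable]


def profile_concurrently(executables, metric=DEFAULT_METRIC):
    """Start every executable under its own nvprof, then wait for all of them."""
    procs = [
        subprocess.Popen(build_nvprof_command(exe, metric))
        for exe in executables
    ]
    # Return the exit code of each nvprof process, in launch order.
    return [proc.wait() for proc in procs]
```

Calling `profile_concurrently(["./app_a", "./app_b"])` (hypothetical binaries) would then leave `app_a.nvprof.log` and `app_b.nvprof.log` in the current directory, one per executable.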