Multi-process profiling: "NVPA_RawMetricRequest.isolated=false"


I am interested in using the new CUDA Profiling API to collect HW performance counters from a multi-process workload. The workload consists of separate child processes (all owned by the same UNIX user ID), each of which has its own CUDA context. When I attempt to collect performance counters from the parent process (GPU occupancy in my case), the reported value is zero. This seems to indicate that the collected counters cover only the process I’m collecting from (in my case the parent process doesn’t run anything on the GPU itself, only the children do).

I noticed that there is a boolean property called isolated on the NVPA_RawMetricRequest object which is used when configuring which metrics to collect. I thought that perhaps I could collect HW counters “across processes” by setting this to false (it’s always set to true in the userrange_profiling sample program). However, when I do this I get an error:

function NVPW_RawMetricsConfig_AddMetrics(&addMetricsParams) failed with error (1) NVPA_STATUS_ERROR: Generic error.
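
For illustration, the change described above amounts to roughly the following sketch. It uses a stand-in struct so that it compiles without the CUPTI/NVPW headers; the field names are assumed to match NVPA_RawMetricRequest as used by the userrange_profiling sample, and buildRequests is a hypothetical helper, not part of the sample or the API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for NVPA_RawMetricRequest (assumption: field names mirror the real
// struct). Real code would include the NVPW host header and use the real type.
struct RawMetricRequestSketch {
    std::size_t structSize;
    void* pPriv;
    const char* pMetricName;
    std::uint8_t isolated;       // the flag in question
    std::uint8_t keepInstances;
};

// Hypothetical helper: builds one request per raw metric name, applying the
// same isolated/keepInstances setting to every request.
std::vector<RawMetricRequestSketch> buildRequests(
    const std::vector<std::string>& rawMetricNames,
    bool isolated,
    bool keepInstances) {
    std::vector<RawMetricRequestSketch> requests;
    requests.reserve(rawMetricNames.size());
    for (const std::string& name : rawMetricNames) {
        assert(!name.empty());
        RawMetricRequestSketch req{};
        req.structSize = sizeof(RawMetricRequestSketch);
        // Points into rawMetricNames, which must outlive the returned requests.
        req.pMetricName = name.c_str();
        req.isolated = isolated ? 1 : 0;
        req.keepInstances = keepInstances ? 1 : 0;
        requests.push_back(req);
    }
    return requests;
}
```

The sample effectively calls this with isolated set to true; the failing case corresponds to passing false and then handing the resulting array to NVPW_RawMetricsConfig_AddMetrics.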

Unfortunately, the CUPTI documentation (see the Configuration Workflow section) doesn’t describe what NVPA_RawMetricRequest.isolated does; it only shows an example where the flag is always set to true.

So my questions are:

  1. Is it possible to collect HW counters across multiple processes from a parent process? Or do I simply need to collect HW counters separately within each child process?
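  2. What exactly does the NVPA_RawMetricRequest.isolated attribute do in the call to NVPW_RawMetricsConfig_AddMetrics(...) (the call that fails above; the same requests are also passed to NVPW_CounterDataBuilder_AddMetrics(...))? Am I using it incorrectly?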

Thanks in advance!