I am interested in using the new CUDA Profiling API to collect HW performance counters from a multi-process workload. In particular, the workload consists of separate child processes (owned by the same UNIX userid), each of which therefore has its own CUDA context. When I attempt to collect a performance counter (GPU occupancy in my case) in the parent process, I notice it is zero. This seems to indicate that collected performance counters cover only the process I’m collecting from (in my case the parent process doesn’t run anything itself, only the children do).
I noticed that there is a boolean property called isolated on the NVPA_RawMetricRequest object, which is used when configuring which metrics to collect. I thought that perhaps I could collect HW counters “across processes” by setting this to false (it’s always set to true in the userrange_profiling sample program). However, when I do this I get an error:
function NVPW_RawMetricsConfig_AddMetrics(&addMetricsParams) failed with error (1) NVPA_STATUS_ERROR: Generic error.
Unfortunately, the CUPTI documentation (seen here under Configuration Workflow) doesn’t describe what NVPA_RawMetricRequest.isolated does, and only documents an example where it is always set to true.
So my questions are:
- Is it possible to collect HW counters across multiple processes from a parent process? Or, do I need to simply collect HW counters separately within each child process?
- What exactly does the NVPA_RawMetricRequest.isolated attribute do in the call to NVPW_CounterDataBuilder_AddMetrics(...)? Am I using it incorrectly?
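For reference, the fallback I have in mind for the first question (collecting separately within each child and aggregating in the parent) would look roughly like the sketch below. It is only an illustration of the process layout, not working CUPTI code: collect_metric_in_child and collect_from_children are hypothetical names, the metric value is a dummy so the sketch runs without a GPU, and the children run one at a time here for simplicity, whereas my real children run concurrently.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical placeholder: a real child would run its own profiling
 * session inside its own CUDA context and return the evaluated metric.
 * A dummy value is returned here so the sketch runs without a GPU. */
static double collect_metric_in_child(int child_index)
{
    return 0.25 * (child_index + 1);
}

/* Forks n children, one after another. Each child collects its own metric
 * and writes it to a pipe; the parent reads one double per child into out[].
 * Returns the number of values received, or -1 on a setup error. */
int collect_from_children(int n, double *out)
{
    int received = 0;
    for (int i = 0; i < n; ++i) {
        int fds[2];
        if (pipe(fds) != 0) {
            return -1;
        }
        pid_t pid = fork();
        if (pid < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
        if (pid == 0) {
            /* Child: collect locally, report to the parent, then exit. */
            close(fds[0]);
            double value = collect_metric_in_child(i);
            ssize_t written = write(fds[1], &value, sizeof value);
            close(fds[1]);
            _exit(written == (ssize_t)sizeof value ? 0 : 1);
        }
        /* Parent: read this child's value, then reap the child. */
        close(fds[1]);
        double value = 0.0;
        if (read(fds[0], &value, sizeof value) == (ssize_t)sizeof value) {
            out[received++] = value;
        }
        close(fds[0]);
        waitpid(pid, NULL, 0);
    }
    return received;
}
```

The parent would call collect_from_children(n, values) with an array of n doubles and then combine the per-child values however makes sense for the metric.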
Thanks in advance!