Hello,
I am interested in using the new CUDA Profiling API to collect HW performance counters from a multi-process workload. In particular, these are separate child processes (owned by the same UNIX userid), which as a result each have their own CUDA context. When I attempt to collect performance counters in a parent process (GPU occupancy in my case), I notice it is zero. This seems to indicate that collected performance counters belong only to the process I’m collecting from (in my case the parent process doesn’t run anything itself, only the children do).
I noticed that there is a boolean property called `isolated` on the `NVPA_RawMetricRequest` object, which is used when configuring which metrics to collect. I thought that perhaps I could collect HW counters “across processes” by setting this to `false` (it’s always set to `true` in the `userrange_profiling` sample program). However, when I do this I get an error:
function NVPW_RawMetricsConfig_AddMetrics(&addMetricsParams) failed with error (1) NVPA_STATUS_ERROR: Generic error.
Unfortunately, the CUPTI documentation (seen here under Configuration Workflow) doesn’t describe what `NVPA_RawMetricRequest.isolated` does, and only documents an example where it is always set to `true`.
So my questions are:
- Is it possible to collect HW counters across multiple processes from a parent process? Or, do I need to simply collect HW counters separately within each child process?
- What exactly does the `NVPA_RawMetricRequest.isolated` attribute do in the call to `NVPW_CounterDataBuilder_AddMetrics(...)`? Am I using it incorrectly?
Thanks in advance!