Collecting Events/Metrics at 2^n clock cycles


nvprof --pc-sampling-period 5 --print-gpu-trace --events tex0_cache_sector_queries --replay-mode application --export-profile ./matrixMul

I am using the above command while varying the number of cycles for PC sampling, say 2^5, 2^10, etc.

I am expecting to capture the event count at every 2^5, 2^10, etc. cycles of the matrixMul execution.

Unable to locate these values…

Appreciate any help


The nvprof option "--pc-sampling-period" applies only when PC sampling collection is requested explicitly. It has no effect on the collection of events and metrics.

In case you are interested in collecting PC sampling information, here is a sample command:

nvprof --source-level-analysis pc_sampling --pc-sampling-period 5 --export-profile ./matrixMul
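To try several sampling periods, the command above can be wrapped in a small loop. The sketch below is a dry run that only prints the commands. It assumes, as in the question, that the value passed to --pc-sampling-period is an exponent n, giving a period of 2^n cycles. The output file names and the helper period_cycles are hypothetical and not part of nvprof.

```shell
#!/bin/sh
# Sketch: print (dry run) one PC sampling command per period exponent.
# Assumption: --pc-sampling-period takes an exponent n, period = 2^n cycles.

APP=./matrixMul   # application from the thread

# Hypothetical helper: convert a period exponent to a cycle count (2^n).
period_cycles() {
    echo $((1 << $1))
}

for n in 5 10 15; do
    out="pc_sampling_${n}.nvvp"   # hypothetical output file name
    echo "# exponent $n = $(period_cycles "$n") cycles between samples"
    echo nvprof --source-level-analysis pc_sampling \
        --pc-sampling-period "$n" --export-profile "$out" "$APP"
done
```

Removing the leading "echo" on the nvprof line would run the profiler instead of printing the command. Each run then writes its own profile, which keeps the results for different periods separate.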

Thank you for the explanation on pc-sampling-period …that helps

Additionally, how can I obtain the GPU event counts of various events, say inst_issued0, inst_executed, local_load, etc., at every 'n' milliseconds of execution of a CUDA program, say ./transpose?

nvprof doesn’t support sampling of events and metrics, except for NVLink metrics. The command-line option "--event-collection-mode" can be set to "continuous" to enable sampling of NVLink metrics. See the profiler doc section Profiler :: CUDA Toolkit Documentation for more details.
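A rough sketch of how the continuous mode might be invoked is shown here. The metric name nvlink_total_data_transmitted is an assumption and should be confirmed against the output of "nvprof --query-metrics" on the target GPU. The run only produces data on a system that actually has NVLink.

```shell
#!/bin/sh
# Sketch: sample an NVLink metric in continuous mode.
# Assumptions: nvprof is on PATH, the GPU has NVLink, and the metric
# name below exists on this device (check with --query-metrics).

APP=./transpose                        # application from the thread
METRIC=nvlink_total_data_transmitted   # assumed metric name, confirm first

if command -v nvprof >/dev/null 2>&1; then
    # List the NVLink metrics this device reports.
    nvprof --query-metrics | grep -i nvlink
    # Collect the chosen metric for the duration of the application.
    nvprof --event-collection-mode continuous --metrics "$METRIC" "$APP"
else
    echo "nvprof not found on PATH"
fi
```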

On the other hand, the CUPTI APIs support continuous mode for a larger set of events and metrics. The CUPTI sample event_sampling shows how to use the event APIs to sample events from a separate host thread. Useful links:
Overview: CUPTI :: CUPTI Documentation
Samples: CUPTI :: CUPTI Documentation