metrics of pgprof

Hello,

I have a question about some metrics that can be collected with pgprof. I want to know if the following equation holds:

inst_executed =
inst_executed_global_loads+inst_executed_local_loads+
inst_executed_shared_loads+inst_executed_surface_loads+
inst_executed_global_stores+inst_executed_local_stores+
inst_executed_shared_stores+inst_executed_surface_stores

Besides, how many bytes are moved by each instruction accumulated in inst_executed? 4 bytes?

Thanks

Hi Henrique,

Sorry for the late reply, I needed to check with the profiling team.

> I want to know if the following equation holds

No, “inst_executed” may include additional instructions, loads among them, that are not counted by any of the other metrics listed.
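One way to check this on your own kernel is to sum the load/store metrics and compare the total against inst_executed. Below is a minimal Python sketch of that comparison. The numbers are made up for illustration and are not real profiler output, and the helper name unaccounted_instructions is hypothetical.

```python
# Metric names taken from the equation in the question.
LOAD_STORE_METRICS = [
    "inst_executed_global_loads",
    "inst_executed_local_loads",
    "inst_executed_shared_loads",
    "inst_executed_surface_loads",
    "inst_executed_global_stores",
    "inst_executed_local_stores",
    "inst_executed_shared_stores",
    "inst_executed_surface_stores",
]


def unaccounted_instructions(metrics):
    """Return inst_executed minus the sum of the listed load/store metrics."""
    memory_total = sum(metrics[name] for name in LOAD_STORE_METRICS)
    return metrics["inst_executed"] - memory_total


# Hypothetical values for a single kernel (assumption, not measured data).
sample = {
    "inst_executed": 1000,
    "inst_executed_global_loads": 100,
    "inst_executed_local_loads": 0,
    "inst_executed_shared_loads": 50,
    "inst_executed_surface_loads": 0,
    "inst_executed_global_stores": 40,
    "inst_executed_local_stores": 0,
    "inst_executed_shared_stores": 20,
    "inst_executed_surface_stores": 0,
}

gap = unaccounted_instructions(sample)
print("instructions not covered by the load/store metrics:", gap)
```

A nonzero gap means the equation does not hold for that kernel.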

> Besides, how many bytes are moved by each instruction accumulated in inst_executed? 4 bytes?

Not sure, nor am I sure that you can directly translate the inst_executed metric into a number of bytes moved, given that instructions are executed at the warp level.
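To illustrate why a fixed "4 bytes per instruction" does not follow from a warp-level count, here is a small Python sketch. It assumes a warp of 32 threads and models only the bytes requested by the active threads of one memory instruction. It ignores coalescing, caching, and how the hardware actually splits the request into transactions, so treat it as arithmetic, not a model of the profiler. The function name requested_bytes is hypothetical.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def requested_bytes(active_threads, bytes_per_thread):
    """Bytes requested by one warp-level memory instruction.

    Simplifying assumption: every active thread accesses the same width,
    and inactive (predicated-off or diverged) threads request nothing.
    """
    if not 1 <= active_threads <= WARP_SIZE:
        raise ValueError("active_threads must be between 1 and 32")
    return active_threads * bytes_per_thread


# The same single count in a warp-level metric can correspond to
# very different amounts of data:
print(requested_bytes(32, 4))   # full warp, 4-byte access per thread
print(requested_bytes(1, 4))    # one active thread, 4-byte access
print(requested_bytes(32, 16))  # full warp, 16-byte access per thread
```

Each of the three cases counts as one executed instruction, yet the requested data ranges from 4 to 512 bytes in this sketch.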

If you’re using a device with CC 7.0 or greater (Volta or newer), you may want to consider moving to Nsight Compute, which is the successor to nvprof/pgprof for collecting metrics. Nsight Systems is the replacement for the timeline view.


https://docs.nvidia.com/nsight-compute/index.html#nsight-compute

-Mat