Hello,
I have a question about some metrics that can be collected with pgprof. I want to know if the following equation holds:
inst_executed =
inst_executed_global_loads+inst_executed_local_loads+
inst_executed_shared_loads+inst_executed_surface_loads+
inst_executed_global_stores+inst_executed_local_stores+
inst_executed_shared_stores+inst_executed_surface_stores
Besides, how many bytes are moved by each instruction accumulated in inst_executed? 4 bytes?
Thanks
Hi Henrique,
Sorry for the late reply; I needed to check with the profiling team.
> I want to know if the following equation holds
No, “inst_executed” may include additional loads that are not counted in the other metrics listed.
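To see how far apart the two sides are for a given kernel, one option is to subtract the eight memory metrics from inst_executed and look at what is left over. Below is a minimal Python sketch of that check. The metric names are the ones from the question; the helper name `uncounted_instructions` and all counter values are hypothetical placeholders, not real pgprof output.

```python
# The eight memory-instruction metrics listed in the question.
MEMORY_METRICS = [
    "inst_executed_global_loads",
    "inst_executed_local_loads",
    "inst_executed_shared_loads",
    "inst_executed_surface_loads",
    "inst_executed_global_stores",
    "inst_executed_local_stores",
    "inst_executed_shared_stores",
    "inst_executed_surface_stores",
]


def uncounted_instructions(counters):
    """Return inst_executed minus the sum of the eight memory metrics.

    A non-zero result means the equation from the question does not hold
    for this set of counters.
    """
    memory_total = sum(counters[name] for name in MEMORY_METRICS)
    return counters["inst_executed"] - memory_total


# Made-up values for illustration only; replace with numbers from a profile.
example = {
    "inst_executed": 1000,
    "inst_executed_global_loads": 120,
    "inst_executed_local_loads": 0,
    "inst_executed_shared_loads": 40,
    "inst_executed_surface_loads": 0,
    "inst_executed_global_stores": 60,
    "inst_executed_local_stores": 0,
    "inst_executed_shared_stores": 20,
    "inst_executed_surface_stores": 0,
}

print(uncounted_instructions(example))
```

With these placeholder numbers the eight metrics add up to 240, so 760 of the 1000 executed instructions are not covered by them.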
> Besides, how many bytes are moved by each instruction accumulated in inst_executed? 4 bytes?
I'm not sure. I'm also not sure that you can translate the inst_executed metric directly into a number of bytes moved, given that instructions are executed at the warp level.
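As a rough illustration of why a fixed "4 bytes per instruction" does not follow, here is a small Python sketch. It assumes a warp size of 32 threads and that each active thread requests the same number of bytes; the function name `requested_bytes` is hypothetical, and the result is only the number of bytes requested by one warp-level instruction, not the memory traffic the hardware actually generates.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def requested_bytes(active_threads, bytes_per_thread):
    """Bytes requested by one warp-level load or store instruction.

    active_threads: threads in the warp that execute the instruction
    bytes_per_thread: access size per thread (e.g. 4 for float, 8 for double)
    """
    if not 0 <= active_threads <= WARP_SIZE:
        raise ValueError("active_threads must be between 0 and WARP_SIZE")
    return active_threads * bytes_per_thread


print(requested_bytes(32, 4))  # full warp loading 4-byte values
print(requested_bytes(32, 8))  # full warp loading 8-byte values
print(requested_bytes(10, 4))  # partially active warp, 4-byte values
```

Under these assumptions a single counted instruction can correspond to 128, 256, or 40 requested bytes, depending on the access size and on how many threads in the warp are active.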
If you’re using a device with compute capability (CC) 7.0 or greater (Volta or newer), you may want to consider moving to Nsight Compute, which is the successor to nvprof/pgprof for collecting metrics. Nsight Systems is the replacement for the timeline view.
https://docs.nvidia.com/nsight-compute/index.html#nsight-compute
-Mat