I need to analyze my game frames to locate the bottleneck draw calls and the compute-intensive shaders in those draw calls. Here is my plan.
- locate the bottleneck draw calls by looking at the SM throughput counter:
sm__throughput.avg.pct_of_peak_sustained_elapsed (%)
- locate the compute-intensive shader stages by looking at the throughput of each shader stage:
[Counter group 1]
sm__cycles_active_shader_cs.avg.pct_of_peak_sustained_elapsed (%)
sm__cycles_active_shader_gs.avg.pct_of_peak_sustained_elapsed (%)
sm__cycles_active_shader_ps.avg.pct_of_peak_sustained_elapsed (%)
sm__cycles_active_shader_tcs.avg.pct_of_peak_sustained_elapsed (%)
sm__cycles_active_shader_tes.avg.pct_of_peak_sustained_elapsed (%)
sm__cycles_active_shader_vs.avg.pct_of_peak_sustained_elapsed (%)
[Counter group 2]
sm__warps_active.sum
sm__warps_active_shader_vtg.sum
sm__warps_active_shader_ps.sum
sm__warps_active_shader_cs.sum
Does this plan sound reasonable?
Looking at my profiling results, I have the following two questions.
- What is the meaning of the following counters?
sm__cycles_active_shader_cs.avg.pct_of_peak_sustained_elapsed (%)
sm__cycles_active_shader_gs.avg.pct_of_peak_sustained_elapsed (%)
sm__cycles_active_shader_ps.avg.pct_of_peak_sustained_elapsed (%)
sm__cycles_active_shader_tcs.avg.pct_of_peak_sustained_elapsed (%)
sm__cycles_active_shader_tes.avg.pct_of_peak_sustained_elapsed (%)
sm__cycles_active_shader_vs.avg.pct_of_peak_sustained_elapsed (%)
I assume
sm__cycles_active_shader_xx.avg.pct_of_peak_sustained_elapsed (%) =
sm__cycles_active_shader_xx.avg / sm__cycles_elapsed_shader_xx.avg, so the value should never exceed 100%. However, in one of the profiling results I am investigating, sm__cycles_active_shader_ps.avg.pct_of_peak_sustained_elapsed (%) is well above 100 in several rows (103.65118, 173.34756 and 162.66343):
sm__cycles_active.avg.pct_of_peak_sustained_elapsed (%) | sm__cycles_active_shader_cs.avg.pct_of_peak_sustained_elapsed (%) | sm__cycles_active_shader_gs.avg.pct_of_peak_sustained_elapsed (%) | sm__cycles_active_shader_ps.avg.pct_of_peak_sustained_elapsed (%) | sm__cycles_active_shader_tcs.avg.pct_of_peak_sustained_elapsed (%) | sm__cycles_active_shader_tes.avg.pct_of_peak_sustained_elapsed (%) | sm__cycles_active_shader_vs.avg.pct_of_peak_sustained_elapsed (%) | sm__cycles_elapsed.avg.pct_of_peak_sustained_elapsed (%) |
---|---|---|---|---|---|---|---|
99.12263 | 99.18941 | 0 | 0 | 0 | 0 | 0 | 100 |
96.12748 | 96.24648 | 0 | 0 | 0 | 0 | 0 | 100 |
96.62175 | 96.69736 | 0 | 0 | 0 | 0 | 0 | 100 |
95.00545 | 95.65731 | 0 | 0 | 0 | 0 | 0 | 100 |
94.49121 | 94.46158 | 0 | 0 | 0 | 0 | 0 | 100 |
95.29343 | 0 | 0 | 103.65118 | 0 | 0 | 0.01094 | 100 |
95.29874 | 95.22861 | 0 | 0 | 0 | 0 | 0 | 100 |
97.74141 | 97.69843 | 0 | 0 | 0 | 0 | 0 | 100 |
79.56786 | 0 | 0 | 173.34756 | 0 | 0 | 0.53782 | 100 |
75.76738 | 0 | 0 | 162.66343 | 0 | 0 | 0.72478 | 100 |
55.19795 | 0 | 0 | 0 | 0 | 0 | 54.24617 | 100 |
53.87546 | 0 | 0 | 0 | 0 | 0 | 54.24021 | 100 |
78.82896 | 0 | 0 | 54.93192 | 0 | 0 | 54.98424 | 100 |
83.45561 | 81.61367 | 0 | 0 | 0 | 0 | 0 | 100 |
58.72605 | 0 | 0 | 21.3328 | 0 | 0 | 45.36058 | 100 |
56.48805 | 0 | 0 | 46.95401 | 0 | 0 | 40.3861 | 100 |
96.22414 | 0 | 0 | 98.76607 | 0 | 0 | 0.00389 | 100 |
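To make the anomaly easier to spot, here is a minimal Python sketch that flags rows where a per-stage percentage exceeds 100. The per-stage values are copied from a few rows of the table above; the names `STAGES`, `ROWS` and `stages_over_100` are illustrative helpers of my own, not part of any profiler API.

```python
# Per-stage columns in the same order as the table above.
STAGES = ["cs", "gs", "ps", "tcs", "tes", "vs"]

# A few rows copied from the table: per-stage
# sm__cycles_active_shader_<stage>.avg.pct_of_peak_sustained_elapsed values.
ROWS = [
    [99.18941, 0, 0, 0, 0, 0],
    [0, 0, 103.65118, 0, 0, 0.01094],
    [0, 0, 173.34756, 0, 0, 0.53782],
    [0, 0, 162.66343, 0, 0, 0.72478],
    [0, 0, 54.93192, 0, 0, 54.98424],
]


def stages_over_100(row):
    """Return the (stage, value) pairs whose percentage exceeds 100."""
    return [(stage, value) for stage, value in zip(STAGES, row) if value > 100.0]


for index, row in enumerate(ROWS):
    flagged = stages_over_100(row)
    if flagged:
        print(f"row {index}: {flagged}")
```

In this subset only the ps column is ever flagged, which matches what I see in the full table.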
- What is the relationship among the following counters?
sm__warps_active.sum
sm__warps_active_shader_vtg.sum
sm__warps_active_shader_ps.sum
sm__warps_active_shader_cs.sum
I assume sm__warps_active.sum >= sm__warps_active_shader_vtg.sum + sm__warps_active_shader_ps.sum + sm__warps_active_shader_cs.sum. However, this doesn’t seem to be the case according to my data. In addition, why are there no per-stage counters sm__warps_active_shader_vs.sum, sm__warps_active_shader_tcs.sum, sm__warps_active_shader_tes.sum and sm__warps_active_shader_gs.sum?
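To state the assumed relationship precisely, here is a minimal Python sketch of the check. The function name `warps_sum_holds` and the sample numbers are hypothetical placeholders chosen only to illustrate a violation; they are not measured values.

```python
def warps_sum_holds(total, vtg, ps, cs):
    """Check the assumed relationship:
    sm__warps_active.sum >= vtg.sum + ps.sum + cs.sum
    """
    return total >= vtg + ps + cs


# Hypothetical values for illustration only (not from a real capture).
sample = {"total": 1000, "vtg": 300, "ps": 500, "cs": 400}

# 300 + 500 + 400 = 1200 > 1000, so the assumed inequality is violated here.
print(warps_sum_holds(**sample))  # False
```

Running the same check over every draw call in an exported capture would show how often, and by how much, the assumption fails.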