How to calculate the exact gain after a certain optimization?

I’ve run Nsight Compute on my kernel, and I can see a large warp stall from “Stall Long Scoreboard”. I know how to optimize it.

However, before optimizing, is there a metric in Nsight Compute that can tell me how much gain I would get after optimizing all the stalls?

For example, if Stall Long Scoreboard accounts for 13.7 cycles per instruction, how much gain can I get after optimizing it? Is there a theoretical way to estimate that?

Hi, @quan.luo.101

Thanks for using Nsight Compute.
Please check whether the Profiling Guide in the Nsight Compute 13.0 documentation can help.
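
As a rough starting point, the question above can be turned into a first-order estimate. This is only a sketch, not a formula taken from the Profiling Guide. It assumes the kernel is latency-bound (the scheduler has nothing else to issue while warps are stalled), that the other stall reasons do not grow once this one shrinks, and that the total comes from the "Warp Cycles Per Issued Instruction" value in the Warp State Statistics section. The function name `speedup_upper_bound` and the total of 20.0 cycles are hypothetical; only the 13.7 comes from the question. The result is an upper bound, and the real gain is usually smaller because other warps already hide part of the stall.

```python
def speedup_upper_bound(total_cpi, stall_cpi, removed_fraction=1.0):
    """First-order upper bound on speedup from reducing one stall reason.

    total_cpi        -- warp cycles per issued instruction, all stalls combined
    stall_cpi        -- cycles per instruction attributed to the targeted stall
    removed_fraction -- share of that stall you expect to eliminate (0.0 to 1.0)

    Assumption: runtime scales with cycles per issued instruction, which only
    holds when the stall is not already hidden by other eligible warps.
    """
    if not 0.0 <= removed_fraction <= 1.0:
        raise ValueError("removed_fraction must be between 0 and 1")
    if stall_cpi < 0.0 or total_cpi <= 0.0:
        raise ValueError("cycle counts must be positive")
    remaining = total_cpi - removed_fraction * stall_cpi
    if remaining <= 0.0:
        raise ValueError("the stall cannot exceed the total cycles per instruction")
    return total_cpi / remaining


# 13.7 is the Stall Long Scoreboard value from the question.
# 20.0 is a made-up total, used only to illustrate the arithmetic.
bound = speedup_upper_bound(total_cpi=20.0, stall_cpi=13.7)
print(f"upper bound on speedup: {bound:.2f}x")
```

Passing a `removed_fraction` below 1.0 models an optimization that shortens the stall without removing it. In either case, the only way to confirm the actual gain is to re-profile after making the change.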
