Discrepancy between throughput and roofline chart?

In the photo we see the roofline and throughput charts for the same kernel, and the values they report differ a lot. The throughput chart says I’m only achieving 40% throughput, so I’m confused as to why the kernel sits much closer to peak compute in the roofline chart. Are they not measuring the same thing?

This is for fp64 btw

The roofline plot uses a logarithmic scale on both axes, so a point that looks close to the roof can still be well below peak.
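To make the effect of the log scale concrete, here is a minimal Python sketch. The 40%-of-peak point and the four-decade axis range are assumptions chosen for illustration, not values read from the actual chart.

```python
import math

def axis_fraction(value, axis_min, axis_max):
    """Relative position (0..1) of `value` along a log-scaled axis."""
    return (math.log10(value) - math.log10(axis_min)) / (
        math.log10(axis_max) - math.log10(axis_min)
    )

peak = 1.0              # normalized peak FP64 performance (top of the axis)
axis_min = 1e-4         # assumed bottom of the axis: four decades below peak
achieved = 0.4 * peak   # hypothetical point at 40% of peak

linear_pos = achieved / peak
log_pos = axis_fraction(achieved, axis_min, peak)

print(f"linear axis: {linear_pos:.0%} of the way up")
print(f"log axis:    {log_pos:.0%} of the way up")
```

Under these assumptions the point sits about 90% of the way up the log axis even though it is only at 40% of peak, which is why it can look much closer to the roof than the raw percentage suggests.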

The SOL throughput and roofline charts don’t measure precisely the same things. The throughput chart shows you the most utilized compute pipeline (as indicated in the Breakdown tables of the same section), across all units. The roofline chart shows you the achieved performance for a specific type of operation (in this case, double-precision floating point).

Right, so in my case the LSU pipe is the most active, and this unit handles load, store, and other memory instructions. Is the compute throughput chart not actually about “compute” then, since the LSU is the most active pipeline?

And for actual double precision performance (i.e. flops/s) the roofline chart is more indicative?

All pipelines are considered “compute” pipelines in the context of Nsight Compute. Most pipelines, including LSU, serve a variety of purposes and instructions. You can find more info here. Certainly, if your definition encompasses only math instructions, then this would not align, but this is also where the roofline charts come into play.

For double precision pipeline utilization, you would want to check the fp64 pipeline, which executes FP64 math and type conversion instructions. For fp64 hardware flop/s, you would use the respective roofline data, correct.
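As a rough illustration of what an FP64 roofline point represents, the sketch below computes achieved FLOP/s and arithmetic intensity from instruction counts, assuming the usual convention that an add or multiply counts as one FLOP and a fused multiply-add as two. All function names and numbers here are hypothetical; the exact metrics Nsight Compute uses are defined by the tool itself and are not reproduced here.

```python
def fp64_roofline_point(dadd, dmul, dfma, dram_bytes, seconds):
    """Return (arithmetic intensity in FLOP/byte, achieved FLOP/s).

    dadd, dmul, dfma: thread-level FP64 instruction counts (hypothetical inputs).
    An FMA is counted as two floating-point operations (assumed convention).
    """
    flops = dadd + dmul + 2 * dfma
    return flops / dram_bytes, flops / seconds

def roofline_ceiling(intensity, peak_flops, peak_bandwidth):
    """Attainable FLOP/s at a given intensity: the lower of the two roofs."""
    return min(peak_flops, intensity * peak_bandwidth)

# Made-up example values, not taken from the kernel discussed in this thread.
intensity, achieved = fp64_roofline_point(
    dadd=1_000_000_000,
    dmul=1_000_000_000,
    dfma=4_000_000_000,
    dram_bytes=2_000_000_000,
    seconds=2e-3,
)
ceiling = roofline_ceiling(intensity, peak_flops=9.7e12, peak_bandwidth=1.5e12)

print(f"arithmetic intensity: {intensity:.2f} FLOP/byte")
print(f"achieved:             {achieved:.3e} FLOP/s")
print(f"fraction of ceiling:  {achieved / ceiling:.1%}")
```

Note that nothing in this calculation involves load/store activity directly, which is why it can disagree with a throughput figure driven by the LSU pipeline.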
