Besides the L1, L2, and DRAM achieved values, does the Nsight Compute roofline chart support a single achieved-value point for a single CUDA kernel? I was able to see an “achieved value” point in a roofline chart when I targeted a single kernel (for instance, sgemm_fp32…) in the Llama 3.2 1B prompt stage, but when I increased the number of target kernels, achieved values stopped showing up. Can anyone explain why?
There are many different roofline charts available in Nsight Compute. You seem to be referring to the hierarchical ones? There is also the GPU Speed Of Light Roofline Chart, which is more of an overview that covers single and double precision without separating the individual levels of the memory hierarchy. Finally, there are dedicated rooflines for various tensor operations. Since 2025.1, all rooflines are part of the full section set. There is, however, no single roofline that covers everything.
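For reference, here is a minimal sketch of how a single achieved point relates to the roofs in the general roofline model. It is not Nsight Compute's implementation; the function name `roofline_point` and all numbers are hypothetical and chosen only for illustration. The hierarchical charts apply the same idea once per memory level, using the traffic measured at that level.

```python
def roofline_point(flops, bytes_moved, seconds, peak_flops, peak_bandwidth):
    """Return (arithmetic intensity, achieved FLOP/s, attainable FLOP/s).

    All inputs are hypothetical per-kernel values, not tool output.
    """
    # x coordinate of the achieved point: work per byte of memory traffic
    intensity = flops / bytes_moved
    # y coordinate of the achieved point: work per second
    achieved = flops / seconds
    # The roof at this intensity is the lower of the compute ceiling
    # and the memory-bandwidth ceiling.
    roof = min(peak_flops, intensity * peak_bandwidth)
    return intensity, achieved, roof


# Made-up example values for one kernel
intensity, achieved, roof = roofline_point(
    flops=2.0e9,          # floating-point operations executed
    bytes_moved=1.0e8,    # bytes transferred at the chosen memory level
    seconds=1.0e-3,       # kernel duration
    peak_flops=1.0e13,    # assumed compute ceiling (FLOP/s)
    peak_bandwidth=1.0e12 # assumed memory ceiling (bytes/s)
)
print(f"intensity = {intensity:.1f} FLOP/byte")
print(f"achieved  = {achieved:.3e} FLOP/s")
print(f"roof      = {roof:.3e} FLOP/s")
```

Each profiled result contributes its own FLOP count, traffic, and duration, so a chart can only place an achieved point when all three are available for that result.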
I am not quite sure about this part:
when I increased the number of target kernels, achieved values stopped showing up
Can you clarify what you changed? Did you collect more individual kernels (so that you end up with one result per kernel), or did you specify a range across multiple kernels?