Why doesn’t my Nsight Compute report show the Floating Point Operations Roofline (Tensor Core)?
The machine I am using has a V100.
The CLI version I used for profiling is 2023.1.0.
The GUI version I am using is 2024.3.1.
Have you enabled the roofline set when generating the report?
Can you tell me how to turn on this option please?
If you are profiling from the GUI, make sure the roofline set is selected, or choose “full”.
If you are profiling from the CLI, add the option --set roofline.
I’ve already used the --set full option
I noticed you are using different versions of the CLI and the GUI. Can you try updating the CLI? 2023.1.0 is relatively old.
https://forums.developer.nvidia.com/uploads/short-url/mQm63wwqMuXuTDqC2aSoLlMtUf2.zip
I tried opening someone else’s report file in my GUI and still didn’t see the tensor core roofline.
The following is mine
mamba2_ge.zip (1.0 MB)
As I said before, please use newer CLI tools to collect the report.
I tried --set roofline and now I can see the tensor core roofline.
But in my section options, the roofline set evaluates far more metrics than full.
Hi, @1144974269
Good to know you can see the roofline now.
What is the remaining problem? Can you clarify?
Full = full report
Full != All metrics
The roofline sections require a lot of passes. In Metric Selection you can switch from Metric Sets to Metric Sections/Rules to pick individual sections. The “Sets” column shows which sets (for example full or detailed) each section belongs to. In your release, the GPU Speed Of Light Hierarchical Roofline Chart (* Precision) sections are only in the roofline set. Through this dialog you can include only the rooflines of interest, which reduces the number of required metrics.
Thanks! So if I want all sections, including roofline and the others, I just need to use --set full?
The best method is to run the commands
- ncu --list-sets
- ncu --list-sections
In 2024.3.0, the full set does not include the new roofline sections (for example SpeedOfLight_HierarchicalTensorRooflineChart).
This would require you to use --section to add sections and likely require you to specify --print-details=all.
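As a rough illustration of the advice above, here is a minimal Python sketch that assembles such a command line: the full set plus the tensor core roofline section added with --section. The helper name build_ncu_command and the application path ./my_app are placeholders, not part of Nsight Compute; the only ncu options used are --set, --section, --print-details and -o. Check ncu --help on your release before relying on it.

```python
import shlex

TENSOR_ROOFLINE = "SpeedOfLight_HierarchicalTensorRooflineChart"


def build_ncu_command(app, extra_sections=(), base_set="full",
                      report=None, print_details=None):
    """Return an ncu argument list (hypothetical helper, not an ncu API).

    app            -- the application command line as a list, e.g. ["./my_app"]
    extra_sections -- section identifiers to add on top of the chosen set
    report         -- optional report file name passed to -o
    print_details  -- optional value for --print-details (e.g. "all")
    """
    cmd = ["ncu", "--set", base_set]
    for section in extra_sections:
        cmd += ["--section", section]
    if print_details:
        cmd += ["--print-details", print_details]
    if report:
        cmd += ["-o", report]
    return cmd + list(app)


# "./my_app" is a placeholder for the program being profiled.
cmd = build_ncu_command(["./my_app"],
                        extra_sections=[TENSOR_ROOFLINE],
                        print_details="all")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run. Passing extra_sections with only the rooflines you care about keeps the metric count well below what --set roofline collects.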
ncu --list-sets
---------- --------------------------------------------------------------------------- ------- -----------------
Identifier Sections                                                                    Enabled Estimated Metrics
---------- --------------------------------------------------------------------------- ------- -----------------
basic      LaunchStats, Occupancy, SpeedOfLight, WorkloadDistribution                  yes     145
detailed   ComputeWorkloadAnalysis, LaunchStats, MemoryWorkloadAnalysis, MemoryWorkloa no      460
           dAnalysis_Chart, Occupancy, SourceCounters, SpeedOfLight, SpeedOfLight_Roof
           lineChart, WorkloadDistribution
full       ComputeWorkloadAnalysis, InstructionStats, LaunchStats, MemoryWorkloadAnaly no      614
           sis, MemoryWorkloadAnalysis_Chart, MemoryWorkloadAnalysis_Tables, NumaAffin
           ity, Nvlink_Tables, Nvlink_Topology, Occupancy, PmSampling, SchedulerStats,
            SourceCounters, SpeedOfLight, SpeedOfLight_RooflineChart, WarpStateStats,
           WorkloadDistribution
nvlink     Nvlink, Nvlink_Tables, Nvlink_Topology                                      no      52
pmsampling PmSampling, PmSampling_WarpStates                                           no      80
roofline   SpeedOfLight, SpeedOfLight_HierarchicalDoubleRooflineChart, SpeedOfLight_Hi no      2841
           erarchicalHalfRooflineChart, SpeedOfLight_HierarchicalSingleRooflineChart,
           SpeedOfLight_HierarchicalTensorRooflineChart, SpeedOfLight_RooflineChart, W
           orkloadDistribution
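Because ncu hard-wraps the Sections column (even in the middle of a name), it can be hard to see by eye which sets contain a given section. The following is a small sketch, not an official tool, that parses the --list-sets text and answers that question. parse_list_sets and sets_containing are hypothetical helper names, and the parser assumes the table layout shown in this post (2024.3.0); other releases may print it differently.

```python
import re

# A set row looks like: "<identifier> <sections...> <yes|no> <metric count>"
ROW = re.compile(r"^(\S+)\s+(.*?)\s+(yes|no)\s+(\d+)\s*$")

# Output of "ncu --list-sets" as pasted in this thread (2024.3.0).
SAMPLE = """\
Identifier Sections Enabled Estimated Metrics
basic LaunchStats, Occupancy, SpeedOfLight, WorkloadDistribution yes 145
detailed ComputeWorkloadAnalysis, LaunchStats, MemoryWorkloadAnalysis, MemoryWorkloa no 460
dAnalysis_Chart, Occupancy, SourceCounters, SpeedOfLight, SpeedOfLight_Roof
lineChart, WorkloadDistribution
full ComputeWorkloadAnalysis, InstructionStats, LaunchStats, MemoryWorkloadAnaly no 614
sis, MemoryWorkloadAnalysis_Chart, MemoryWorkloadAnalysis_Tables, NumaAffin
ity, Nvlink_Tables, Nvlink_Topology, Occupancy, PmSampling, SchedulerStats,
SourceCounters, SpeedOfLight, SpeedOfLight_RooflineChart, WarpStateStats,
WorkloadDistribution
nvlink Nvlink, Nvlink_Tables, Nvlink_Topology no 52
pmsampling PmSampling, PmSampling_WarpStates no 80
roofline SpeedOfLight, SpeedOfLight_HierarchicalDoubleRooflineChart, SpeedOfLight_Hi no 2841
erarchicalHalfRooflineChart, SpeedOfLight_HierarchicalSingleRooflineChart,
SpeedOfLight_HierarchicalTensorRooflineChart, SpeedOfLight_RooflineChart, W
orkloadDistribution
"""


def parse_list_sets(text):
    """Map each set identifier to the list of section identifiers in it."""
    chunks = {}       # set name -> concatenated text of its Sections column
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("-") or line.startswith("Identifier"):
            continue  # skip blank lines, ruler lines and the header row
        match = ROW.match(line)
        if match:
            current = match.group(1)
            chunks[current] = match.group(2)
        elif current is not None:
            # Continuation line: glue it on without a separator so names
            # that were split by the wrap (e.g. "W" + "orkloadDistribution")
            # are restored.
            chunks[current] += line.strip()
    return {name: [s.strip() for s in cols.split(",") if s.strip()]
            for name, cols in chunks.items()}


def sets_containing(section, text):
    """Return the set identifiers whose section list includes `section`."""
    return [name for name, sections in parse_list_sets(text).items()
            if section in sections]


print(sets_containing("SpeedOfLight_HierarchicalTensorRooflineChart", SAMPLE))
# prints ['roofline']
```

To run it against your own installation, replace SAMPLE with the text captured from ncu --list-sets (for example via subprocess.run with capture_output=True and text=True).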
ncu --list-sections
-------------------------------------------- ----------------------------------------------------------------- ------- --------------------------------------------------
Identifier                                   Display Name                                                      Enabled Filename
-------------------------------------------- ----------------------------------------------------------------- ------- --------------------------------------------------
C2CLink                                      C2CLink                                                           no      ...sight Compute/2024.3.0/Sections\C2CLink.section
ComputeWorkloadAnalysis                      Compute Workload Analysis                                         no      ...24.3.0/Sections\ComputeWorkloadAnalysis.section
InstructionStats                             Instruction Statistics                                            no      ...2024.3.0/Sections\InstructionStatistics.section
LaunchStats                                  Launch Statistics                                                 yes     ...pute/2024.3.0/Sections\LaunchStatistics.section
MemoryWorkloadAnalysis                       Memory Workload Analysis                                          no      ...024.3.0/Sections\MemoryWorkloadAnalysis.section
MemoryWorkloadAnalysis_Chart                 Memory Workload Analysis Chart                                    no      ...0/Sections\MemoryWorkloadAnalysis_Chart.section
MemoryWorkloadAnalysis_Tables                Memory Workload Analysis Tables                                   no      .../Sections\MemoryWorkloadAnalysis_Tables.section
NumaAffinity                                 NUMA Affinity                                                     no      ... Compute/2024.3.0/Sections\NumaAffinity.section
Nvlink                                       NVLink                                                            no      ...Nsight Compute/2024.3.0/Sections\Nvlink.section
Nvlink_Tables                                NVLink Tables                                                     no      ...Compute/2024.3.0/Sections\Nvlink_Tables.section
Nvlink_Topology                              NVLink Topology                                                   no      ...mpute/2024.3.0/Sections\Nvlink_Topology.section
Occupancy                                    Occupancy                                                         yes     ...ght Compute/2024.3.0/Sections\Occupancy.section
PmSampling                                   PM Sampling                                                       no      ...ht Compute/2024.3.0/Sections\PmSampling.section
PmSampling_WarpStates                        PM Sampling: Warp States                                          no      ...2024.3.0/Sections\PmSampling_WarpStates.section
SchedulerStats                               Scheduler Statistics                                              no      ...e/2024.3.0/Sections\SchedulerStatistics.section
SourceCounters                               Source Counters                                                   no      ...ompute/2024.3.0/Sections\SourceCounters.section
SpeedOfLight                                 GPU Speed Of Light Throughput                                     yes     ... Compute/2024.3.0/Sections\SpeedOfLight.section
SpeedOfLight_HierarchicalDoubleRooflineChart GPU Speed Of Light Hierarchical Roofline Chart (Double Precision) no      ...OfLight_HierarchicalDoubleRooflineChart.section
SpeedOfLight_HierarchicalHalfRooflineChart   GPU Speed Of Light Hierarchical Roofline Chart (Half Precision)   no      ...edOfLight_HierarchicalHalfRooflineChart.section
SpeedOfLight_HierarchicalSingleRooflineChart GPU Speed Of Light Hierarchical Roofline Chart (Single Precision) no      ...OfLight_HierarchicalSingleRooflineChart.section
SpeedOfLight_HierarchicalTensorRooflineChart GPU Speed Of Light Hierarchical Roofline Chart (Tensor Core)      no      ...OfLight_HierarchicalTensorRooflineChart.section
SpeedOfLight_RooflineChart                   GPU Speed Of Light Roofline Chart                                 no      ...3.0/Sections\SpeedOfLight_RooflineChart.section
WarpStateStats                               Warp State Statistics                                             no      ...e/2024.3.0/Sections\WarpStateStatistics.section
WorkloadDistribution                         GPU and Memory Workload Distribution                              yes     .../2024.3.0/Sections\WorkloadDistribution.section