Nsight Compute command line roofline option

Hi,

I am trying to get a roofline analysis for a kernel.
I am using the Nsight Compute command line (ncu) on a remote host and then opening the report in ncu-ui on my local system.

When I open the report, there is no roofline plot.

The online documentation for the ncu-ui GUI says to activate the roofline plot by checking the box in the profile options.

However, I cannot find anywhere what the command line equivalent for this is.

My current command line is:

ncu --kernel-regex one_minus_div_grad_v_27137_gpu --launch-skip 268 --launch-count 1 --target-processes all --export masz_med_roofline mpiexec -np 1 ./mas mas

What do I have to add to enable the roofline plot?

Thanks!

- Ron

Hi Ron,

I believe you need to use either the “detailed” or “full” set (--set full), since roofline isn’t in the default set:

% ncu --list-sets
---------- --------------------------------------------------------------------------- ------- -----------------
Identifier Sections                                                                    Enabled Estimated Metrics
---------- --------------------------------------------------------------------------- ------- -----------------
default    LaunchStats, Occupancy, SpeedOfLight                                        yes     35
detailed   ComputeWorkloadAnalysis, InstructionStats, LaunchStats, MemoryWorkloadAnaly no      157
           sis, Occupancy, SchedulerStats, SourceCounters, SpeedOfLight, SpeedOfLight_
           RooflineChart, WarpStateStats
full       ComputeWorkloadAnalysis, InstructionStats, LaunchStats, MemoryWorkloadAnaly no      162
           sis, MemoryWorkloadAnalysis_Chart, MemoryWorkloadAnalysis_Tables, Occupancy
           , SchedulerStats, SourceCounters, SpeedOfLight, SpeedOfLight_RooflineChart,
            WarpStateStats
source     SourceCounters                                                              no      47
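
Applied to the command from the question, that could look like the sketch below. It only builds and prints the commands (a dry run), so nothing is profiled until you run the printed line yourself. The variable names are illustrative, and the second variant, which uses --section with the SpeedOfLight_RooflineChart identifier from the listing above, is an untested assumption that collecting this one section is enough for the chart to appear.

```shell
# Dry-run sketch: builds the profiling commands and prints them instead of
# running them, so it is safe on a machine without ncu or the application.

# Kernel name, launch filters, and report name are copied from the question.
BASE_OPTS="--kernel-regex one_minus_div_grad_v_27137_gpu --launch-skip 268 --launch-count 1 --target-processes all --export masz_med_roofline"
APP="mpiexec -np 1 ./mas mas"

# Option 1: collect the "full" set, which lists SpeedOfLight_RooflineChart.
CMD_FULL_SET="ncu --set full $BASE_OPTS $APP"

# Option 2 (untested assumption): request only the roofline section by the
# identifier shown in the --list-sets output.
CMD_ONE_SECTION="ncu --section SpeedOfLight_RooflineChart $BASE_OPTS $APP"

echo "$CMD_FULL_SET"
echo "$CMD_ONE_SECTION"
```

The "full" set collects many more metrics than the default set (162 versus 35 estimated metrics in the listing above), so expect a longer profiling run with option 1.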

-Mat


But I can use the --set roofline option…

ncu --set roofline --replay-mode application --app-replay-match grid --app-replay-buffer file --app-replay-mode relaxed -f --export output-file-full.nsight-cuprof-report ./a.out

Well, maybe this is a new feature:

(base) a100-04% ncu --list-sets
---------- --------------------------------------------------------------------------- ------- -----------------
Identifier Sections                                                                    Enabled Estimated Metrics
---------- --------------------------------------------------------------------------- ------- -----------------
basic      LaunchStats, Occupancy, SpeedOfLight                                        yes     47               
detailed   ComputeWorkloadAnalysis, LaunchStats, MemoryWorkloadAnalysis, MemoryWorkloa no      199              
           dAnalysis_Chart, Occupancy, SourceCounters, SpeedOfLight, SpeedOfLight_Roof                          
           lineChart                                                                                            
full       ComputeWorkloadAnalysis, InstructionStats, LaunchStats, MemoryWorkloadAnaly no      306              
           sis, MemoryWorkloadAnalysis_Chart, MemoryWorkloadAnalysis_Tables, NumaAffin                          
           ity, Nvlink_Tables, Nvlink_Topology, Occupancy, PmSampling, SchedulerStats,                          
            SourceCounters, SpeedOfLight, SpeedOfLight_RooflineChart, WarpStateStats                            
nvlink     Nvlink, Nvlink_Tables, Nvlink_Topology                                      no      22               
roofline   SpeedOfLight, SpeedOfLight_HierarchicalDoubleRooflineChart, SpeedOfLight_Hi no      61               
           erarchicalHalfRooflineChart, SpeedOfLight_HierarchicalSingleRooflineChart,                           
           SpeedOfLight_HierarchicalTensorRooflineChart, SpeedOfLight_RooflineChart                             
(base) a100-04% ncu --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.3.1.0 (build 33474944) (public-release)
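
Since the two listings in this thread differ (the older one has no "roofline" set), a script that has to work across versions could check the --list-sets output before choosing a set. This is a minimal sketch: pick_roofline_set is a hypothetical helper name, and it assumes set identifiers start at the beginning of a line, as they do in both listings above.

```shell
# Hypothetical helper: reads `ncu --list-sets` output on stdin and prints
# "roofline" if that set is listed, otherwise falls back to "full".
pick_roofline_set() {
    # Set identifiers start in column 0, followed by whitespace.
    if grep -q '^roofline[[:space:]]'; then
        echo roofline
    else
        echo full
    fi
}

# Intended usage on a machine with ncu installed (kept as a comment so the
# sketch itself does not require ncu):
#   SET=$(ncu --list-sets | pick_roofline_set)
#   ncu --set "$SET" --export report ./a.out
```

Falling back to "full" matches the earlier answer in this thread, where "full" was the suggested set on a version without a dedicated roofline set.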