caplanr
November 27, 2020, 1:03am
Hi,
I am trying to get a roofline analysis for a kernel.
I am using the Nsight Compute command line (ncu) on a remote host and then opening the report in ncu-ui on my local system.
When I open the report, there is no roofline plot.
The online documentation for the ncu-ui GUI says to activate the roofline plot by checking the box in the profile options.
However, I cannot find anywhere what the command line equivalent for this is.
My current command line is:
ncu --kernel-regex one_minus_div_grad_v_27137_gpu --launch-skip 268 --launch-count 1 --target-processes all --export masz_med_roofline mpiexec -np 1 ./mas mas
What do I have to add to enable the roofline plot?
Thanks!
Hi Ron,
I believe you need to use either the “detailed” or “full” set (--set full), since the roofline chart isn’t in the default set:
% ncu --list-sets
---------- --------------------------------------------------------------------------- ------- -----------------
Identifier Sections                                                                    Enabled Estimated Metrics
---------- --------------------------------------------------------------------------- ------- -----------------
default    LaunchStats, Occupancy, SpeedOfLight                                        yes     35
detailed   ComputeWorkloadAnalysis, InstructionStats, LaunchStats, MemoryWorkloadAnaly no      157
           sis, Occupancy, SchedulerStats, SourceCounters, SpeedOfLight, SpeedOfLight_
           RooflineChart, WarpStateStats
full       ComputeWorkloadAnalysis, InstructionStats, LaunchStats, MemoryWorkloadAnaly no      162
           sis, MemoryWorkloadAnalysis_Chart, MemoryWorkloadAnalysis_Tables, Occupancy
           , SchedulerStats, SourceCounters, SpeedOfLight, SpeedOfLight_RooflineChart,
           WarpStateStats
source     SourceCounters                                                              no      47
-Mat
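For reference, here is a sketch of the command from the question with `--set full` added, assuming every other option stays as posted. It only assembles and prints the command so it can be reviewed before running; `kernel` and `ncu_cmd` are illustrative variable names, not anything ncu requires. Adding `--section SpeedOfLight_RooflineChart` instead of switching sets may be a lighter alternative, but check that against `ncu --help` for your version.

```shell
# Kernel name taken from the original question.
kernel="one_minus_div_grad_v_27137_gpu"

# Same options as the original command, with --set full added so that the
# SpeedOfLight_RooflineChart section is collected.
ncu_cmd="ncu --set full --kernel-regex $kernel --launch-skip 268 --launch-count 1 --target-processes all --export masz_med_roofline mpiexec -np 1 ./mas mas"

# Dry run: print the command. To execute it, run: eval "$ncu_cmd"
echo "$ncu_cmd"
```

The full set collects many more metrics than the default one (162 versus 35 in the listing above), so profiling the kernel will take longer.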
But I can use the --set roofline option:
ncu --set roofline --replay-mode application --app-replay-match grid --app-replay-buffer file --app-replay-mode relaxed -f --export output-file-full.nsight-cuprof-report ./a.out
Well, maybe this is a new feature:
(base) a100-04% ncu --list-sets
---------- --------------------------------------------------------------------------- ------- -----------------
Identifier Sections                                                                    Enabled Estimated Metrics
---------- --------------------------------------------------------------------------- ------- -----------------
basic      LaunchStats, Occupancy, SpeedOfLight                                        yes     47
detailed   ComputeWorkloadAnalysis, LaunchStats, MemoryWorkloadAnalysis, MemoryWorkloa no      199
           dAnalysis_Chart, Occupancy, SourceCounters, SpeedOfLight, SpeedOfLight_Roof
           lineChart
full       ComputeWorkloadAnalysis, InstructionStats, LaunchStats, MemoryWorkloadAnaly no      306
           sis, MemoryWorkloadAnalysis_Chart, MemoryWorkloadAnalysis_Tables, NumaAffin
           ity, Nvlink_Tables, Nvlink_Topology, Occupancy, PmSampling, SchedulerStats,
           SourceCounters, SpeedOfLight, SpeedOfLight_RooflineChart, WarpStateStats
nvlink     Nvlink, Nvlink_Tables, Nvlink_Topology                                      no      22
roofline   SpeedOfLight, SpeedOfLight_HierarchicalDoubleRooflineChart, SpeedOfLight_Hi no      61
           erarchicalHalfRooflineChart, SpeedOfLight_HierarchicalSingleRooflineChart,
           SpeedOfLight_HierarchicalTensorRooflineChart, SpeedOfLight_RooflineChart
(base) a100-04% ncu --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.3.1.0 (build 33474944) (public-release)
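Since the `roofline` set shows up in the 2023.3.1 listing but not in the older one, a script that has to work across versions could check the listing first. Below is a minimal sketch, assuming the set identifier is always the first field of its row as in the listings above; `has_set`, `sample`, and `set_name` are hypothetical names, and the sample rows are abbreviated from the 2023.3.1 output.

```shell
# has_set NAME: reads `ncu --list-sets` output on stdin and succeeds
# if some row has NAME as its first field (the Identifier column).
has_set() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Abbreviated sample rows; with ncu installed, replace the printf below
# with: ncu --list-sets | has_set roofline
sample='basic LaunchStats, Occupancy, SpeedOfLight yes 47
full ComputeWorkloadAnalysis, InstructionStats, LaunchStats no 306
roofline SpeedOfLight, SpeedOfLight_RooflineChart no 61'

if printf '%s\n' "$sample" | has_set roofline; then
  set_name=roofline
else
  # Fallback for older releases: the full set includes the roofline chart.
  set_name=full
fi

echo "--set $set_name"
```

Wrapped continuation rows do not cause false matches, because their first field is a section name fragment rather than a bare set identifier.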