How to profile roofline?

Hi! I want to use the command line to profile all the rooflines. It works on my Windows laptop, where I see a command line like this:

"C:/Program Files/NVIDIA Corporation/Nsight Compute 2023.3.1/target/windows-desktop-win7-x64/ncu.exe" --config-file off --export C:/文件备份/数据流项目/代码文件夹/test-1 --force-overwrite --target-processes application-only --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Tables --section NumaAffinity --section Nvlink_Tables --section Nvlink_Topology --section Occupancy --section PmSampling --section SchedulerStats --section SourceCounters --section SpeedOfLight --section SpeedOfLight_HierarchicalDoubleRooflineChart --section SpeedOfLight_HierarchicalHalfRooflineChart --section SpeedOfLight_HierarchicalSingleRooflineChart --section SpeedOfLight_HierarchicalTensorRooflineChart --section SpeedOfLight_RooflineChart --section WarpStateStats C:/文件备份/数据流项目/代码文件夹/delete3.exe

On Ubuntu I have a similar command line that I want to run:

/usr/local/cuda-11.7/bin/ncu --set full --replay-mode application --app-replay-match grid --app-replay-buffer file -f --export /home/zyhuang/tri-GPU/accelerating-TC-main/output-file-full.nsight-cuprof-report --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Tables --section Nvlink_Tables --section Nvlink_Topology --section Occupancy --section SchedulerStats --section SourceCounters --section SpeedOfLight --section SpeedOfLight_HierarchicalDoubleRooflineChart --section SpeedOfLight_HierarchicalHalfRooflineChart --section SpeedOfLight_HierarchicalSingleRooflineChart --section SpeedOfLight_HierarchicalTensorRooflineChart --section SpeedOfLight_RooflineChart --section WarpStateStats ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug

But it shows an error:

What should I do? Could anyone kindly give me a command line? Thanks!!!

Hi, @202476410arsmart

Could you please try the latest Nsight Compute on Linux? You are using the ncu that ships with CUDA 11.7.

After upgrading, you can use a command line like:

ncu --config-file off --force-overwrite --export /home/zyhuang/tri-GPU/accelerating-TC-main/output-file-full.nsight-cuprof-report --replay-mode application --app-replay-match grid --app-replay-buffer file --app-replay-mode strict --set roofline ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug
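Before or after upgrading, it may help to check which ncu is actually being run and which roofline sections it ships. The following is only a sketch: `NCU_BIN` and `list_roofline_sections` are names made up for this example, and it relies on the `--version` and `--list-sections` options of ncu.

```shell
# Sketch: show which roofline sections an ncu binary knows about.
# NCU_BIN is an assumption for this example; on the server in this thread
# it would be /usr/local/cuda-11.7/bin/ncu.
NCU_BIN=${NCU_BIN:-ncu}

list_roofline_sections() {
  # --list-sections prints every section the binary ships with;
  # keep only the lines that mention "roofline" (case-insensitive).
  "$NCU_BIN" --list-sections 2>/dev/null | grep -i roofline
}

if command -v "$NCU_BIN" >/dev/null 2>&1; then
  "$NCU_BIN" --version
  list_roofline_sections || echo "no roofline sections listed"
else
  echo "$NCU_BIN not found on PATH"
fi
```

If a section name from the Windows command line does not appear in this list, the older ncu probably does not know it.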

Oh, thank you!!! Let me try your command first, because I am on a server and it is not easy to change the ncu version…

Also, what I really care about is getting the different rooflines! So I want to add

--section SpeedOfLight --section SpeedOfLight_HierarchicalDoubleRooflineChart --section SpeedOfLight_HierarchicalHalfRooflineChart --section SpeedOfLight_HierarchicalSingleRooflineChart --section SpeedOfLight_HierarchicalTensorRooflineChart --section SpeedOfLight_RooflineChart

What do you think? How can I add them?

If you want all the roofline sections, you can also use the ncu option --set roofline
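If you would rather name the roofline sections one by one, each needs its own --section flag. Here is a minimal sketch that assembles such a command line and only prints it (it does not run ncu). `build_ncu_cmd`, `ROOFLINE_SECTIONS`, and the report name are hypothetical names for this example; the section names are the ones quoted in this thread, and paths containing spaces are not handled.

```shell
# Section identifiers quoted earlier in this thread, one per line.
ROOFLINE_SECTIONS="
SpeedOfLight
SpeedOfLight_RooflineChart
SpeedOfLight_HierarchicalSingleRooflineChart
SpeedOfLight_HierarchicalDoubleRooflineChart
SpeedOfLight_HierarchicalHalfRooflineChart
SpeedOfLight_HierarchicalTensorRooflineChart
"

build_ncu_cmd() {
  # $1 = report path, remaining arguments = application and its arguments
  report=$1
  shift
  cmd="ncu --force-overwrite --export $report"
  # Unquoted on purpose: split the list on whitespace, one name per iteration.
  for s in $ROOFLINE_SECTIONS; do
    cmd="$cmd --section $s"
  done
  # Dry run: print the command instead of executing it.
  printf '%s %s\n' "$cmd" "$*"
}

build_ncu_cmd roofline-report ./tric_gpu -f merged_final.bin debug
```

The printed line can be inspected and then pasted into the terminal; `--set roofline` remains the shorter route when the installed ncu supports it.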


/usr/local/cuda-11.7/bin/ncu --force-overwrite --export /home/zyhuang/tri-GPU/accelerating-TC-main/output-file-full.nsight-cuprof-report --replay-mode application --app-replay-match grid --app-replay-buffer file --set roofline ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug

Cry… it still shows an error…
By the way, I deleted '--config-file' and '--app-replay-mode' because they cause errors.
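Options that a newer ncu accepts may be unknown to an older one, which would explain errors like these. A rough way to check in advance is to search the binary's own `--help` text for each option. This is only a sketch: `NCU_BIN` and `ncu_supports` are names invented here, and a plain text match on the help output is an approximation.

```shell
# NCU_BIN is an assumption; point it at the ncu you actually run,
# for example /usr/local/cuda-11.7/bin/ncu.
NCU_BIN=${NCU_BIN:-ncu}

ncu_supports() {
  # Succeeds if the option name appears literally in the --help output.
  "$NCU_BIN" --help 2>/dev/null | grep -qF -e "$1"
}

for opt in --config-file --app-replay-mode --app-replay-match; do
  if ncu_supports "$opt"; then
    echo "$opt: listed by $NCU_BIN --help"
  else
    echo "$opt: not listed by $NCU_BIN --help"
  fi
done
```

Any option reported as not listed can be dropped from the command line, as was done by hand above.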

By the way, I can use this:

/usr/local/cuda-11.7/bin/ncu --set full --replay-mode application --app-replay-match grid --app-replay-buffer file -f --export output-file-full.nsight-cuprof-report ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug

But the roofline here only has float, which does not suit my needs (uint32_t).

Do you have this dropdown in your report?


Really grateful for your help!! I tried this and it works:

/usr/local/cuda-11.7/bin/ncu --set roofline --replay-mode application --app-replay-match grid --app-replay-buffer file -f --export output-file-full.nsight-cuprof-report ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug

But again, several rooflines do not show even one point! Strange! I am using uint32_t to compute and I guess it should be double… but nothing shows here…


Would you help me to check it? Thank you!!
output-file-full.nsight-cuprof-report.zip (212.4 KB)
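One possible explanation, not confirmed in this thread: as far as I know, the roofline charts plot floating-point and tensor throughput, so a kernel that only does uint32_t arithmetic may have no achieved point to draw. If that is the case, an integer "roofline point" could be worked out by hand from collected metrics. The sketch below only does the arithmetic; `int_roofline_point` is a hypothetical helper, the sample numbers are placeholders rather than measurements, and the metric names in the comment are assumptions that should be checked with `ncu --query-metrics`.

```shell
# Hypothetical helper: turn three measured values into a manual integer roofline point.
#   $1 = integer thread instructions executed
#   $2 = DRAM bytes moved (must be non-zero)
#   $3 = kernel duration in nanoseconds (must be non-zero)
# Prints two numbers: arithmetic intensity (ops/byte) and throughput (Gop/s).
int_roofline_point() {
  awk -v ops="$1" -v bytes="$2" -v ns="$3" \
    'BEGIN { printf "%.3f %.3f\n", ops / bytes, ops / ns }'
}

# The inputs might come from a run like this (metric names are assumptions):
#   ncu --metrics smsp__sass_thread_inst_executed_op_integer_pred_on.sum,dram__bytes.sum,gpu__time_duration.sum \
#       ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug

# Placeholder values, not taken from the attached report.
int_roofline_point 2000000 1000000 10000
```

Ops per nanosecond equals Gop/s, so no unit conversion is needed. The resulting pair can be compared against the memory-bandwidth slope of the chart, though the floating-point ceilings drawn there do not apply to integer work.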
