How to profile all roofline charts from the command line?

Hi! I want to profile all the roofline charts from the command line. On my Windows laptop this already works, using a command line like this:

"C:/Program Files/NVIDIA Corporation/Nsight Compute 2023.3.1/target/windows-desktop-win7-x64/ncu.exe" --config-file off --export C:/文件备份/数据流项目/代码文件夹/test-1 --force-overwrite --target-processes application-only --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Tables --section NumaAffinity --section Nvlink_Tables --section Nvlink_Topology --section Occupancy --section PmSampling --section SchedulerStats --section SourceCounters --section SpeedOfLight --section SpeedOfLight_HierarchicalDoubleRooflineChart --section SpeedOfLight_HierarchicalHalfRooflineChart --section SpeedOfLight_HierarchicalSingleRooflineChart --section SpeedOfLight_HierarchicalTensorRooflineChart --section SpeedOfLight_RooflineChart --section WarpStateStats C:/文件备份/数据流项目/代码文件夹/delete3.exe

I wrote an equivalent command line for Ubuntu, and I want to run it like this:

/usr/local/cuda-11.7/bin/ncu --set full --replay-mode application --app-replay-match grid --app-replay-buffer file -f --export /home/zyhuang/tri-GPU/accelerating-TC-main/output-file-full.nsight-cuprof-report --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Tables --section Nvlink_Tables --section Nvlink_Topology --section Occupancy --section SchedulerStats --section SourceCounters --section SpeedOfLight --section SpeedOfLight_HierarchicalDoubleRooflineChart --section SpeedOfLight_HierarchicalHalfRooflineChart --section SpeedOfLight_HierarchicalSingleRooflineChart --section SpeedOfLight_HierarchicalTensorRooflineChart --section SpeedOfLight_RooflineChart --section WarpStateStats ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug

But it fails with an error.

What should I do? Could anyone kindly give me a working command line? Thanks!!!

Hi, @202476410arsmart

Could you please try the latest Nsight Compute on Linux? You are using the ncu that ships with CUDA 11.7.

After upgrading, you can use a command line like this:

ncu --config-file off --force-overwrite --export /home/zyhuang/tri-GPU/accelerating-TC-main/output-file-full.nsight-cuprof-report --replay-mode application --app-replay-match grid --app-replay-buffer file --app-replay-mode strict --set roofline ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug

Oh, thank you!!! Let me try your command first, because I am on a server and it is not easy to change the ncu version there…

Also, what I really care about is getting the different roofline charts, so I want to add:

--section SpeedOfLight --section SpeedOfLight_HierarchicalDoubleRooflineChart --section SpeedOfLight_HierarchicalHalfRooflineChart --section SpeedOfLight_HierarchicalSingleRooflineChart --section SpeedOfLight_HierarchicalTensorRooflineChart --section SpeedOfLight_RooflineChart

What do you think? How can I add them?

If you want all the roofline sections, you can also use the ncu option --set roofline.
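If only some of the roofline charts are wanted, the --section flags can also be spelled out one by one. The following is a minimal sketch that builds that argument list from the section names quoted in this thread and prints the resulting command as a dry run instead of executing it. The helper name section_args and the variables ROOFLINE_SECTIONS, NCU and APP are made up for illustration, and whether each section is available depends on the installed Nsight Compute version.

```shell
#!/bin/sh
# Roofline section identifiers, copied from the commands in this thread.
ROOFLINE_SECTIONS="SpeedOfLight SpeedOfLight_RooflineChart SpeedOfLight_HierarchicalSingleRooflineChart SpeedOfLight_HierarchicalDoubleRooflineChart SpeedOfLight_HierarchicalHalfRooflineChart SpeedOfLight_HierarchicalTensorRooflineChart"

# Hypothetical helper: turn each argument NAME into "--section NAME".
section_args() {
  sa_out=""
  for sa_name in "$@"; do
    sa_out="${sa_out:+$sa_out }--section $sa_name"
  done
  printf '%s\n' "$sa_out"
}

# Assumptions: ncu is on PATH, and the application is the one from the thread.
NCU="${NCU:-ncu}"
APP="./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug"

# Dry run: print the command so it can be inspected before running it.
# $ROOFLINE_SECTIONS is left unquoted on purpose so it splits into words.
printf '%s\n' "$NCU --force-overwrite --export output-file-full.nsight-cuprof-report $(section_args $ROOFLINE_SECTIONS) $APP"
```

Removing a name from ROOFLINE_SECTIONS drops the matching chart from the printed command, which is the main reason to prefer this form over --set roofline.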


/usr/local/cuda-11.7/bin/ncu --force-overwrite --export /home/zyhuang/tri-GPU/accelerating-TC-main/output-file-full.nsight-cuprof-report --replay-mode application --app-replay-match grid --app-replay-buffer file --set roofline ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug

Cry… it still shows an error…
By the way, I deleted '--config-file' and '--app-replay-mode' because they cause errors.

However, this command does work:
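Since some options are rejected by the older ncu, it can help to check which options the installed binary lists before building a long command. The sketch below greps the output of ncu --help for each option name. The function names has_option and check_ncu_option are hypothetical, the default path is the CUDA 11.7 install mentioned in this thread, and a match in the help text is only a hint, not a guarantee that the option behaves the same across versions.

```shell
#!/bin/sh
# Profiler binary to inspect; defaults to the CUDA 11.7 path from the thread.
NCU="${NCU:-/usr/local/cuda-11.7/bin/ncu}"

# Hypothetical helper: succeed if the text on stdin contains the option name.
# This is a plain substring match, so short names can match longer options.
has_option() {
  grep -qF -- "$1"
}

# Hypothetical helper: report whether "ncu --help" mentions the given option.
# If the binary is missing, the help text is empty and nothing is listed.
check_ncu_option() {
  if "$NCU" --help 2>/dev/null | has_option "$1"; then
    printf '%s\n' "supported: $1"
  else
    printf '%s\n' "not listed: $1"
  fi
}

# Options that appear in the commands discussed in this thread.
for opt in --config-file --app-replay-mode --app-replay-match --set; do
  check_ncu_option "$opt"
done
```

Options reported as not listed can then be dropped from the command, which is what was done by hand above.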

/usr/local/cuda-11.7/bin/ncu --set full --replay-mode application --app-replay-match grid --app-replay-buffer file -f --export output-file-full.nsight-cuprof-report ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug

But the roofline here only covers float, which does not suit my needs (uint32_t).

Do you have this dropdown in your report?


Really grateful for your help!! I tried this and it works:

/usr/local/cuda-11.7/bin/ncu --set roofline --replay-mode application --app-replay-match grid --app-replay-buffer file -f --export output-file-full.nsight-cuprof-report ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug

But again, several of the rooflines do not show even a single point! Strange! I am computing with uint32_t, and I guessed it would show up in the double roofline… but nothing shows there…

Would you help me check it? Thank you!!
