Hi! I want to use the command line to profile all the roofline charts. It works on my Windows laptop, where I see a command line like this:
"C:/Program Files/NVIDIA Corporation/Nsight Compute 2023.3.1/target/windows-desktop-win7-x64/ncu.exe" --config-file off --export C:/文件备份/数据流项目/代码文件夹/test-1 --force-overwrite --target-processes application-only --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Tables --section NumaAffinity --section Nvlink_Tables --section Nvlink_Topology --section Occupancy --section PmSampling --section SchedulerStats --section SourceCounters --section SpeedOfLight --section SpeedOfLight_HierarchicalDoubleRooflineChart --section SpeedOfLight_HierarchicalHalfRooflineChart --section SpeedOfLight_HierarchicalSingleRooflineChart --section SpeedOfLight_HierarchicalTensorRooflineChart --section SpeedOfLight_RooflineChart --section WarpStateStats C:/文件备份/数据流项目/代码文件夹/delete3.exe
I also have a command line for Ubuntu that I want to run, like this:
/usr/local/cuda-11.7/bin/ncu --set full --replay-mode application --app-replay-match grid --app-replay-buffer file -f --export /home/zyhuang/tri-GPU/accelerating-TC-main/output-file-full.nsight-cuprof-report --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Tables --section Nvlink_Tables --section Nvlink_Topology --section Occupancy --section SchedulerStats --section SourceCounters --section SpeedOfLight --section SpeedOfLight_HierarchicalDoubleRooflineChart --section SpeedOfLight_HierarchicalHalfRooflineChart --section SpeedOfLight_HierarchicalSingleRooflineChart --section SpeedOfLight_HierarchicalTensorRooflineChart --section SpeedOfLight_RooflineChart --section WarpStateStats ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug
But it shows that:
What should I do? Could anyone kindly give me a command line? Thanks!!!
veraj
January 18, 2024, 10:21am
2
Hi, @202476410arsmart
Could you please try the latest Nsight Compute on Linux? You are currently using the ncu that ships with CUDA 11.7.
After upgrading, you can use a command line like:
ncu --config-file off --force-overwrite --export /home/zyhuang/tri-GPU/accelerating-TC-main/output-file-full.nsight-cuprof-report --replay-mode application --app-replay-match grid --app-replay-buffer file --app-replay-mode strict --set roofline ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug
Oh, thank you!!! Well, let me try your command first, because I am on a server and it is not easy to change the ncu version…
Also, what I really care about is getting the different rooflines! So I want to add
--section SpeedOfLight --section SpeedOfLight_HierarchicalDoubleRooflineChart --section SpeedOfLight_HierarchicalHalfRooflineChart --section SpeedOfLight_HierarchicalSingleRooflineChart --section SpeedOfLight_HierarchicalTensorRooflineChart --section SpeedOfLight_RooflineChart
What do you think? How do I add them?
If you want all the roofline sections, you can also use the ncu option --set roofline.
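For anyone who would rather pass the roofline sections one by one instead of relying on a set, here is a minimal sketch that assembles the --section flags in a loop and only prints the resulting command (it does not run the profiler). The section names are the ones quoted earlier in this thread; the variable names, the default report name roofline-report, and the helper layout are assumptions for illustration. Paths containing spaces are not handled by this simple string concatenation.

```sh
#!/bin/sh
# Sketch only: builds and prints an ncu command line, it does not execute it.
# NCU, REPORT and APP are hypothetical variables; adjust them to your setup.
NCU="${NCU:-ncu}"
REPORT="${REPORT:-roofline-report}"
APP="${APP:-./tric_gpu}"

# Roofline-related section names as quoted in this thread.
SECTIONS="SpeedOfLight
SpeedOfLight_RooflineChart
SpeedOfLight_HierarchicalSingleRooflineChart
SpeedOfLight_HierarchicalDoubleRooflineChart
SpeedOfLight_HierarchicalHalfRooflineChart
SpeedOfLight_HierarchicalTensorRooflineChart"

section_args=""
section_count=0
for s in $SECTIONS; do
  # Each section needs its own --section flag.
  section_args="$section_args --section $s"
  section_count=$((section_count + 1))
done

# -f overwrites an existing report, --export names the output report file.
ncu_cmd="$NCU -f --export $REPORT$section_args $APP"
echo "$ncu_cmd"
```

The printed line can be inspected, and the application's own arguments appended, before it is run by hand on the server.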
/usr/local/cuda-11.7/bin/ncu --force-overwrite --export /home/zyhuang/tri-GPU/accelerating-TC-main/output-file-full.nsight-cuprof-report --replay-mode application --app-replay-match grid --app-replay-buffer file --set roofline ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug
Cry… it still shows an error…
By the way, I deleted '--config-file' and '--app-replay-mode' because they cause errors.
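Since some options are rejected by the older ncu, one way to find out up front what an installed version accepts is to search its --help output and to list the sets and sections it ships with. The sketch below assumes ncu is on the PATH (or that the NCU variable points at it, e.g. /usr/local/cuda-11.7/bin/ncu); the helper name ncu_supports is hypothetical, and a plain text match against the help output is only a rough check.

```sh
#!/bin/sh
# Sketch only: NCU and ncu_supports are illustrative names, not part of ncu.
NCU="${NCU:-ncu}"

# Succeeds if the given option string appears in the tool's --help text.
ncu_supports() {
  "$NCU" --help 2>/dev/null | grep -qF -- "$1"
}

if command -v "$NCU" >/dev/null 2>&1; then
  "$NCU" --version
  for opt in --config-file --app-replay-mode; do
    if ncu_supports "$opt"; then
      echo "$opt: listed in --help"
    else
      echo "$opt: not listed in --help"
    fi
  done
  # Show which sets (e.g. full, roofline) and sections this version provides.
  "$NCU" --list-sets
  "$NCU" --list-sections
else
  echo "ncu not found: $NCU"
fi
```

If a section name from the Windows command line is missing from the --list-sections output on the server, leaving it out of the Linux command is the simplest thing to try.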
By the way, I can use this:
/usr/local/cuda-11.7/bin/ncu --set full --replay-mode application --app-replay-match grid --app-replay-buffer file -f --export output-file-full.nsight-cuprof-report ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug
But the roofline here only has float, and it does not suit my needs (uint32_t).
veraj
January 19, 2024, 12:46am
6
Do you have this dropdown in your report?
Really grateful for your help!! I tried this and it works:
/usr/local/cuda-11.7/bin/ncu --set roofline --replay-mode application --app-replay-match grid --app-replay-buffer file -f --export output-file-full.nsight-cuprof-report ./tric_gpu -f /home/zyhuang/tri-GPU/accelerating-TC-main/merged_final.bin debug
But again, several rooflines do not show even one point! Strange! I am using uint32_t to compute, and I guess it should be double… but nothing shows here…
Would you help me check it? Thank you!!
output-file-full.nsight-cuprof-report.zip (212.4 KB)
veraj
Closed
July 22, 2024, 12:00am
8
This topic was automatically closed after 13 days. New replies are no longer allowed.