How to get speed of light with ncu-cli

Hi,

My aim is to collect details about the memory-bandwidth utilization and compute-core utilization of a TGI server.

Since the SpeedOfLight section collects exactly that, I went with:

ncu --replay-mode application -o  profile --launch-skip 900 --section SpeedOfLight --metrics gpu__time_duration.sum --target-processes-filter regex:text-generation-server  text-generation-launcher

The above was very slow. I ran it for a long time and ended up with a 3 GB ncu-rep file. But when loading it in the UI, I am not able to view it because of RAM constraints (loading the 3 GB report file needs more than 32 GB). Even after adding swap, the UI constantly ends up in a not-responding state.

So I tried an alternative:

ncu --import profile.ncu-rep --print-fp true --print-details all --section section --csv
But I am not able to get the report; I am getting errors (attached image).
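Since the report is too large for the UI, one way to avoid the GUI entirely is to dump the collected metric values to CSV from the command line. The following is only a minimal sketch, assuming the report is named profile.ncu-rep as in the command above; the helper name export_raw_csv and the output file name profile_raw.csv are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: export the raw metric page of a report as CSV.
# $1 = path to the .ncu-rep file, $2 = path of the CSV file to write.
export_raw_csv() {
  if ! command -v ncu >/dev/null 2>&1; then
    echo "ncu not found on PATH" >&2
    return 1
  fi
  if [ ! -f "$1" ]; then
    echo "report not found: $1" >&2
    return 1
  fi
  # --page raw prints the collected metrics per profiled kernel launch;
  # --csv switches the console output to comma-separated values.
  ncu --import "$1" --page raw --csv > "$2"
}

# File names are assumptions based on the "-o profile" option used above.
export_raw_csv profile.ncu-rep profile_raw.csv || echo "export skipped" >&2
```

The resulting CSV can then be opened in a spreadsheet or filtered with standard text tools instead of loading the full report into the UI.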

Please let me know of any mistakes I made and how I can get the report with the ncu CLI.

I'm using Ubuntu 23.04.

Hi, @harikrishna

Sorry for the issue you ran into. As the screenshot shows, the required metric XXX could not be found.

Can you use --metrics to collect those metrics and see what error is reported?
Also, to reduce the report size, can you add the -c option to the command line?
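A sketch of what this suggestion could look like when applied to the original collection command. The metric names are the ones reported as missing in the error output quoted later in this thread; the launch count of 50 and the helper name build_collect_cmd are placeholders chosen for illustration, not recommended values.

```shell
#!/bin/sh
# Metric names taken from the "Required metric ... could not be found" errors.
METRICS="sm__throughput.avg.pct_of_peak_sustained_elapsed,gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed"

# Hypothetical helper: prints the collection command for a given launch count.
# $1 = number of kernel launches to profile after the skipped ones.
build_collect_cmd() {
  echo "ncu --replay-mode application -o profile" \
       "--launch-skip 900 --launch-count $1" \
       "--section SpeedOfLight --metrics $METRICS" \
       "--target-processes-filter regex:text-generation-server" \
       "text-generation-launcher"
}

# Print the command for review; 50 is an arbitrary example value.
build_collect_cmd 50
```

Note that -c is the short form of --launch-count and expects a number, and --metrics expects a comma-separated list of metric names; neither can be passed without a value.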

I added --metrics and -c but I am still getting the same report, i.e. the metric values are not there, and the number of lines in the output file is not reduced either. It is the same with and without -c.

The full command I used:
ncu --import profile.ncu-rep --print-fp true --print-details all --section section --csv --metrics -c

The error I'm getting is:

    ERR   Required metric sm__throughput.avg.pct_of_peak_sustained_elapsed could not be found.         
    ----- --------------------------------------------------------------------------------------------------------------
    ERR   Required metric gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed could not be found.         
    ----- --------------------------------------------------------------------------------------------------------------
    ERR   Rule Bottleneck returned an error: Some required metrics are missing; aborted rule execution.         
    ----- --------------------------------------------------------------------------------------------------------------
    ERR   <built-in function raise_exception> returned a result with an exception set         
          /root/Documents/NVIDIA Nsight Compute/2023.3.0/Sections/SpeedOfLight.py:90         
          /opt/nvidia/hpc_sdk/Linux_x86_64/23.11/profilers/Nsight_Compute/target/linux-desktop-glibc_2_11_3-x64/../../sections/RequestedMetrics.py:231
          /opt/nvidia/hpc_sdk/Linux_x86_64/23.11/profilers/Nsight_Compute/target/linux-desktop-glibc_2_11_3-x64/../../sections/NvRules.py:2834

I see, the two metrics above are the required ones. I will try collecting without adding --metrics.

ncu --replay-mode application -o profile --launch-skip 900 --section SpeedOfLight --target-processes-filter regex:text-generation-server text-generation-launcher
Let me know if anything can be optimized in the collection; I need SpeedOfLight only.

Thanks for your input.

Hi, @harikrishna

Are you using Nsight Compute from HPC_SDK?

Hi @veraj
Yes, I'm using the one from HPC_SDK.

I'm running ncu inside Docker.
My ncu path in the container is /opt/nvidia/hpc_sdk/Linux_x86_64/23.11/compilers/bin/ncu

Good to know.

Can you execute the two command lines below and see what happens?

ncu --query-metrics --query-metrics-mode all | grep sm__throughput.avg.pct_of_peak_sustained_elapsed

ncu --query-metrics --query-metrics-mode all | grep gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed

Also, to avoid the error, you can use "ncu --section SpeedOfLight --apply-rules no ${Your_Sample}"

I'm sorry, but ncu --query-metrics --query-metrics-mode is not working with the --import option.

Output:
==ERROR== Invalid option --query-metrics specified for --mode import

I mean you should execute just the command lines I gave. You don't have to add other options.
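To make the check above easier to repeat, the two queries could be wrapped in a small script that runs the metric query once, on its own and without --import, and reports each required metric as found or missing. This is only a sketch: the helper name check_metrics is hypothetical, and it simply searches the listing for the two metric names from the error messages.

```shell
#!/bin/sh
# Metric names copied from the error output earlier in this thread.
REQUIRED_METRICS="sm__throughput.avg.pct_of_peak_sustained_elapsed
gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed"

# Hypothetical helper: reads a metric listing on stdin and reports,
# for each required metric, whether its name appears in the listing.
check_metrics() {
  listing="$(cat)"
  for metric in $REQUIRED_METRICS; do
    # -F treats the metric name as a fixed string, so dots match literally.
    if printf '%s\n' "$listing" | grep -qF -- "$metric"; then
      echo "found:   $metric"
    else
      echo "missing: $metric"
    fi
  done
}

# Query the profiler only when ncu is actually on PATH.
if command -v ncu >/dev/null 2>&1; then
  ncu --query-metrics --query-metrics-mode all | check_metrics
fi
```

If either metric is reported as missing, that would suggest the installed ncu does not expose it on this GPU, which is what the query commands are meant to reveal.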