NCU too slow and incomplete

I need to measure DRAM utilization, GPU utilization per kernel, and other stats. I'm using this command:

sudo -E CUDA_VISIBLE_DEVICES=0 ncu --set basic --launch-count 100 --force-overwrite -o ncu_8b_Q2_k --section-folder="/usr/local/cuda-12.8/nsight-compute-2025.1.1/sections/" ./llama-cli -m <model_path> -ngl 99 --prompt <my_prompt> -no-cnv -c 512 -n 50

If I don't set the launch count, it takes forever to run. Previously I set --metrics sm__throughput.avg.pct_of_peak_sustained_elapsed,dram__throughput.avg.pct_of_peak_sustained_elapsed, but in both cases Nsight Compute doesn't show any useful info. Where am I supposed to get the metric values?

Hi, @hys4qm

Did you eventually get a report generated after setting the launch count?
If not, did you get any errors? What is the output of the command?

If I force the launch count, a report is generated, as shown; otherwise it keeps running for hours.

You can go to the “Raw” page and filter by metric name. If the metric has been collected, it should be listed there.
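If scrolling through the Raw page for 100 launches is tedious, the same values can also be pulled out on the command line. One possible route is to export the raw page as CSV with `ncu --import ncu_8b_Q2_k.ncu-rep --page raw --csv > raw.csv` and then average the two metrics per kernel with a small script. The sketch below is only an illustration: the function name `mean_metrics_per_kernel`, the `raw.csv` file name, and the assumption that the export has a "Kernel Name" column plus one column per metric (with a non-numeric units row that can be skipped) are mine, so check them against your actual export.

```python
import csv
import io
import sys
from collections import defaultdict

# Metric names taken from the question above.
METRICS = [
    "sm__throughput.avg.pct_of_peak_sustained_elapsed",
    "dram__throughput.avg.pct_of_peak_sustained_elapsed",
]


def _to_float(text):
    """Return text as a float, or None if it is missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def mean_metrics_per_kernel(csv_text, metrics=METRICS, kernel_column="Kernel Name"):
    """Average each metric over all launches of each kernel.

    Assumes (hypothetically) one CSV row per kernel launch, a column named
    `kernel_column`, and one column per metric. Rows without a kernel name
    or with non-numeric values (for example a units row) are ignored.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(csv_text)):
        kernel = row.get(kernel_column)
        if not kernel:
            continue
        for metric in metrics:
            value = _to_float(row.get(metric))
            if value is None:
                continue
            sums[kernel][metric] += value
            counts[kernel][metric] += 1
    return {
        kernel: {m: sums[kernel][m] / counts[kernel][m] for m in sums[kernel]}
        for kernel in sums
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py raw.csv
    with open(sys.argv[1], newline="") as f:
        result = mean_metrics_per_kernel(f.read())
    for kernel, values in result.items():
        print(kernel)
        for metric, mean in values.items():
            print(f"  {metric}: {mean:.2f}")
```

If a metric does not appear in the output for any kernel, it was most likely not collected in that run, which matches what the Raw page would show.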
