nv-nsight-cu-cli profiles every kernel 47x, is very slow

Eli_Stevens · June 4, 2019, 11:32pm

I’m having trouble profiling with nv-nsight-cu-cli. Run on its own, the TensorFlow program below finishes in well under a minute (perhaps ten seconds). This is how I launch it under the profiler:

/usr/local/NVIDIA-Nsight-Compute/nv-nsight-cu-cli \
    -o mnist_softmax_deep_fp16_advanced.ns-cuprof-report \
    ~/edit/venv/bin/python mnist_softmax_deep_fp16_advanced.py

Under nv-nsight-cu-cli, it has been running for over an hour, and it’s unclear how far along it is. There is a lot of output of the form:

==PROF== Profiling "EigenMetaKernel" - 1120: 0%....50%....100% - 47 passes

This is a problem because the real program I need to profile normally takes 10 minutes to run.
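For a rough sense of scale, the numbers in the quoted log line can be multiplied out. This is only a back-of-the-envelope sketch: it assumes the number after the kernel name (1120) is a running launch index and that every earlier launch was also replayed 47 times, as the thread title suggests. The variable names are made up for illustration.

```shell
# Values taken from the log line quoted in the post.
KERNEL_ID=1120   # assumed to be a running index of profiled launches
PASSES=47        # replay passes reported for each profiled launch

# Each pass re-runs the kernel, so the replay count grows multiplicatively.
REPLAYS=$((KERNEL_ID * PASSES))
echo "roughly ${REPLAYS} replayed kernel executions so far"
```

This ignores the per-pass cost of saving and restoring state between replays, so the real slowdown is likely larger than the raw count suggests.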

What can I do to have it profile at something approaching real-time?

The code is mnist_softmax_deep_fp16_advanced.py from the khcs/fp16-demo-tf repository on GitHub (master branch).

hwilper · June 5, 2019, 3:30pm

What you are using there is the Nsight Compute CLI, which is intended for deep dives into individual kernels. If you are looking for high-level system profiling, you should be using Nsight Systems (and the nsys CLI).
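If a deep dive into a few kernels is still wanted, one way to keep Nsight Compute usable is to profile only a small window of launches. The sketch below assumes the installed version supports the --launch-skip and --launch-count options (check nv-nsight-cu-cli --help to confirm). The skip and count values and the report name are hypothetical, and run_if_present is an illustrative helper, not part of any NVIDIA tool.

```shell
# Helper: run a tool only when it is installed, otherwise just report it.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipping (not installed): $*"
    fi
}

NCU=/usr/local/NVIDIA-Nsight-Compute/nv-nsight-cu-cli  # path from the original post
SKIP=1000   # hypothetical: launches to skip before profiling starts
COUNT=10    # hypothetical: number of launches to profile

# Profile a short window of launches instead of every kernel in the run.
run_if_present "$NCU" \
    --launch-skip "$SKIP" \
    --launch-count "$COUNT" \
    -o mnist_limited.ns-cuprof-report \
    "$HOME/edit/venv/bin/python" mnist_softmax_deep_fp16_advanced.py
```

With only ten launches profiled, the 47 replay passes apply to ten kernels rather than to thousands, which is what makes the run time tolerable.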

You can download the latest version from the NVIDIA Nsight Systems page on the NVIDIA Developer site; a new version was posted just yesterday (I have not even gotten around to posting an announcement in the forum yet).

Let me know if you need help getting started with the Nsight Systems CLI.
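As a starting point, a minimal Nsight Systems invocation for the same script might look like the sketch below. It assumes nsys is on the PATH and reuses the Python interpreter path from the original post. The report name mnist_timeline is hypothetical, and run_if_present is an illustrative helper, not part of nsys.

```shell
# Helper: run a tool only when it is installed, otherwise just report it.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipping (not installed): $*"
    fi
}

# Collect a system-level timeline of the whole program in a single run.
run_if_present nsys profile -o mnist_timeline \
    "$HOME/edit/venv/bin/python" mnist_softmax_deep_fp16_advanced.py
```

Because nsys traces the run once rather than replaying each kernel, the overhead should be far closer to the program's normal run time. The resulting report can be opened in the Nsight Systems GUI.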
