I’m having issues profiling with nv-nsight-cu-cli. Run on its own, the TensorFlow program below took well under a minute (perhaps ten seconds). This is how I invoke it under the profiler:
/usr/local/NVIDIA-Nsight-Compute/nv-nsight-cu-cli \
-o mnist_softmax_deep_fp16_advanced.ns-cuprof-report \
~/edit/venv/bin/python mnist_softmax_deep_fp16_advanced.py
Under nv-nsight-cu-cli, it has been running for over an hour, and it’s unclear how far it has progressed. There is a lot of output of the form:
==PROF== Profiling "EigenMetaKernel" - 1120: 0%....50%....100% - 47 passes
This is problematic because the real program I need to profile normally takes 10 minutes to run.
What can I do to get profiling to run at something approaching the program’s normal speed?
The code is mnist_softmax_deep_fp16_advanced.py from the khcs/fp16-demo-tf repository on GitHub (master branch).