I’m having issues profiling with nv-nsight-cu-cli. When run by itself, without the profiler, the TensorFlow program in the following command took less than a minute (much less; ten seconds, perhaps):
/usr/local/NVIDIA-Nsight-Compute/nv-nsight-cu-cli \
    -o mnist_softmax_deep_fp16_advanced.ns-cuprof-report \
    ~/edit/venv/bin/python mnist_softmax_deep_fp16_advanced.py
Under nv-nsight-cu-cli, it has been running for over an hour, and it’s unclear how far it has progressed. There is a lot of output of the form:
==PROF== Profiling "EigenMetaKernel" - 1120: 0%....50%....100% - 47 passes
This is problematic because the real program I need to profile normally takes 10 minutes to run.
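To get at least a rough idea of progress, the ==PROF== lines can be tallied. The following is only a minimal sketch: the function name summarize is made up for illustration, the pattern is inferred from the single output line quoted above, and I am assuming the number after the kernel name is a running index of profiled kernel launches.

```python
import re

# Pattern inferred from one sample line (an assumption, not a documented format):
#   ==PROF== Profiling "EigenMetaKernel" - 1120: 0%....50%....100% - 47 passes
PROF_LINE = re.compile(
    r'==PROF== Profiling "(?P<kernel>[^"]+)" - (?P<index>\d+):.*?- (?P<passes>\d+) pass'
)


def summarize(lines):
    """Return (kernels_profiled, total_passes, last_index) for the given log lines.

    Lines that do not match the ==PROF== pattern are ignored.
    """
    kernels_profiled = 0
    total_passes = 0
    last_index = None
    for line in lines:
        match = PROF_LINE.search(line)
        if match is None:
            continue
        kernels_profiled += 1
        total_passes += int(match.group("passes"))
        # Assumed to be the running index of the profiled launch.
        last_index = int(match.group("index"))
    return kernels_profiled, total_passes, last_index
```

Passing the lines of a saved profiler log to summarize gives the number of kernels profiled so far and the total number of replay passes. With 47 passes per kernel and over a thousand kernels, the replay count alone would go a long way toward explaining the slowdown.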
What can I do to have it profile at something approaching real-time?