I am trying to collect some metrics from my CUDA program, but the profiling run never finishes (it runs for days without completing):
/usr/local/cuda-10.1/NsightCompute-2019.3/nv-nsight-cu-cli --csv --metrics sm__warps_active.avg.pct_of_peak_sustained_active MYPROGRAM
I think the problem could be that my application uses a large amount of device memory.
Is there an “application replay” mode like the one in nvprof, which re-runs the whole application instead of replaying each kernel? I think it could be the solution.
Thanks in advance.