Isn't there a replay-mode option?


I am trying to collect some metrics from my CUDA program, but the execution never ends (it runs for days without finishing):

/usr/local/cuda-10.1/NsightCompute-2019.3/nv-nsight-cu-cli --csv --metrics sm__warps_active.avg.pct_of_peak_sustained_active MYPROGRAM

I think the problem could be that my application uses a large amount of device memory.

Is there an “application replay” mode like in nvprof that re-runs the whole application instead of replaying each kernel? I think it could be the solution.

Thanks in advance.

Best regards.

Unfortunately, we don’t yet have application replay in Nsight Compute, but we are working to provide this or similar functionality in the future.

In the meantime, maybe we can solve the issue for you in another way:

  • Could you let us know the OS and GPU you are profiling on?
  • Do you need to profile all kernels in your application, or would it be sufficient to profile only certain kernels, and maybe only certain instances of these? You can check the docs to find options to limit data collection to e.g. a certain number of launches (-c) or specifically named kernels (-k).
  • Is your application using kernels that require concurrent execution, e.g. because they use P2P memory accesses? This is currently not supported by Nsight Compute, as all kernel executions are serialized.
  • Do you see the same long runtime if you try to profile other metrics, e.g. gpc__cycles_elapsed.sum, or device__attribute_display_name?
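
As a rough sketch of the suggestions above, the commands could be narrowed as follows. The install path, program name, and metric names come from this thread; the kernel name myKernel and the launch counts are hypothetical placeholders, and the helper only prints the command lines so they can be checked before being run.

```shell
#!/bin/sh
# Path and program name as given in the question; adjust for your install.
NCU=/usr/local/cuda-10.1/NsightCompute-2019.3/nv-nsight-cu-cli
APP=MYPROGRAM

# Print (do not run) a command that profiles at most COUNT launches (-c)
# of kernels matching NAME (-k), collecting a single METRIC.
build_profile_cmd() {
    count=$1
    name=$2
    metric=$3
    echo "$NCU --csv -c $count -k $name --metrics $metric $APP"
}

# "myKernel" is a hypothetical kernel name; replace it with one of yours.
# 1) Cheap sanity check: one launch, a simple metric.
build_profile_cmd 1 myKernel gpc__cycles_elapsed.sum
# 2) The metric from the question, limited to 10 launches.
build_profile_cmd 10 myKernel sm__warps_active.avg.pct_of_peak_sustained_active
```

If the first command finishes quickly and the second does not, that suggests the long runtime depends on the metric rather than on the application as a whole.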