Profiling a long-running application


I have a long-running kernel to profile. Collecting all the metrics with nvprof takes more than 5 minutes. Is there a way to speed up the profiling other than disabling replay?


Hi, smartvoice

I have forwarded the question to our developers.
Please wait for their response.

Best Regards


Hi, smartvoice

Here is the advice from our developers:

The profiler needs multiple passes to collect all the metrics, because only a limited number of hardware events can be collected in a single pass. This can significantly increase the application's execution time. There are several ways to reduce the execution time of a kernel or application under profiling:

  1. "Kernel replay" mode is useful for kernels that allocate only a small amount of device memory, because kernel state has to be saved and restored for each replay pass. Use the nvprof option "--replay-mode kernel".
  2. In "application replay" mode, nvprof re-runs the whole application instead of replaying each kernel in order to collect all events and metrics. In some cases this mode can be faster than "kernel replay" mode if the application allocates a large amount of device memory. Use the nvprof option "--replay-mode application".
  3. You can limit the number of metrics collected with the nvprof option "--metrics".
  4. Instead of profiling all kernels in the application, you can limit the profiling scope to the kernels of interest with the nvprof option "--kernels".
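
The options above can be combined in one nvprof invocation. Below is a minimal Python sketch that only assembles and prints such a command line; it does not run the profiler. The helper name build_nvprof_command, the application path ./my_app, and the kernel name myKernel are hypothetical placeholders. The two metric names are examples that may not be available on every GPU, so check them against "nvprof --query-metrics" on your system.

```python
import shlex

# Replay modes accepted by nvprof's --replay-mode option.
REPLAY_MODES = ("disabled", "kernel", "application")


def build_nvprof_command(app, metrics=None, kernels=None, replay_mode=None):
    """Assemble an nvprof argument list (hypothetical helper, nothing is executed)."""
    cmd = ["nvprof"]
    if replay_mode is not None:
        if replay_mode not in REPLAY_MODES:
            raise ValueError(f"unknown replay mode: {replay_mode!r}")
        cmd += ["--replay-mode", replay_mode]
    if kernels:
        # Placed before --metrics on the assumption that --kernels scopes
        # the --events/--metrics options that follow it.
        cmd += ["--kernels", kernels]
    if metrics:
        # Several metrics are passed as one comma-separated value.
        cmd += ["--metrics", ",".join(metrics)]
    return cmd + list(app)


cmd = build_nvprof_command(
    app=["./my_app"],                                  # placeholder application
    metrics=["achieved_occupancy", "gld_efficiency"],  # example metric names
    kernels="myKernel",                                # placeholder kernel name
    replay_mode="application",
)
print(shlex.join(cmd))
```

Collecting only the metrics you need, for only the kernels of interest, reduces the number of replay passes. Which replay mode is faster depends on how much device memory the application allocates, so it is worth timing both.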

Thanks for the suggestions!