nvprof can count how many times specific events happen, but I'm not sure how it does that.
More specifically, does nvprof keep collecting information for the whole time the application runs?
Or does it only sample while the application is running, so that the event counts reported by nvprof do not equal the number of times those events actually happen?
nvprof collects events for each kernel in isolation, i.e. it serializes the kernels in the application so that events can be attributed to a specific kernel. This helps the user understand and analyze the optimization opportunities for each kernel separately. If the specified events/metrics can't all be collected in a single run of the application, nvprof by default replays each kernel multiple times until all the events/metrics are collected. The --replay-mode option can be used to change the replay mode. In "application replay" mode, nvprof re-runs the whole application instead of replaying each kernel, in order to collect all events/metrics.
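To make the replay idea concrete, here is a toy Python model. It is not how nvprof is implemented, only a sketch of the principle. The counter limit `COUNTERS_PER_PASS`, the stand-in `run_kernel`, and the event labels are all assumptions made up for this illustration. Each pass runs the whole "kernel" again and counts a different group of events, so every requested event is counted over a full execution rather than over a sample of it.

```python
from collections import Counter

# Hypothetical limit on how many events can be counted in one pass.
# Real hardware limits and grouping rules are more involved.
COUNTERS_PER_PASS = 2


def run_kernel(work_items):
    """Stand-in for a deterministic kernel: returns every event it generates."""
    events = []
    for i in range(work_items):
        events.append("inst_executed")
        if i % 4 == 0:
            events.append("branch")
        if i % 8 == 0:
            events.append("global_load")
    return events


def profile_with_replay(requested, work_items):
    """Count all requested events by replaying the kernel once per event group."""
    counts = {}
    passes = 0
    for start in range(0, len(requested), COUNTERS_PER_PASS):
        group = requested[start:start + COUNTERS_PER_PASS]
        passes += 1
        # Replay the full kernel, but only "see" the events in this group.
        observed = Counter(e for e in run_kernel(work_items) if e in group)
        for name in group:
            counts[name] = observed[name]
    return counts, passes


if __name__ == "__main__":
    counts, passes = profile_with_replay(
        ["inst_executed", "branch", "global_load"], work_items=16
    )
    print(passes, counts)
```

With three requested events and two counters per pass, this toy needs two passes. The replay only works here because the stand-in kernel is deterministic, so each pass generates the same events.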
When collecting events/metrics, nvprof profiles all kernels launched on all visible CUDA devices by default. The profiling scope can be limited to a specific context, stream, kernel or kernel invocation. More details about profiling scope can be found at Profiler :: CUDA Toolkit Documentation.
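As a rough sketch of how the scope and replay options fit together on the command line, the Python helper below assembles an nvprof invocation without running it. The helper `nvprof_command`, the application `./my_app`, and the kernel name `my_kernel` are hypothetical. The options `--replay-mode`, `--devices`, `--kernels` and `--events` are nvprof options as I understand the profiler documentation. The scope options are placed before `--events` on the assumption that they apply to the event/metric options that follow them, so please check the exact syntax against the documentation for your toolkit version.

```python
import shlex


def nvprof_command(app, events, kernels=None, devices=None, replay_mode=None):
    """Build (but do not run) an nvprof command line as a list of arguments."""
    cmd = ["nvprof"]
    if replay_mode is not None:
        # e.g. "kernel" or "application"
        cmd += ["--replay-mode", replay_mode]
    if devices is not None:
        # Assumed to scope the event options that come after it.
        cmd += ["--devices", ",".join(str(d) for d in devices)]
    if kernels is not None:
        # A kernel name, or a more specific filter as described in the docs.
        cmd += ["--kernels", kernels]
    cmd += ["--events", ",".join(events)]
    cmd += list(app)
    return cmd


if __name__ == "__main__":
    cmd = nvprof_command(
        ["./my_app"],            # hypothetical application
        ["inst_executed"],
        kernels="my_kernel",     # hypothetical kernel name
        devices=[0],
        replay_mode="application",
    )
    print(shlex.join(cmd))
```

Building the argument list separately makes it easy to inspect the command before handing it to something like `subprocess.run` on a machine that has nvprof installed.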