[s]The application I need to profile begins with a large data load (several minutes).
Kernels are only run after the data load. Depending on the test case, there could be anywhere from a few dozen to a few thousand kernels run, but most test cases only last 15 or 20 seconds.
Having the profiler active for the entire data-load phase takes a huge amount of time and memory. It also produces a lot of unnecessary data which the profiler then has to waste time wading through.
What’s the best way to have the profiler active only for the second phase when kernels are run?
(Is this even possible?)
Never mind, found the nvprof option.
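For anyone landing here later: the post doesn't say which option was meant, so this is only a guess that it was `--profile-from-start off`, which keeps nvprof idle until the application calls `cudaProfilerStart()`. A minimal sketch of that approach is below. The `scale` kernel, the array size, and the vector standing in for the data load are all made up for illustration and are not from the original application.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>  // cudaProfilerStart / cudaProfilerStop

// Hypothetical kernel, a stand-in for the real kernels in the second phase.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;

    // Phase 1: stand-in for the long data load. With
    // --profile-from-start off, nvprof collects nothing here.
    std::vector<float> host(n, 1.0f);
    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Phase 2: profiling is enabled only around the kernel launches.
    cudaProfilerStart();
    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);
    cudaDeviceSynchronize();  // make sure the kernel finishes inside the profiled region
    cudaProfilerStop();

    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("host[0] = %f\n", host[0]);
    return 0;
}
```

Assuming the binary is called `app` (a placeholder name), it would be run as `nvprof --profile-from-start off ./app`. Without that flag, nvprof starts collecting at launch and the start/stop calls no longer exclude the data-load phase.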