Minimize tracing overhead / Resource Management

I am using the Visual Profiler to trace operations of fairly fine granularity: kernels that take tens of microseconds each. In the timeline I see large boxes labeled “Resource Management”, usually three consecutive boxes of about a millisecond each, and they get in the way. At the same time, a trace file containing only a handful of kernel calls is several megabytes in size, so the profiler is evidently collecting much more than the trace itself. Is there a way to make the profiler collect only “the trace”, i.e., kernel start and end times?

If you can’t find the right options in the Visual Profiler and you happen to be on Windows, Nsight Visual Studio Edition has more flexible profiling/tracing options. I think the Visual Profiler can also run some analysis tasks manually, but those require an initial timeline to have been collected first.

You can use nvprof to capture just the start times and durations of GPU activity. That data can then be imported into nvvp to display a timeline. Review the profiler documentation:

http://docs.nvidia.com/cuda/profiler-users-guide/index.html

or check nvprof's built-in help:

nvprof --help
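As a rough illustration, here is a minimal sketch of a trace-only nvprof invocation, built from Python so the argument list is easy to inspect. It assumes nvprof is on the PATH and uses two documented options: `--profile-api-trace none`, which turns off CUDA runtime/driver API tracing, and either `--print-gpu-trace` (one line per kernel/memcpy with start time and duration) or `--export-profile <file>` (a file nvvp can import). The function name `trace_only_nvprof_cmd` and the placeholders `./my_app` and `trace.nvvp` are hypothetical. This has not been confirmed to remove the “Resource Management” rows; it should at least drop the API trace from the capture.

```python
from typing import List, Optional, Sequence


def trace_only_nvprof_cmd(app_argv: Sequence[str],
                          export_file: Optional[str] = None) -> List[str]:
    """Build an nvprof command line that records GPU activity only,
    with CUDA runtime/driver API tracing switched off."""
    # Disable API tracing so only GPU-side activity is recorded.
    cmd = ["nvprof", "--profile-api-trace", "none"]
    if export_file is not None:
        # Save the profile to a file that nvvp can import as a timeline.
        cmd += ["--export-profile", export_file]
    else:
        # Print each kernel/memcpy in chronological order with start and duration.
        cmd.append("--print-gpu-trace")
    # The application and its arguments go last.
    cmd += list(app_argv)
    return cmd
```

For example, `shlex.join(trace_only_nvprof_cmd(["./my_app"], export_file="trace.nvvp"))` gives a command you can paste into a shell, or you can pass the list directly to `subprocess.run(..., check=True)`.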