How to profile only a portion of a TensorRT application with Nsight Systems?

I want to profile the inferences run by the TensorRT network I built.
I want to get a graph similar to the one shown in section “1.5. CUDA Profiling” of the following documentation:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html

It says:
“When profiling a TensorRT application, it is recommended to enable profiling only after the engine has been built.
During the build phase, all possible tactics are tried and timed. Profiling this portion of the execution will not
show any meaningful performance measurements and will include all possible kernels, not the ones actually selected
for inference. One way to limit the scope of profiling is to:
First phase: Structure the application to build and then serialize the engines in one phase.
Second phase: Load the serialized engines and run inference in a second phase.
Third phase: Profile this second phase only.”

So I need to enable profiling only during the infer() function. (I have not serialized my engine.)
I managed to use the Nsight Systems CLI in profile mode, but it captures a trace of the entire application.
How can I enable profiling at a precise point from within the TensorRT application?

Thanks,

Nsight Systems 2020.3.4
TensorRT 7.2

Hi @juliefraysse,

The CUDA profiler itself can be controlled programmatically. Specifically, you can wrap the kernel launch or function call of interest in a pair of cudaProfilerStart() and cudaProfilerStop() calls.
There’s probably a more minimal example somewhere else, but we use this approach in MLPerf-I, as seen here.
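
For reference, here is a minimal sketch of that pattern in C++. It is not taken from the MLPerf code. The names ProfilerRange, buildEngine(), infer() and run() are hypothetical placeholders for your own application, and the USE_CUDA_PROFILER macro is an assumption of this sketch: without it, stand-in functions replace the real CUDA calls so the snippet compiles on a machine without the CUDA toolkit.

```cpp
#include <cassert>

#ifdef USE_CUDA_PROFILER
#include <cuda_profiler_api.h>  // declares cudaProfilerStart() / cudaProfilerStop()
#else
// Stand-ins (assumption, for illustration only) so the sketch builds without CUDA.
static int g_profiler_starts = 0;
static int g_profiler_stops = 0;
static int cudaProfilerStart() { ++g_profiler_starts; return 0; }
static int cudaProfilerStop() { ++g_profiler_stops; return 0; }
#endif

// RAII guard: the capture range opens at construction and closes at scope exit,
// so it is also closed if the enclosed code returns early or throws.
struct ProfilerRange {
    ProfilerRange() { cudaProfilerStart(); }
    ~ProfilerRange() { cudaProfilerStop(); }
    ProfilerRange(const ProfilerRange&) = delete;
    ProfilerRange& operator=(const ProfilerRange&) = delete;
};

// Hypothetical placeholder for the engine build phase (tactic timing happens here).
void buildEngine() {}

// Hypothetical placeholder for the poster's infer() function. The real TensorRT
// execution call and a stream synchronization would go here.
static int g_infer_calls = 0;
void infer() { ++g_infer_calls; }

void run(int iterations) {
    assert(iterations > 0);
    buildEngine();            // outside the capture range, so not profiled
    ProfilerRange range;      // capture starts here
    for (int i = 0; i < iterations; ++i) {
        infer();              // only these calls fall inside the range
    }
}                             // capture stops when 'range' goes out of scope
```

To build against the real API, compile with -DUSE_CUDA_PROFILER and link against the CUDA runtime. As far as I know, Nsight Systems only honors these calls when the capture range is tied to the profiler API, for example with nsys profile --capture-range=cudaProfilerApi followed by your application command. Without that option the whole application is traced. It is worth checking the CLI options of your Nsight Systems version to confirm.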

Thank you.