I want to profile the inference passes run by the TensorRT network I built.
I would like to get a graph similar to the one shown in section “1.5. CUDA Profiling” of the following documentation:
It says:
“When profiling a TensorRT application, it is recommended to enable profiling only after the engine has been built.
During the build phase, all possible tactics are tried and timed. Profiling this portion of the execution will not
show any meaningful performance measurements and will include all possible kernels, not the ones actually selected
for inference. One way to limit the scope of profiling is to:
Structure the application to build and then serialize the engines in one phase.
Load the serialized engines and run inference in a second phase.
Profile this second phase only.”
So I need to enable profiling only around the infer() function. (I have not serialized my engine.)
I managed to use the Nsight Systems CLI in profile mode, but it captures a trace for the entire application.
How can I enable profiling at a precise moment from a TensorRT application?
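To follow the two-phase structure the documentation recommends, my understanding is that the engine can be serialized to disk after building, then reloaded in a separate inference phase. A minimal sketch of what I think this looks like with the TensorRT C++ API (the `builder`, `network`, `config`, and `logger` objects are assumed to already exist from the usual build code; error handling omitted):

```cpp
#include "NvInfer.h"
#include <fstream>
#include <iterator>
#include <vector>

// Phase 1: build once, then serialize the engine plan to disk.
void buildAndSerialize(nvinfer1::IBuilder* builder,
                       nvinfer1::INetworkDefinition* network,
                       nvinfer1::IBuilderConfig* config) {
    nvinfer1::ICudaEngine* engine =
        builder->buildEngineWithConfig(*network, *config);
    nvinfer1::IHostMemory* plan = engine->serialize();
    std::ofstream out("engine.plan", std::ios::binary);
    out.write(static_cast<const char*>(plan->data()), plan->size());
}

// Phase 2: load the serialized plan and deserialize it for inference.
// Only this phase would be profiled.
nvinfer1::ICudaEngine* loadEngine(nvinfer1::ILogger& logger) {
    std::ifstream in("engine.plan", std::ios::binary);
    std::vector<char> plan((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    return runtime->deserializeCudaEngine(plan.data(), plan.size());
}
```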
Nsight Systems 2020.3.4
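From what I have read, one approach might be the CUDA profiler API: wrap only the inference calls in `cudaProfilerStart()` / `cudaProfilerStop()` and launch nsys with a capture range. A sketch of what I have in mind (the `context`, `bindings`, and `stream` objects are assumed from my existing code; I have not confirmed this works):

```cpp
#include "NvInfer.h"
#include <cuda_profiler_api.h>
#include <cuda_runtime_api.h>

// Sketch: limit the Nsight Systems capture to the inference loop only.
void profiledInfer(nvinfer1::IExecutionContext* context,
                   void** bindings, cudaStream_t stream, int iterations) {
    cudaProfilerStart();  // capture begins here when using a capture range
    for (int i = 0; i < iterations; ++i) {
        context->enqueueV2(bindings, stream, nullptr);
    }
    cudaStreamSynchronize(stream);
    cudaProfilerStop();   // capture ends here
}
```

The application would then be launched with something like `nsys profile --capture-range=cudaProfilerApi --stop-on-range-end=true ./my_app` so that only the region between the two profiler calls is traced (flag names taken from the Nsight Systems CLI documentation; `./my_app` is a placeholder).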