Hi,
Best Practices For TensorRT Performance :: NVIDIA Deep Learning TensorRT Documentation
Decoding the kernel names back to layers in the original network can be complicated. Because of this, TensorRT uses NVTX to mark a range for each layer, which then allows the CUDA profilers to correlate each layer with the kernels called to implement it. In TensorRT, NVTX helps to correlate the runtime engine layer execution with CUDA kernel calls. Nsight Systems supports collecting and visualizing these events and ranges on the timeline. Nsight Compute also supports collecting and displaying the state of all active NVTX domains and ranges in a given thread when the application is suspended.
When profiling a TensorRT application, it is recommended to enable profiling only after the engine has been built. During the build phase, all possible tactics are tried and timed. Profiling this portion of the execution will not show any meaningful performance measurements and will include all possible kernels, not the ones actually selected for inference. One way to limit the scope of profiling is to:
- First phase: Structure the application to build and then serialize the engines in one phase.
- Second phase: Load the serialized engines and run inference in a second phase.
- Third phase: Profile this second phase only.
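Concretely, the recommended split can be sketched with `trtexec` (the model and engine filenames below are placeholders of my own, not from the quoted docs):

```shell
# Phase 1: build and serialize the engine (do NOT profile this step;
# it times all candidate tactics, not the kernels chosen for inference).
# model.onnx and engine.plan are placeholder names.
trtexec --onnx=model.onnx --saveEngine=engine.plan

# Phases 2 and 3: load the serialized engine, run inference, and
# profile only this part with Nsight Systems.
nsys profile -o trt_inference trtexec --loadEngine=engine.plan
```

This way the report contains only the kernels actually selected for inference.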
As recommended, I would like to filter on NVTX ranges during profiling collection.
I looked at the Nsight Systems CLI options:
Nsight Systems User Guide :: Nsight Systems Documentation (nvidia.com)
I noticed this option:
--nvtx-capture
Parameters
range@domain,range,range@
Description
Specify NVTX capture range. See below for details. This option is applicable only when used along with --capture-range=nvtx.
In order to do that, I would like to know which domain name and range name I should use in my Nsight Systems CLI command line (the ones TensorRT uses when creating its NVTX events).
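For reference, the command shape I am aiming for looks like this. Here `<range>` and `<domain>` are placeholders for exactly the names this question is asking about, and `trtexec --loadEngine=engine.plan` is just a stand-in inference workload:

```shell
# Begin collection only when the given NVTX range in the given domain starts.
# <range> and <domain> are placeholders for the names TensorRT uses.
nsys profile \
  --capture-range=nvtx \
  --nvtx-capture='<range>@<domain>' \
  trtexec --loadEngine=engine.plan
```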
Thx,