Description
Hi,
I am trying to profile and optimize a detection model using TensorRT 10 and Nsight Systems. I found that between two operators, the GPU sits idle for about 10 ms for no apparent reason. I have the following questions and hope someone can answer them:
- I use the data-dependent shape feature in my model, and I guess that the trainStation operator communicates shapes between plugins (especially between PluginV2 and PluginV3 plugins). Is that right?
- If that guess is correct, does it mean the idle gap above can be eliminated by migrating my PluginV2 plugins to PluginV3?
- If that guess is incorrect, what does trainStation mean, and how can I remove it?
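To measure idle gaps like this one outside the Nsight Systems GUI, a minimal sketch is possible. It assumes the report has been exported to SQLite (for example with `nsys export --type sqlite`) and that kernel timestamps sit in the `start`/`end` columns (nanoseconds) of the `CUPTI_ACTIVITY_KIND_KERNEL` table. The helper names `find_idle_gaps` and `kernel_intervals` and the 1 ms threshold are hypothetical. Only kernels are considered, so memory copies (stored in a separate table) and multi-GPU reports would need extra handling.

```python
import sqlite3
from typing import Iterable, List, Tuple


def find_idle_gaps(
    intervals: Iterable[Tuple[int, int]], min_gap_ns: int = 1_000_000
) -> List[Tuple[int, int, int]]:
    """Return (gap_start_ns, gap_end_ns, gap_len_ns) for every period of at
    least min_gap_ns during which no kernel is running.

    Overlapping kernels (e.g. on different streams) are merged by tracking
    the latest end time seen so far. min_gap_ns = 1 ms is an arbitrary default.
    """
    gaps: List[Tuple[int, int, int]] = []
    busy_end = None  # end of the merged busy period so far
    for start, end in sorted(intervals):
        if busy_end is not None and start - busy_end >= min_gap_ns:
            gaps.append((busy_end, start, start - busy_end))
        busy_end = end if busy_end is None else max(busy_end, end)
    return gaps


def kernel_intervals(conn: sqlite3.Connection) -> List[Tuple[int, int]]:
    """Read kernel (start, end) timestamps in ns from an nsys SQLite export.

    Assumption: the export contains CUPTI_ACTIVITY_KIND_KERNEL with "start"
    and "end" columns; "end" is quoted because it is an SQL keyword.
    Kernels from all devices are mixed together.
    """
    rows = conn.execute('SELECT start, "end" FROM CUPTI_ACTIVITY_KIND_KERNEL')
    return [(int(s), int(e)) for s, e in rows]
```

For example, `find_idle_gaps(kernel_intervals(sqlite3.connect("report.sqlite")))` lists every kernel-free window of 1 ms or more. The returned timestamps can then be matched against the TensorRT layer ranges shown around the 10 ms gap in the timeline.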
Thanks very much!
Environment
TensorRT Version: 10.4.0
GPU Type: RTX 3090
Nvidia Driver Version: 550.54.14
CUDA Version: 11.8
CUDNN Version: 8.9.6.50
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.3.1+cu118
Baremetal or Container (if container which image + tag):
Nsight Systems Version: 2023.4.4.54-234433681190v0 Linux
Relevant Files
nsys-report.tar.gz (13.0 MB)