GPU is idle between two operators of enqueueV3

Description

Hi,
I am trying to profile and optimize a detection model using TensorRT 10 and Nsight Systems. I found that between two operators the GPU sits idle for about 10 ms with no apparent reason. I have the following questions and hope someone can answer them:

  1. My model uses the data-dependent shape feature, and I guess the trainStation operator communicates shapes between plugins (especially between PluginV2 and PluginV3). Is that right?
  2. If that guess is correct, can we solve the problem by migrating the PluginV2 plugins to PluginV3?
  3. If that guess is incorrect, what does trainStation mean, and how can it be removed?
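
For anyone trying to quantify the idle period described above, here is a minimal sketch of how the gaps could be measured. It assumes the kernel start and end times have already been exported from the Nsight Systems report into (start_ms, end_ms) pairs; the function name `find_idle_gaps` and the sample timings are hypothetical and are not taken from the attached report.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_ms, end_ms) of one GPU kernel


def find_idle_gaps(kernels: List[Interval], min_gap_ms: float = 1.0) -> List[Interval]:
    """Return (gap_start, gap_end) periods during which no kernel is running."""
    gaps: List[Interval] = []
    busy_until = None  # latest end time seen so far
    for start, end in sorted(kernels):
        # A gap exists when the next kernel starts after everything before it finished.
        if busy_until is not None and start - busy_until >= min_gap_ms:
            gaps.append((busy_until, start))
        busy_until = end if busy_until is None else max(busy_until, end)
    return gaps


# Hypothetical timings (ms); overlapping kernels are handled by tracking busy_until.
example = [(0.0, 2.0), (2.1, 3.0), (13.0, 14.5), (1.0, 2.5)]
print(find_idle_gaps(example))
```

Sorting by start time and tracking the latest end time keeps overlapping kernels on different streams from being reported as false gaps.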

Thanks very much!

Environment

TensorRT Version: 10.4.0
GPU Type: RTX 3090
Nvidia Driver Version: 550.54.14
CUDA Version: 11.8
CUDNN Version: 8.9.6.50
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.3.1+cu118
Baremetal or Container (if container which image + tag):
Nsight Systems Version: 2023.4.4.54-234433681190v0 Linux

Relevant Files


nsys-report.tar.gz (13.0 MB)

Hi @shuo-ouyang,
I am gathering more information on this and will share updates with you.

Thanks