How do I speed up the pre-processing of YOLOv5 with TensorRT?

Description

Currently, I am using TensorRT to accelerate the forward inference of YOLOv5. The speed-up for inference is remarkable: it takes 13 ms before acceleration and 5 ms after acceleration with FP16.
But now I have a problem: the pre-processing takes far longer than the forward inference itself.
My concrete pre-processing implementation is as follows:
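(The original snippet is not reproduced in the post. For context, the standard YOLOv5 pre-processing pipeline consists of a resize to the network input size, a BGR-to-RGB channel swap, scaling pixel values to [0, 1], and an HWC-to-CHW transpose. The sketch below is an illustrative NumPy assumption of that recipe, not the poster's exact code; it uses a plain nearest-neighbor resize instead of OpenCV's letterbox for self-containedness.)

```python
import numpy as np

def preprocess(bgr_image, input_size=640):
    """Typical YOLOv5 pre-processing sketch (NumPy only; nearest-neighbor
    resize stands in for the usual letterbox resize, purely for illustration)."""
    h, w, _ = bgr_image.shape
    # Nearest-neighbor resize to the network input size.
    ys = np.arange(input_size) * h // input_size
    xs = np.arange(input_size) * w // input_size
    resized = bgr_image[ys[:, None], xs[None, :], :]
    rgb = resized[:, :, ::-1]               # BGR -> RGB
    norm = rgb.astype(np.float32) / 255.0   # scale to [0, 1]
    chw = np.transpose(norm, (2, 0, 1))     # HWC -> CHW
    return chw[None, ...]                   # add batch dim -> NCHW

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
blob = preprocess(img)
print(blob.shape)  # (1, 3, 640, 640)
```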

What I want to know is whether this pre-processing can also be accelerated with TensorRT?
Thanks!

Environment

TensorRT Version: 7.0.0.11
GPU Type: 2080ti
Nvidia Driver Version: 418
CUDA Version: 10.0
CUDNN Version: 7.6.5
Operating System + Version: CentOS 7
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): /
PyTorch Version (if applicable): /
Baremetal or Container (if container which image + tag): /

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi @wentao.robot
Looks like it could be done with a Scale or Elementwise layer plus a Shuffle layer.
https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Graph/Layers.html#ishufflelayer
https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Graph/Layers.html#ielementwiselayer
Thanks!
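To spell out the suggestion: the two operations that dominate YOLOv5 pre-processing map one-to-one onto these TensorRT layers. The 1/255 normalization is an IScaleLayer (which computes `y = (x * scale + shift) ** power`), and the HWC-to-CHW permutation is an IShuffleLayer with `first_transpose` set. The NumPy sketch below shows the equivalent computation, assuming the pre-processing really is "divide by 255, then transpose" (the usual YOLOv5 recipe); it is a model of what the layers compute, not TensorRT API code.

```python
import numpy as np

# Equivalent of IScaleLayer in UNIFORM mode with scale=1/255, shift=0, power=1:
#   y = (x * scale + shift) ** power
def scale_layer(x, scale=1.0 / 255.0, shift=0.0, power=1.0):
    return (x * scale + shift) ** power

# Equivalent of IShuffleLayer with first_transpose=(2, 0, 1): HWC -> CHW
def shuffle_layer(x, perm=(2, 0, 1)):
    return np.transpose(x, perm)

hwc = np.random.randint(0, 256, (640, 640, 3)).astype(np.float32)
out = shuffle_layer(scale_layer(hwc))
print(out.shape)  # (3, 640, 640)
```

Built into the network this way, the normalization and transpose run on the GPU as part of the engine, so the CPU-side pre-processing shrinks to a resize plus a raw memcpy.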

@wentao.robot hi wentao,
you could refer to this implementation: https://github.com/enazoe/yolo-tensorrt/blob/master/modules/yolo.cpp#L671