Description
Hi! I've recently been trying to use TensorRT to speed up my segmentation network, Topformer.
The measured speedups are as follows:
This latency seems acceptable.
However, when I set the batch size to 8 and run two processes at the same time, I find that the per-frame latency of forward inference fluctuates greatly.
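For reference, the two-process setup described above can be sketched roughly as below. This is only a minimal reproduction skeleton: `fake_forward` is a placeholder I introduce here for the actual TensorRT execute call (e.g. `context.execute_v2`), which is not shown in this post.

```python
import multiprocessing as mp
import statistics
import time

def run_inference_loop(n_iters, result_queue):
    """Time each iteration of a (placeholder) batch-8 forward pass."""

    def fake_forward():
        # Placeholder for the real TensorRT inference call;
        # swap in the actual engine execution here.
        time.sleep(0.001)

    latencies_ms = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        fake_forward()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    result_queue.put(latencies_ms)

if __name__ == "__main__":
    # Two processes hammering the same GPU concurrently, as in the post.
    q = mp.Queue()
    procs = [mp.Process(target=run_inference_loop, args=(20, q)) for _ in range(2)]
    for p in procs:
        p.start()
    results = [q.get() for _ in procs]
    for p in procs:
        p.join()
    for lat in results:
        print(f"min={min(lat):.2f} ms  max={max(lat):.2f} ms  "
              f"median={statistics.median(lat):.2f} ms")
```

With two processes sharing one GPU, the driver time-slices kernels between them, which by itself can explain part of the per-frame variance.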
At present, I am confused about two main points:
- Why does the per-frame latency of forward inference fluctuate so much? For example, the minimum time here is 17.83 ms and the maximum is 36.96 ms. How can I fix this? In my current task, the processing time for each frame needs to be very stable.
- Why is the speedup from FP16 so small compared to FP32? What could be the reason?
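To make the fluctuation in the first point easier to discuss, here is a minimal sketch of how I quantify the jitter: warm-up iterations are discarded (so clock ramp-up does not pollute the numbers), then min/median/max are reported over many runs. `fake_forward` is a placeholder for the actual TensorRT inference call, which is not shown here.

```python
import statistics
import time

def fake_forward():
    # Placeholder for the real TensorRT inference call
    # (e.g. context.execute_v2 on a batch-8 input).
    time.sleep(0.002)

def measure_latency(n_warmup=10, n_iters=100):
    # Warm-up runs let GPU clocks ramp up and caches fill;
    # they are excluded from the statistics below.
    for _ in range(n_warmup):
        fake_forward()
    latencies_ms = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        fake_forward()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    return {
        "min": min(latencies_ms),
        "median": statistics.median(latencies_ms),
        "max": max(latencies_ms),
    }

if __name__ == "__main__":
    print(measure_latency())
```

Even with warm-up, the min/max spread reported in the question (17.83 ms vs. 36.96 ms) persists, which is why the stability question above matters for my use case.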
Looking forward to your reply, thank you very much.
Environment
TensorRT Version: 8.2.1
GPU Type: 1080
Nvidia Driver Version:
CUDA Version: 11.4
CUDNN Version: 8.2
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.7.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.10.1+cu113
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered