How are CUDA resources allocated across two concurrent processes?

Description

Hi! I have recently been trying to use TensorRT to speed up my segmentation network, Topformer.
The measured acceleration is as follows:
[Image: infercost — inference-time measurements]
This latency seems acceptable.
But when I set the batch size to 8 and run two processes at the same time, the forward-inference latency fluctuates greatly.
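Roughly, the dual-process test looks like the following sketch. Note that load_engine(), make_batch(), and infer() here are hypothetical placeholders for my actual engine-deserialization and execution code, and the engine filename is illustrative; each child process creates its own CUDA context:

```python
import multiprocessing as mp

def worker(rank):
    # Hypothetical placeholders: load_engine() deserializes the serialized
    # TensorRT engine, make_batch() builds one batch-8 input, and infer()
    # runs a single forward pass on that engine.
    engine = load_engine("topformer_bs8.engine")
    batch = make_batch(8)
    for _ in range(100):
        infer(engine, batch)

if __name__ == "__main__":
    # "spawn" is required so each child process initializes CUDA on its own.
    mp.set_start_method("spawn")
    procs = [mp.Process(target=worker, args=(r,)) for r in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```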

At present, there are two main points of confusion for me:

  1. Why does the per-frame forward-inference latency fluctuate so much? For example, the minimum here is 17.83 ms and the maximum is 36.96 ms. How can I fix this? In my current task, the processing time for each frame needs to be very stable (see the timing sketch after this list).
  2. The speedup from FP16 over FP32 is small. What is the reason for this?
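For context, this is a minimal sketch of how the per-frame latency could be measured; the infer() callable is a hypothetical stand-in for my actual TensorRT execution-context wrapper. It uses CUDA events plus an explicit synchronize so the reported times are not distorted by asynchronous kernel launches:

```python
import torch

def time_inference(infer, batch, warmup=10, iters=100):
    """Measure per-call latency in milliseconds using CUDA events.

    `infer` is a hypothetical callable wrapping the TensorRT execution
    context; `batch` is the input tensor.
    """
    # Warm up so lazy initialization does not skew the first samples.
    for _ in range(warmup):
        infer(batch)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    times = []
    for _ in range(iters):
        start.record()
        infer(batch)
        end.record()
        torch.cuda.synchronize()  # wait for the GPU before reading the timer
        times.append(start.elapsed_time(end))  # milliseconds
    return min(times), sum(times) / len(times), max(times)
```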

Looking forward to your reply, thank you very much.

Environment

TensorRT Version: 8.2.1
GPU Type: GeForce GTX 1080
Nvidia Driver Version:
CUDA Version: 11.4
CUDNN Version: 8.2
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.7.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.10.1+cu113
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi,
We recommend checking the sample links below in case of TF-TRT integration issues.

If the issue persists, we recommend reaching out to the TensorFlow forum.
Thanks!

Hi! It’s not a problem with the TensorFlow framework; it’s a problem with TensorRT.

Hi,

We recommend trying the latest TensorRT version, 8.4 GA. If you still face this issue, please share the ONNX model and a script that reproduce it, so we can try them on our end.

Thank you.
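For reference, a minimal engine-build script of the kind requested might look like the sketch below. The file names and the FP16 toggle are illustrative; the API shown is the standard TensorRT 8.x Python OnnxParser flow:

```python
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path="topformer.onnx", fp16=False):
    """Build a serialized TensorRT engine from an ONNX file (TRT 8.x API)."""
    builder = trt.Builder(LOGGER)
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB scratch space
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)
    # Returns an IHostMemory plan that can be written straight to disk.
    return builder.build_serialized_network(network, config)

if __name__ == "__main__":
    plan = build_engine(fp16=True)
    with open("topformer.engine", "wb") as f:
        f.write(plan)
```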