Description
Dear forum developers,
I have two models: one for object detection and the other for segmentation.
Each model has its own pre-processing because their input shapes are different.
I serialized and quantized both models to INT8 following NVIDIA's sample code, and they ran successfully.
The inference time of the INT8 quantized models was much faster than the FP32/FP16 versions.
But a problem occurs when I try to run both models at the same time.
If the inference time of model A is 3 ms and model B is 4 ms when each runs alone, the times increase to 6 ms (model A) and 10 ms (model B) when they run simultaneously.
I don't know what is happening here.
The pre-processing uses OpenCV-CUDA and the post-processing uses OpenGL, both of which also use the GPU.
Could that be the problem?
I turned off the pre/post-processing and the inference time improved slightly, but it was still much slower than running a single model.
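For reference, my timing comparison follows the pattern sketched below. This is a simplified, CPU-only stand-in: the `busy_work` function is a hypothetical placeholder for one engine's `execute_async_v2` call on its own CUDA stream, and the numbers it produces only illustrate the measurement method, not the real GPU behavior.

```python
import threading
import time

def busy_work(n_iters):
    # Hypothetical stand-in for one engine's inference call;
    # in the real setup this would be context.execute_async_v2(...)
    # followed by a stream synchronize.
    acc = 0
    for i in range(n_iters):
        acc += i * i
    return acc

def timed(fn, *args):
    # Return the wall-clock time of a single call.
    t0 = time.perf_counter()
    fn(*args)
    return time.perf_counter() - t0

N = 2_000_000

# 1. "Single inference mode": run one workload alone and record its latency.
solo = timed(busy_work, N)

# 2. "Simultaneous mode": run two instances at the same time,
#    each one timing itself.
results = []
def worker():
    results.append(timed(busy_work, N))

t1 = threading.Thread(target=worker)
t2 = threading.Thread(target=worker)
t1.start(); t2.start()
t1.join(); t2.join()

print(f"solo: {solo * 1e3:.1f} ms, "
      f"concurrent (worst): {max(results) * 1e3:.1f} ms")
```

With this harness I compare the per-model latency reported in step 1 against step 2; the slowdown only appears in step 2, which is why I suspect the two engines (plus OpenCV-CUDA and OpenGL) are contending for the same GPU resources.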
Please give me some advice.
Thanks.
Environment
TensorRT Version: TensorRT 8
GPU Type: NVIDIA Jetson AGX Xavier
Nvidia Driver Version:
CUDA Version: 11.4
CUDNN Version: 8.4
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered