Hi, I’m building an SDK that uses multiple engines. When each model is tested alone, its inference time is close to the mean time I see with
trtexec --loadEngine=<model.engine> --iterations=100. But when run inside the SDK, all the models perform worse (sometimes by as much as 40%!).
In the SDK, I do the ‘init’ for all the models together (basically deserializing each engine and creating its execution context). After that, I call inference on whichever engine I need. I have 4 models loaded.
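For context, the init stage looks roughly like this (a minimal sketch, not my exact SDK code; ENGINE_PATHS and infer() are just placeholder names):

```python
import tensorrt as trt
import pycuda.driver as cuda  # buffer handling omitted below
import pycuda.autoinit        # creates the CUDA context

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Placeholder paths for the 4 engines
ENGINE_PATHS = ["model_a.engine", "model_b.engine", "model_c.engine", "model_d.engine"]

runtime = trt.Runtime(TRT_LOGGER)
engines, contexts = {}, {}

# 'init': deserialize every engine and create its execution context up front
for path in ENGINE_PATHS:
    with open(path, "rb") as f:
        engines[path] = runtime.deserialize_cuda_engine(f.read())
    contexts[path] = engines[path].create_execution_context()

def infer(path, bindings):
    # Called later for whichever model is needed; buffer allocation,
    # H2D/D2H copies and synchronization are omitted here for brevity.
    return contexts[path].execute_v2(bindings)
```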
Am I doing something wrong, or is this the expected behaviour? Is there a better way to do it?
TensorRT Version: 7.1.3
GPU Type: Jetson NX
CUDA Version: 10.2
Operating System + Version: Ubuntu 18.04 LTS
Python Version (if applicable): 3.6.9