Description
I have two networks that work back to back. The output of the first is passed to the second.
When I export each network to its own ONNX file and run both in TensorRT (all in the same code: both networks are loaded, converted to .trt, and run back to back), everything works fine, with an inference time of around 7 ms.
For simplicity, I decided to merge the two ONNX files into one. I combined the two networks into a single model (using the respective PyTorch code), transferred the weights, and exported a single ONNX file. I used the same TensorRT inference code, but the execution time rose from 7 ms to 11 ms.
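To make the 7 ms vs. 11 ms comparison concrete, here is a minimal sketch of how both setups could be timed on equal footing. It is not my actual inference code: `run_two_engines` and `run_merged_engine` are hypothetical placeholders for "run both engines back to back" and "run the merged engine", and the warm-up and iteration counts are arbitrary assumptions. Each callable is assumed to block until the GPU work has finished (for example by synchronizing the CUDA stream), otherwise the wall-clock numbers are not meaningful.

```python
import statistics
import time


def benchmark(run_once, warmup=20, iterations=200):
    """Return the median wall-clock latency of run_once() in milliseconds."""
    # Warm-up calls are not timed, so one-time costs (lazy initialisation,
    # first-use allocations) do not skew the result.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        # Assumption: run_once() returns only after the GPU work is done.
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(samples)


# Hypothetical placeholders: in real code these would wrap the TensorRT
# execution of the two separate engines and of the merged engine.
def run_two_engines():
    sum(range(1000))


def run_merged_engine():
    sum(range(1000))


two_ms = benchmark(run_two_engines)
merged_ms = benchmark(run_merged_engine)
print(f"two engines: {two_ms:.3f} ms, merged engine: {merged_ms:.3f} ms")
```

Timing both variants with the same helper, the same inputs, and the same warm-up should at least rule out measurement differences as the source of the gap.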
I checked my code and everything is as it should be. What could be going wrong?
Environment
TensorRT Version: 8.6.1.6
GPU Type: RTX 4080
Nvidia Driver Version: nvidia-driver-535
CUDA Version: 12.1
Operating System + Version: Ubuntu 22.04
PyTorch Version (if applicable): 2.2.1