Merged network is slower than two separate networks

elight1 · April 2, 2024, 9:30am

Description

I have two networks that work back to back. The output of the first is passed to the second.

When I create a separate ONNX file for each of these networks and use them in TensorRT for inference (all in the same code: both networks are loaded, converted to .trt, and run back to back), everything works fine, with an inference time of around 7 ms.

For simplicity, I decided to merge the two ONNX files into one. I merged them into a single ONNX (using the respective PyTorch code) and transferred the weights into it. I used the same TensorRT inference code, but the execution time rose from 7 ms to 11 ms.

I checked my code and everything is as it should be. What could be going wrong?

Environment

TensorRT Version: 8.6.1.6
GPU Type: RTX 4080
Nvidia Driver Version: nvidia-driver-535
CUDA Version: 12.1
Operating System + Version: Ubuntu 22.04
PyTorch Version (if applicable): 2.2.1

AakankshaS · June 30, 2024, 12:17pm

Hi @elight1 ,
A slight difference (+/-) can be expected. However, could you please share a reproducible model and script for us to debug?
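
For a repro script, the 7 ms vs 11 ms comparison is easier to trust if both setups are timed the same way. Below is a minimal sketch of such a timing helper in plain Python. The name `benchmark` and the parameters `run_once`, `warmup`, and `iters` are hypothetical, not part of TensorRT. It assumes `run_once` performs one complete inference and blocks until the GPU has finished (for example by synchronizing the CUDA stream), otherwise only the launch overhead is measured.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], None],
              warmup: int = 20,
              iters: int = 200) -> Dict[str, float]:
    """Time run_once() and return latency statistics in milliseconds.

    run_once is a hypothetical callable: it is assumed to run one full
    inference and to return only after the GPU work has completed.
    """
    # Warm-up runs are executed but not timed, so one-time costs
    # (lazy initialization, clock ramp-up) do not skew the samples.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1e3)

    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Placeholder workloads; replace them with the real inference calls
    # for the two-engine pipeline and for the merged engine.
    def two_engines() -> None:
        time.sleep(0.001)

    def merged_engine() -> None:
        time.sleep(0.001)

    print("two engines  :", benchmark(two_engines, warmup=5, iters=50))
    print("merged engine:", benchmark(merged_engine, warmup=5, iters=50))
```

Reporting the median together with min and max makes it easier to tell a consistent slowdown from run-to-run noise.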