Description
I built several models to compare run times.
The models differ only in whether they take one input or two.
In the first two groups below, the dual_input.onnx model takes roughly twice the GPU compute time of the single_input.onnx model.
./trtexec --onnx=/home/nvidia/input.onnx --explicitBatch --verbose --workspace=2048
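Concretely, I ran each of the six models with the same flags; only the ONNX path changes (the /home/nvidia paths are just where I placed the files, adjust to yours):

```shell
# Same trtexec flags for every model; only the --onnx path differs.
./trtexec --onnx=/home/nvidia/single_input6400.onnx --explicitBatch --verbose --workspace=2048
./trtexec --onnx=/home/nvidia/dual_input6400.onnx --explicitBatch --verbose --workspace=2048
# ...and likewise for the *640 and *64 model pairs.
```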
Input size is (3, 192, 256) in order (CHW).
Output size is (6400, 48, 64).
single_input6400.onnx, GPU compute mean: 9.07 ms
dual_input6400.onnx, GPU compute mean: 18.20 ms
Input size is (3, 192, 256) in order (CHW).
Output size is (640, 48, 64).
single_input640.onnx, GPU compute mean: 0.88 ms
dual_input640.onnx, GPU compute mean: 1.75 ms
The third group shows the same roughly twofold gap (0.22 ms vs 0.11 ms).
Input size is (3, 192, 256) in order (CHW).
Output size is (64, 48, 64).
single_input64.onnx, GPU compute mean: 0.11 ms
dual_input64.onnx, GPU compute mean: 0.22 ms
According to TensorRT's horizontal layer fusion, the two-input model should get its first convolution layers fused horizontally.
However, the GPU compute times above show no improvement from this.
Does horizontal fusion of convolution layers actually reduce run time?
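As I understand it, horizontal fusion merges sibling convolutions that share an input into one wider convolution, so the fused model should do less kernel-launch work. A toy NumPy sketch of the arithmetic, with 1x1 convolutions written as GEMMs (all shapes here are toy values, not taken from the attached models):

```python
import numpy as np

# Toy illustration of horizontal fusion (not TensorRT's actual implementation):
# two 1x1 convolutions reading the SAME tensor can be replaced by a single
# convolution whose filter banks are concatenated along the output-channel
# axis. A 1x1 conv over a (C, H*W) tensor is just a matrix multiply, so the
# equivalence is easy to check.

rng = np.random.default_rng(0)
c_in, h, w = 3, 8, 8
x = rng.standard_normal((c_in, h * w))   # shared input, flattened spatially

w1 = rng.standard_normal((4, c_in))      # branch 1 filters (4 output channels)
w2 = rng.standard_normal((4, c_in))      # branch 2 filters (4 output channels)

# Unfused: two separate GEMMs over the same input.
y1 = w1 @ x
y2 = w2 @ x

# Horizontally fused: one wider GEMM, outputs split afterwards.
y_fused = np.concatenate([w1, w2], axis=0) @ x
y1_f, y2_f = y_fused[:4], y_fused[4:]

assert np.allclose(y1, y1_f) and np.allclose(y2, y2_f)
```

The fused and unfused results are numerically identical; my question is whether TensorRT's version of this actually shows up as lower GPU compute time.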
All models are attached in the Relevant Files section.
Thank you
Environment
TensorRT Version : 7.1.3
GPU Type : Xavier
Nvidia Driver Version : Package:nvidia-jetpack, Version: 4.4.1-b50
CUDA Version : 10.2.89
CUDNN Version : 8.0.0
Operating System + Version : Ubuntu 18.04
Python Version (if applicable) :
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) :
Relevant Files
test.rar (2.4 MB)
Steps To Reproduce
Extract test.rar and run the trtexec command above on each of the six ONNX models, then compare the reported GPU compute mean times between the single- and dual-input models in each group.