Description
Hi, I am trying to run the MaxVit model from torchvision with ONNX Runtime and the TensorRT execution provider (I replaced the incompatible einsum operations). It works well with batch size == 1, but with batch size == 32 the outputs are vastly different (max abs error > 3 compared to CPU inference). Is there anything I can do to lower these errors?
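For context, the einsum replacement was of this general form. This is a minimal sketch only; the exact einsum patterns in torchvision's MaxVit implementation may differ, and the shapes below are made up:

import torch

# Hypothetical illustration of the kind of rewrite applied before export;
# the actual einsum patterns and tensor shapes in MaxVit may differ.
q = torch.randn(2, 4, 49, 32)  # (batch, heads, tokens, dim) - assumed shapes
k = torch.randn(2, 4, 49, 32)

attn_einsum = torch.einsum('bhid,bhjd->bhij', q, k)  # original einsum form
attn_matmul = q @ k.transpose(-2, -1)                # einsum-free equivalent

assert torch.allclose(attn_einsum, attn_matmul, atol=1e-5)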
Test code (export code in attachment):
import numpy as np
import onnxruntime

# TensorRT EP with a dynamic-batch optimization profile (min=1, opt=max=32)
trt_provider = [
    ('TensorrtExecutionProvider', {
        'trt_profile_min_shapes': 'input:1x3x224x224',
        'trt_profile_opt_shapes': 'input:32x3x224x224',
        'trt_profile_max_shapes': 'input:32x3x224x224',
        'trt_engine_cache_enable': True,
        'trt_engine_cache_path': '.',
    })
]
cpu_inference = onnxruntime.InferenceSession('maxvit.onnx', providers=['CPUExecutionProvider'])
trt_inference = onnxruntime.InferenceSession('maxvit.onnx', providers=trt_provider)

batch = np.random.randn(32, 3, 224, 224).astype(np.float32)
first_item = batch[0][None]  # first batch item, kept as a batch of 1

batch_result_trt = trt_inference.run(None, {'input': batch})[0]
batch_result_cpu = cpu_inference.run(None, {'input': batch})[0]
first_item_result_trt = trt_inference.run(None, {'input': first_item})[0]
first_item_result_cpu = cpu_inference.run(None, {'input': first_item})[0]

# TRT vs CPU, both run with batch size 1
print(f'Max abs error (Batch size = 1): {np.max(np.abs(first_item_result_trt - first_item_result_cpu))}')
# first item of the batch-32 TRT run vs the batch-1 CPU reference
print(f'Max abs error (TRT Batch size = 32): {np.max(np.abs(batch_result_trt[0][None] - first_item_result_cpu))}')
# whole batch-32 TRT output broadcast against the batch-1 CPU reference
print(f'Max abs error (Batch size = 32): {np.max(np.abs(batch_result_trt - first_item_result_cpu))}')
Output:

Max abs error (Batch size = 1): 2.86102294921875e-06
Max abs error (TRT Batch size = 32): 3.64798903465271
Max abs error (Batch size = 32): 3.64798903465271
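If it helps with the diagnosis, a per-item breakdown can show whether the divergence is uniform across the batch or concentrated in a few items. A quick sketch using the arrays from the script above, assuming the model output is 2-D (N, num_classes):

# Per-item max abs error between the batch-32 TRT run and the batch-32 CPU run
per_item_err = np.abs(batch_result_trt - batch_result_cpu).max(axis=1)
for i, err in enumerate(per_item_err):
    print(f'item {i:2d}: max abs error = {err}')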
Environment
Ubuntu 20.04.6 LTS
Tesla T4 (Azure, NC4as_T4_v3)
Driver: 535.216.03
CUDA: 12.6
cuDNN: 9.6.0.74
Python: 3.12
TensorRT: 10.7.0.23
Relevant Files
export_and_test_script.zip (1.8 KB)
Steps To Reproduce
- run 1_export.py from the uploaded scripts - exports the 'maxvit.onnx' file (with torch==2.5.1+cu124)
- run 2_test.py from the uploaded scripts - runs the exported model and compares outputs (with onnxruntime-gpu==1.20.1)
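As a sanity check I can also try pinning the optimization profile to a single static shape, so that TensorRT builds the engine for batch size 32 only and dynamic-shape tactic selection is ruled out. A minimal sketch, using the same provider options as above with only the min shape changed:

import onnxruntime

# Hypothetical sanity check (not part of the uploaded scripts):
# static batch-32 profile, min == opt == max
static_provider = [
    ('TensorrtExecutionProvider', {
        'trt_profile_min_shapes': 'input:32x3x224x224',
        'trt_profile_opt_shapes': 'input:32x3x224x224',
        'trt_profile_max_shapes': 'input:32x3x224x224',
        'trt_engine_cache_enable': True,
        'trt_engine_cache_path': '.',
    })
]
trt_static = onnxruntime.InferenceSession('maxvit.onnx', providers=static_provider)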