TensorRT engine outputs differ significantly depending on GPU compute capability

Description

I have multiple ONNX models, and I have converted them into TensorRT engines on different GPUs.

I used GPUs with CUDA compute capability (CC) 6.1, 7.5, and 8.6, and TensorRT versions 7.2.1 and 8.2.1.

So I have six engines per ONNX model (see the build sketch after the list below):

  1. CC8.6 x TRT 7.2.1
  2. CC7.5 x TRT 7.2.1
  3. CC6.1 x TRT 7.2.1
  4. CC8.6 x TRT 8.2.1
  5. CC7.5 x TRT 8.2.1
  6. CC6.1 x TRT 8.2.1
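For reference, this is roughly how each engine is built (a minimal sketch using the TensorRT Python API; the workspace size and file paths are placeholders, not the exact values I used):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path):
    # Explicit-batch network, as required by the ONNX parser
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse ONNX model")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB, placeholder value

    # Default FP32 build; no FP16/INT8 flags are set
    return builder.build_engine(network, config)

engine = build_engine("model.onnx")  # placeholder path
```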

However, every engine built on the CC 8.6 GPU produces noticeably different results from the others.

The mean absolute error (MAE) of the engines on the CC 8.6 GPU is close to 1e-2, while the MAE of the other engines is close to 1e-5.

This happens regardless of which TensorRT version or model is used.
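For clarity, the MAE numbers above are computed roughly like this (a minimal sketch; the file names are hypothetical, and the reference output comes from running the same input through the original ONNX model):

```python
import numpy as np

# Hypothetical file names: outputs saved as .npy arrays for the same input
reference = np.load("onnx_reference_output.npy")    # e.g. ONNX Runtime output
engine_output = np.load("cc86_trt821_output.npy")   # one TensorRT engine's output

# Mean absolute error between the engine output and the reference
mae = np.mean(np.abs(reference.astype(np.float64) - engine_output.astype(np.float64)))
print(f"MAE: {mae:.2e}")
```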

Environment

TensorRT Version: 7.2.1, 8.2.1
GPU Type:
Nvidia Driver Version:
CUDA Version: 11.1
cuDNN Version: 8.0 for TensorRT 7.2.1, 8.2 for TensorRT 8.2.1
Operating System + Version: Ubuntu 18.04 (Bionic)
Python Version (if applicable): 3.7

Hi,

Could you please try the latest TensorRT version (8.5.1) and let us know if you still face this issue? Also, please share the GPU and driver details with us.

Thank you.