TensorRT model accuracy problem

I have an ONNX model that runs correctly with the ONNX Runtime CPU execution provider but has severe accuracy problems when run with TensorRT. I used Polygraphy to identify the minimal failing subgraph. I have uploaded the ONNX model, zipped, as subgraph.zip for your reference, and attached the accuracy-comparison logs in bug.txt.

Environment

TensorRT Version: 8.6.3
GPU Type: NVIDIA GeForce RTX 2060
Nvidia Driver Version: 525.147.05
CUDA Version: V12.3.107
Container: nvcr.io/nvidia/tensorrt:24.02-py3

The command I used is:
polygraphy run subgraph.onnx --trt --onnxrt --trt-outputs mark all --onnx-outputs mark all --fail-fast --atol 1e-8 --tf32
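For context on the comparison settings: the command above checks every marked output against an absolute tolerance of 1e-8 (--atol). Below is a minimal, simplified sketch of an absolute-tolerance check of that kind. It is not Polygraphy's implementation (Polygraphy's comparator has further options, such as --rtol), and the output values are made up purely for illustration.

```python
def max_abs_diff(reference, candidate):
    """Largest elementwise |reference - candidate| over two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def within_atol(reference, candidate, atol):
    """True if every element differs by at most atol (absolute tolerance only)."""
    return max_abs_diff(reference, candidate) <= atol


# Hypothetical outputs: reference values (e.g. ONNX Runtime) vs. values under test (e.g. TensorRT).
ort_out = [0.5, 1.25, -3.0]
trt_out = [0.5, 1.2500001, -3.0]

print(max_abs_diff(ort_out, trt_out))           # roughly 1e-7
print(within_atol(ort_out, trt_out, atol=1e-8))  # False: 1e-7 exceeds 1e-8
print(within_atol(ort_out, trt_out, atol=1e-5))  # True
```

With a purely absolute check like this, a difference of about 1e-7 in a single element is enough to fail at atol 1e-8.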
bug.txt (6.5 KB)
subgraph.zip (3.8 KB)