Model running on DLA with TensorRT 8.4.0 is slower than with TensorRT 8.3.0

Hi,

I installed JetPack 5.0.1 on Orin, which comes with TensorRT 8.4.0. I tested a model on DLA with GPU fallback and found that the FP16 latency is significantly higher than with TensorRT 8.3.0. The difference is very large, about two to three times.

Test model (ONNX file):
model.onnx (10.0 MB)

FP16 latency comparison (TRT 8.4.0 vs TRT 8.3.0):
31 ms vs 13 ms
INT8, however, takes 3 ms under TRT 8.4.0, which seems reasonable.
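
For reference, the ratios implied by these numbers can be worked out as in the small sketch below. It is only arithmetic on the three latencies quoted above; the variable names are illustrative and nothing here is measured by the script itself.

```python
# Latencies reported in this post, in milliseconds.
fp16_trt84_ms = 31.0  # FP16 on DLA, TensorRT 8.4.0
fp16_trt83_ms = 13.0  # FP16 on DLA, TensorRT 8.3.0
int8_trt84_ms = 3.0   # INT8 on DLA, TensorRT 8.4.0

# How much slower FP16 became after the upgrade.
fp16_regression = fp16_trt84_ms / fp16_trt83_ms
# How much faster INT8 is than FP16 on TensorRT 8.4.0.
int8_vs_fp16 = fp16_trt84_ms / int8_trt84_ms

print(f"FP16 slowdown, 8.4.0 vs 8.3.0: {fp16_regression:.2f}x")
print(f"INT8 speedup over FP16 on 8.4.0: {int8_vs_fp16:.2f}x")
```

So the FP16 slowdown is roughly 2.4x, and INT8 is roughly 10x faster than FP16 on TRT 8.4.0.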

Why does the version upgrade cause such a large difference in latency? Or did the upgrade introduce a bug?
What is the theoretical TOPS ratio between INT8 and FP16 on DLA? Is the difference between the FP16 and INT8 results reasonable?

System details:
Machine: Jetson AGX Orin 64GB
SDK: JetPack 5.0.1 Developer Preview
Power mode: MAXN

cmd: trtexec --onnx=model.onnx --useDLACore=0 --allowGPUFallback --fp16

Dear @Constantineman,
Could you share the complete trtexec logs for both TRT 8.3 and TRT 8.4?
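
One possible way to capture such logs is sketched below. It assumes `trtexec` is on the PATH and reuses the command from the original post with `--verbose` added for more detailed output. The helper names and the log file name are hypothetical, and the script would need to be run once on each TensorRT installation.

```python
import subprocess


def build_trtexec_cmd(onnx_path, dla_core=0, precision="fp16", verbose=True):
    """Build the trtexec argument list from this thread, optionally with --verbose."""
    if precision not in ("fp16", "int8"):
        raise ValueError("precision must be 'fp16' or 'int8'")
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--useDLACore={dla_core}",
        "--allowGPUFallback",
        f"--{precision}",
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_and_save_log(cmd, log_path):
    """Run cmd and write its combined stdout and stderr to log_path."""
    with open(log_path, "w") as log_file:
        result = subprocess.run(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    return result.returncode


# Example call (hypothetical log name), to be run on the Jetson itself:
# run_and_save_log(build_trtexec_cmd("model.onnx"), "trtexec_fp16.log")
```

Keeping the command construction separate from the execution makes it easy to check that both runs used identical arguments before comparing the logs.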

@Constantineman

How did you install TensorRT 8.3.0 on AGX Orin? Is there a publicly available release of 8.3.0 for AGX Orin? Can you specify the JetPack version or the installation method you used?