Model running on DLA with TensorRT 8.4.0 is slower than with TensorRT 8.3.0


I installed JetPack 5.0.1 on an Orin; the TensorRT version is 8.4.0. I tested a model on DLA with GPU fallback and found that the FP16 latency is significantly higher than with TensorRT 8.3.0. The difference is large, about two to three times.

Test model (ONNX file):
model.onnx (10.0 MB)

FP16 latency comparison (TensorRT 8.4.0 vs 8.3.0):
31 ms vs 13 ms
INT8, however, takes 3 ms under TensorRT 8.4.0, which seems reasonable.
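
To make the size of the gap explicit, here is a small sketch that computes the ratios from the three latencies quoted above. The variable and function names are illustrative only; the numbers are the ones reported in this post, not new measurements.

```python
# Latencies reported in this thread, in milliseconds.
fp16_trt_840 = 31.0  # FP16 on DLA, TensorRT 8.4.0
fp16_trt_830 = 13.0  # FP16 on DLA, TensorRT 8.3.0
int8_trt_840 = 3.0   # INT8 on DLA, TensorRT 8.4.0


def slowdown(slow_ms: float, fast_ms: float) -> float:
    """Return how many times slower slow_ms is compared with fast_ms."""
    return slow_ms / fast_ms


# FP16 regression between the two TensorRT versions.
version_ratio = slowdown(fp16_trt_840, fp16_trt_830)
# FP16 vs INT8 gap within TensorRT 8.4.0.
precision_ratio = slowdown(fp16_trt_840, int8_trt_840)

print(f"FP16, 8.4.0 vs 8.3.0: {version_ratio:.2f}x slower")   # 2.38x
print(f"8.4.0, FP16 vs INT8: {precision_ratio:.2f}x slower")  # 10.33x
```

So the FP16 regression is roughly 2.4x, while FP16 is about 10x slower than INT8 under 8.4.0.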

I want to know why the version upgrade causes such a big difference in latency. Or did the upgrade introduce a bug?
What is the theoretical ratio of TOPS between INT8 and FP16 on DLA? Is the difference between the FP16 and INT8 results reasonable?

System details:
Machine: Jetson AGX Orin 64GB
SDK: JetPack 5.0.1 Developer Preview
Power mode: MAXN

cmd: trtexec --onnx=model.onnx --useDLACore=0 --allowGPUFallback --fp16

Dear @Constantineman,
Could you share the complete trtexec logs for both TRT 8.3 and TRT 8.4?
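
One possible way to collect such logs is sketched below. It rebuilds the command from the original post, adds trtexec's `--verbose` flag, and writes stdout and stderr to a file. The helper names and the log file name are hypothetical, and the script assumes `trtexec` is on `PATH` (on JetPack installs it is often found under `/usr/src/tensorrt/bin` instead); if it is not found, the run is skipped.

```python
import shutil
import subprocess
from typing import List, Optional


def build_trtexec_cmd(onnx_path: str, precision: str, dla_core: int = 0) -> List[str]:
    """Build the trtexec command from the original post, plus --verbose."""
    if precision not in ("fp16", "int8"):
        raise ValueError(f"unsupported precision: {precision}")
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--useDLACore={dla_core}",
        "--allowGPUFallback",
        f"--{precision}",
        "--verbose",  # full builder and runtime logging
    ]


def capture_log(cmd: List[str], log_path: str) -> Optional[int]:
    """Run cmd, writing stdout and stderr to log_path. Returns the exit code,
    or None if the executable is not available on PATH."""
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found on PATH; skipping run")
        return None
    with open(log_path, "w") as log_file:
        result = subprocess.run(
            cmd, stdout=log_file, stderr=subprocess.STDOUT, check=False
        )
    return result.returncode


cmd = build_trtexec_cmd("model.onnx", "fp16")
print(" ".join(cmd))
# Hypothetical log name; run once per TensorRT version and rename accordingly.
capture_log(cmd, "trtexec_fp16.log")
```

Running this once on each TensorRT installation, with a different log file name per version, would give two logs that can be attached to the thread and compared.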


How did you install TensorRT 8.3.0 on AGX Orin? Is there a publicly available release of 8.3.0 for AGX Orin? Can you specify the JetPack version or the installation method you used?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.

Dear @Constantineman,
Could you provide any update on the ask?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.