I installed JetPack 5.0.1 on an Orin; the TensorRT version is 8.4.0. I tested a model on DLA with GPU fallback and found that the FP16 latency is significantly higher than with TensorRT 8.3.0. The difference is very large, about two to three times.
The ONNX file of the test model:
model.onnx (10.0 MB)
FP16 latency comparison (TensorRT 8.4.0 vs. 8.3.0):
31 ms vs. 13 ms
INT8, however, takes 3 ms under TensorRT 8.4.0, which seems reasonable.
Why does the version upgrade cause such a large difference in latency? Or did the upgrade introduce a bug?
What is the theoretical ratio of INT8 to FP16 TOPS on DLA? Is the difference between my FP16 and INT8 results reasonable?
Machine: Jetson AGX Orin 64GB
SDK: JetPack 5.0.1 Developer Preview
Power mode: MAXN
Command: trtexec --onnx=model.onnx --useDLACore=0 --allowGPUFallback --fp16
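To help narrow this down, the same model could also be run with a few more trtexec variants. The following is only a sketch of what I could run, not a confirmed diagnosis: `print_cmds` is a hypothetical helper name, and it only prints the commands so they can be reviewed first. The verbose build log should show which layers are placed on DLA and which fall back to GPU, and the profile run gives per-layer timings that can be compared between the two TensorRT versions.

```shell
# Sketch: print the trtexec variants to compare on both TensorRT versions.
# print_cmds is a hypothetical helper; it prints the commands, it does not run them.
MODEL=model.onnx
COMMON="--onnx=$MODEL --useDLACore=0 --allowGPUFallback"

print_cmds() {
  # FP16 with a verbose build log (layer placement: DLA vs. GPU fallback)
  echo "trtexec $COMMON --fp16 --verbose"
  # FP16 with per-layer timings, collected in a separate profiling run
  echo "trtexec $COMMON --fp16 --dumpProfile --separateProfileRun"
  # INT8 for reference (no calibration cache given, so latency only, not accuracy)
  echo "trtexec $COMMON --int8"
}

print_cmds
```

On the device, `print_cmds | sh` would run the three commands in order.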