INT8 throughput and latency worse than FP16 for MiDaS DPT-Hybrid model on Thor

ramya.raghavendra · December 17, 2025, 1:07am


• Hardware Platform (Jetson / GPU): Jetson AGX Thor
• JetPack Version (valid for Jetson only): JP7.0
• TensorRT Version: 10.13

I am trying to estimate how the MiDaS v3 DPT-Hybrid model would perform on Jetson AGX Thor. Profiling it with trtexec, I see the following:

MiDaS v3 FP16: 173 FPS, mean latency: 5.8 ms
MiDaS v3 INT8: 97 FPS, mean latency: 10.3 ms
MiDaS v3 Best: 124.62 FPS, mean latency: 8.09 ms (engine built with the --best flag)
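To put these three runs side by side, a few lines of Python are enough. This is only arithmetic on the figures posted above, not a TensorRT API call. It shows INT8 at roughly 0.56x the FP16 throughput (about 1.8x the latency) and --best at roughly 0.72x. It also shows that 1000 / mean latency lands close to each reported FPS, which is what you would expect if trtexec ran a single inference stream; that is an assumption, not something the numbers confirm.

```python
# Figures copied from the trtexec runs above (MiDaS v3 DPT-Hybrid, Jetson AGX Thor).
# Each entry is (throughput in FPS, mean latency in ms).
results = {
    "FP16": (173.0, 5.8),
    "INT8": (97.0, 10.3),
    "best": (124.62, 8.09),
}


def summarize(results, baseline="FP16"):
    """Compare every precision against the baseline run."""
    base_fps, base_lat = results[baseline]
    rows = {}
    for name, (fps, lat_ms) in results.items():
        rows[name] = {
            "fps_ratio": fps / base_fps,          # < 1.0 means lower throughput than baseline
            "latency_ratio": lat_ms / base_lat,   # > 1.0 means higher latency than baseline
            "fps_from_latency": 1000.0 / lat_ms,  # single-stream estimate from mean latency
        }
    return rows


for name, row in summarize(results).items():
    print(
        f"{name:5s} fps x{row['fps_ratio']:.2f}  "
        f"latency x{row['latency_ratio']:.2f}  "
        f"1000/latency = {row['fps_from_latency']:.1f}"
    )
```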

I am doing PTQ with trtexec. I do not have a calibration set, so trtexec calibrates with random inputs. Why am I seeing worse performance with INT8 than with FP16? And why does the engine generated with the --best flag not match the performance of FP16?

Thank you for your time in advance.

athkumar · December 30, 2025, 12:51pm

Hi @ramya.raghavendra, thanks for the detailed post.

This behavior is unexpected. I have escalated this to the internal teams handling Jetson and TensorRT for investigation. I will update you as soon as I have more information.

In the meantime, have you benchmarked this model on a discrete GPU to see if the INT8 regression persists there? This would help us determine if the issue is specific to the AGX Thor platform or related to the model’s quantization generally.

Thank you for your patience.

ramya.raghavendra · January 4, 2026, 9:28pm

Thank you for your response. I have not benchmarked the model on a discrete GPU.

If it helps, I see the same INT8 regression with the Depth-Anything-V2 model as well, and I also see a drop in performance with FP8. Again, these numbers were all generated with trtexec. Should I benchmark the performance differently?

DAV2_Small_518 FP16: 330.942 FPS, mean latency: 2.95 ms
DAV2_Small_518 FP8: 52.06 FPS, mean latency: 19.06 ms
DAV2_Small_518 INT8: 57.00 FPS, mean latency: 17.68 ms
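Expressed as slowdown factors relative to FP16, the gap here is much larger than for MiDaS: FP8 comes out at about 6.5x and INT8 at about 6.0x the FP16 mean latency. The sketch below is plain arithmetic on the three latencies posted above; the helper name `slowdown_vs` is just for illustration.

```python
# Depth-Anything-V2 Small (518 px) on Jetson AGX Thor, mean latency in ms (from the post above).
dav2_thor = {"FP16": 2.95, "FP8": 19.06, "INT8": 17.68}


def slowdown_vs(latencies_ms, baseline="FP16"):
    """Return how many times slower each precision is than the baseline."""
    base = latencies_ms[baseline]
    return {name: lat / base for name, lat in latencies_ms.items()}


for precision, factor in slowdown_vs(dav2_thor).items():
    print(f"{precision}: {factor:.1f}x the FP16 latency")
```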

I appreciate you escalating this issue to the internal teams. I would also appreciate some guidance on whether PTQ with trtexec and flags like --fp8 and --int8 is the correct way to benchmark models like MiDaS and Depth-Anything-V2. Thank you!

raghavendra.ramya · January 6, 2026, 2:58am

I was able to benchmark MiDaS and DAV2 on my laptop, which has an NVIDIA RTX A2000. I see the same INT8 regression when I benchmark the models with trtexec.

MiDaS: 64 FPS (FP16), 47 FPS (INT8)
DAV2_small_518: 127 FPS (FP16), 41.3 FPS (INT8)
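Putting the Thor and RTX A2000 throughput numbers from this thread into one table makes the pattern easier to see. In every case INT8 reaches less than FP16 throughput, so the regression is not specific to Thor. The sketch below is only arithmetic on the posted FPS figures; the dictionary layout and the `int8_speedup` helper are illustrative.

```python
# Throughput in FPS reported in this thread: (FP16, INT8) per platform and model.
throughput = {
    ("Jetson AGX Thor", "MiDaS DPT-Hybrid"): (173.0, 97.0),
    ("Jetson AGX Thor", "DAV2_small_518"): (330.942, 57.00),
    ("RTX A2000 laptop", "MiDaS DPT-Hybrid"): (64.0, 47.0),
    ("RTX A2000 laptop", "DAV2_small_518"): (127.0, 41.3),
}


def int8_speedup(fp16_fps, int8_fps):
    """INT8 throughput relative to FP16; values below 1.0 are regressions."""
    return int8_fps / fp16_fps


for (platform, model), (fp16, int8) in throughput.items():
    ratio = int8_speedup(fp16, int8)
    status = "regression" if ratio < 1.0 else "speedup"
    print(f"{platform:17s} {model:17s} INT8/FP16 = {ratio:.2f} ({status})")
```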

Another question: why is the performance with --best worse than FP16? If --best enables mixed precision, why doesn't the builder choose FP16 for all of the operations?

Thank you again for your help and guidance!
