Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) Jetson AGX Thor
• JetPack Version (valid for Jetson only) JP7.0
• TensorRT Version 10.13
I am trying to estimate how the MiDaS V3 DPT-Hybrid model would perform on Jetson AGX Thor. When I profile it with trtexec, I see the following:
- MidasV3 FP16: 173 FPS, Mean latency: 5.8 ms
- MidasV3 INT8: 97 FPS, Mean latency: 10.3 ms
- MidasV3 Best: 124.62 FPS, Mean latency: 8.09 ms (generated with the --best flag)
I am doing PTQ with trtexec. I do not have a calibration set, so trtexec calibrates with random inputs. Why is INT8 slower than FP16? And why does the engine generated with the --best flag not match the performance of FP16?
Thank you for your time in advance.
Hi @ramya.raghavendra, thanks for the detailed post.
This behavior is unexpected. I have escalated this to the internal teams handling Jetson and TensorRT for investigation. I will update you as soon as I have more information.
In the meantime, have you benchmarked this model on a discrete GPU to see if the INT8 regression persists there? This would help us determine if the issue is specific to the AGX Thor platform or related to the model’s quantization generally.
Thank you for your patience.
Thank you for your response. I have not benchmarked the model on a discrete GPU.
If it helps, I see the same INT8 regression with the Depth-Anything-V2 model as well. I also see a drop in performance with FP8. Again, these numbers were all generated using trtexec. Should I benchmark the performance differently?
- DAV2_Small_518 FP16: 330.942 FPS, Mean latency: 2.95 ms
- DAV2_Small_518 FP8: 52.06 FPS, Mean latency: 19.06 ms
- DAV2_Small_518 INT8: 57.00 FPS, Mean latency: 17.68 ms
I appreciate you escalating this issue to the internal teams. I would also appreciate guidance on whether PTQ with trtexec and flags like --fp8 and --int8 is the correct way to benchmark models like MiDaS and Depth-Anything-V2. Thank you!
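For anyone reproducing these runs, one way to see where the time goes is to build each precision mode with per-layer profiling enabled and compare the exported profiles. The following is only a minimal sketch: `model.onnx` and the `*_profile.json` file names are hypothetical, the script only prints the command lines rather than running them, and the flags are the ones listed by `trtexec --help` in recent TensorRT releases, so they should be checked against the installed version. The combined `--int8 --fp16` entry is an assumption worth testing, not a confirmed fix: layers without an INT8 implementation may otherwise fall back to FP32.

```python
import shlex

# Precision modes to compare. "int8+fp16" is an extra, assumed-useful
# combination; the others mirror the flags used in this thread.
MODES = {
    "fp16": ["--fp16"],
    "int8": ["--int8"],
    "fp8": ["--fp8"],
    "best": ["--best"],
    "int8+fp16": ["--int8", "--fp16"],
}


def trtexec_command(onnx_path, precision_flags, profile_json):
    """Return a trtexec argument list that benchmarks one precision mode
    and exports per-layer timings to a JSON file."""
    cmd = ["trtexec", f"--onnx={onnx_path}"]
    cmd += list(precision_flags)
    cmd += [
        "--profilingVerbosity=detailed",  # keep layer names in the profile
        "--dumpProfile",                  # print per-layer timings
        "--separateProfileRun",           # keep profiling out of the timed run
        f"--exportProfile={profile_json}",
    ]
    return cmd


def all_commands(onnx_path):
    """Build one command per precision mode (file names are hypothetical)."""
    return {
        name: trtexec_command(onnx_path, flags, f"{name}_profile.json")
        for name, flags in MODES.items()
    }


if __name__ == "__main__":
    # Print the commands so they can be reviewed and run by hand.
    for name, cmd in all_commands("model.onnx").items():
        print(f"# {name}")
        print(shlex.join(cmd))
```

Comparing the exported profiles between the FP16 and INT8 engines should show whether the extra time sits in a few layers or in added reformat and quantize/dequantize steps.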
I was able to benchmark MiDaS and DAV2 on my laptop, which has an NVIDIA RTX A2000. I see the same INT8 regression there when I use trtexec to benchmark the models.
Midas: 64 FPS (FP16), 47 FPS (INT8). DAV2_small_518: 127 FPS (FP16), 41.3 FPS (INT8)
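To put the two platforms side by side, the INT8 throughput can be expressed as a fraction of the FP16 throughput. This is a small sketch using only the FPS figures quoted in this thread; the dictionary layout and function name are illustrative, not part of any tool.

```python
# (FP16 FPS, INT8 FPS) as reported in this thread.
RESULTS = {
    ("Jetson AGX Thor", "MiDaS"): (173.0, 97.0),
    ("Jetson AGX Thor", "DAV2_Small_518"): (330.942, 57.00),
    ("RTX A2000", "MiDaS"): (64.0, 47.0),
    ("RTX A2000", "DAV2_Small_518"): (127.0, 41.3),
}


def int8_relative_throughput(fp16_fps, int8_fps):
    """Return INT8 throughput as a fraction of FP16 throughput.
    A value below 1.0 means INT8 is slower than FP16."""
    return int8_fps / fp16_fps


def summarize(results):
    """Map each (platform, model) pair to its INT8/FP16 throughput ratio."""
    return {
        key: int8_relative_throughput(fp16, int8)
        for key, (fp16, int8) in results.items()
    }


if __name__ == "__main__":
    for (platform, model), ratio in summarize(RESULTS).items():
        print(f"{platform:16s} {model:15s} INT8/FP16 = {ratio:.2f}")
```

A ratio below 1.0 on both devices for both models would suggest the regression is not specific to the AGX Thor platform.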
Another question: why is the performance with --best poorer than with FP16? If --best does mixed precision, why does the engine not choose FP16 for all the operations?
Thank you again for your help and guidance!