Description
Ways to improve Model Inference Speed
We aim to enhance the inference speed of our object detection model without compromising accuracy.
Current Setup:
Model Task: Object Detection (3 Classes)
Hardware: NVIDIA Jetson (likely AGX Orin)
Input: Single image/frame inference
Model Versions:
PyTorch Version:
Model Size: 88 MB
Inference Latency: ~420 ms per frame
TensorRT Version (Quantized):
Model Size: 90 MB (FP16)
Inference Latency: ~200 ms per frame
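Before optimizing further, it is worth checking whether the ~200 ms per frame includes preprocessing and NMS or only the engine execution itself. Below is a minimal timing sketch using the Ultralytics API shipped in the container; the engine filename, input resolution, and run counts are placeholders, not values from the setup above.

```python
import numpy as np
from ultralytics import YOLO

# Placeholder engine path; replace with the actual exported TensorRT engine.
model = YOLO("model_fp16.engine", task="detect")

# Dummy frame at the assumed input resolution (640x640 here).
frame = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)

# Warm-up so engine/CUDA initialization is excluded from the measurement.
for _ in range(10):
    model.predict(frame, imgsz=640, verbose=False)

# Average the per-stage timings Ultralytics reports for each call.
runs = 100
totals = {"preprocess": 0.0, "inference": 0.0, "postprocess": 0.0}
for _ in range(runs):
    result = model.predict(frame, imgsz=640, verbose=False)[0]
    for stage in totals:
        totals[stage] += result.speed[stage]

for stage, total in totals.items():
    print(f"{stage}: {total / runs:.1f} ms")
```

If most of the time turns out to be in preprocessing or postprocessing rather than in the engine, those stages are the better optimization target than the model itself.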
Objectives:
- Further reduce latency to support higher throughput (preferably below 100 ms per frame).
- Maintain, or only minimally impact, model accuracy with an INT8-quantized version (see the export sketch below).
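For the INT8 objective, one possible path is the Ultralytics TensorRT export with calibration data; this is only a sketch, and the checkpoint path and dataset YAML below are placeholders, not files from the setup above.

```python
from ultralytics import YOLO

# Placeholder checkpoint; replace with the trained 3-class detection model.
model = YOLO("best.pt")

# Export a TensorRT INT8 engine. Calibration images are taken from the
# dataset YAML passed via `data`, so calibrating on representative frames
# is what keeps the accuracy impact small.
model.export(
    format="engine",        # TensorRT engine
    int8=True,              # INT8 quantization with calibration
    data="my_dataset.yaml",  # placeholder dataset YAML used for calibration
    imgsz=640,              # assumed input size
    device=0,               # Jetson's integrated GPU
)
```

After exporting, the INT8 engine's mAP can be validated on the same split as the FP16 engine to confirm the accuracy impact stays acceptable.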
Environment
TensorRT Version: 10.3.0.30
GPU Type: Tegra
Nvidia Driver Version: 540.4.0
CUDA Version: 12.6.68
CUDNN Version: 9.3.0.75
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.10.12
Baremetal or Container (if container which image + tag): ultralytics/ultralytics:latest-jetson-jetpack6