ONNX TensorRT Engines FP16/32

Hello,

I was exporting some YOLO models to TensorRT and had a question about precision. I am first exporting to ONNX via Ultralytics and then building the TensorRT engine myself on a Jetson Nano 2 GB. This is the code I am using:

yolo export model=yolov8n-seg.pt format=onnx opset=12 imgsz=512

/usr/src/tensorrt/bin/trtexec \
    --onnx=yolov8n-seg.onnx \
    --saveEngine=yolov8n-seg.engine \
    --workspace=512 \
    --fp16

My understanding is that this exports the .pt weights (which don’t carry a fixed precision themselves) to an FP32 ONNX model, and trtexec then converts to FP16 while building the TRT engine. Is there a practical difference between this and exporting FP16 ONNX first and then building an FP16 engine from that?
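Concretely, the alternative path I am asking about would be something like the following (assuming Ultralytics' half=True export flag applies to ONNX; I believe it needs a GPU device):

```shell
# Alternative: export FP16 ONNX directly (half=True is Ultralytics' FP16 flag)
yolo export model=yolov8n-seg.pt format=onnx opset=12 imgsz=512 half=True device=0

# Then build the engine from the already-FP16 ONNX
/usr/src/tensorrt/bin/trtexec \
    --onnx=yolov8n-seg.onnx \
    --saveEngine=yolov8n-seg.engine \
    --workspace=512 \
    --fp16
```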

Thank you!

Hi,

It’s recommended to save the ONNX model in full precision (FP32) and quantize it when converting to TensorRT.
That way the precision reduction is handled by TensorRT’s builder at deployment time, which can pick the best per-layer kernels for the target hardware.
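As a side note on what the FP32-to-FP16 conversion does to individual values, here is a pure-Python sketch using struct’s binary16 ('e') format, which follows the same IEEE 754 half-precision layout as FP16 weights. This is only an illustration of the number format, not TensorRT’s actual conversion code:

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip a value through IEEE 754 half precision (binary16),
    # the same format used for FP16 weights and activations.
    return struct.unpack('<e', struct.pack('<e', x))[0]

# FP16 keeps roughly 3 significant decimal digits:
print(to_fp16(0.1))      # 0.0999755859375
# Values below ~6e-8 underflow to zero:
print(to_fp16(1e-8))     # 0.0
# The largest representable FP16 value is 65504:
print(to_fp16(65504.0))  # 65504.0
```

The limited dynamic range (underflow below ~6e-8, overflow above 65504) is one reason the builder keeps some layers in FP32 when it detects they would lose too much accuracy in FP16.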

Thanks.

Thank you! By quantization you are referring to FP32 ONNX to FP16 TensorRT, right (since INT8 is not supported on Nano 2 GB, if my understanding is correct)?

Is this still an issue you need support with? Are there any results you can share?

Hi,

Yes, you can find the detailed support matrix in the link below:

The CUDA compute capability of the Jetson Nano is 5.3, so only FP32 and FP16 are available (INT8 requires compute capability 6.1 or higher).

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.