Object Detection Inference Optimisation

We currently have a PyTorch model that we convert to an FP32 ONNX model and then to a TensorRT (TRT) engine to run on Jetsons.

I have heard that switching to FP16 can increase inference speed without losing much accuracy.

  1. Is FP16 recommended for improving object detection inference speed on Jetsons?
  2. Is the recommended way to convert the FP32 ONNX model to FP16 and then to TRT, or can we convert an FP32 ONNX model to an FP16 TRT engine directly?
  3. Is it recommended to train the model in FP16 in the first place to speed up the training process?

Hi @alex247, sure - personally, I always run models in FP16 on Jetson (with TensorRT) if an INT8 model/calibration isn’t available. INT8 will give even higher performance on Xavier/Orin than FP16, but INT8 requires a calibration table and typically Quantization-Aware Training (QAT) for best results - whereas with FP16 you can just run any typical FP32 model through TensorRT without extra steps and still get good performance/accuracy. The TAO pre-trained models come with INT8 ready to go, though.

For FP16 inference, you should just be able to export your normal FP32 model to ONNX without needing to do anything to it. TensorRT will handle the FP16 conversion internally, including the input/output tensors.
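For reference, a minimal sketch of that FP32 ONNX export from PyTorch might look something like this - `model`, the 640x640 input shape, the opset version, and the output filename are all placeholders you'd adapt to your own detector:

```python
# Minimal sketch: export a PyTorch detection model to FP32 ONNX.
# "model", the input shape, and "detector.onnx" are placeholders.
import torch

model.eval()
dummy_input = torch.randn(1, 3, 640, 640)  # adjust to your model's expected input size

torch.onnx.export(
    model,
    dummy_input,
    "detector.onnx",
    opset_version=13,           # pick an opset your TensorRT version supports
    input_names=["input"],
    output_names=["output"],
)
```

Nothing FP16-specific happens here - the precision choice is made later when building the TensorRT engine.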

That’s up to you I suppose, but it’s not required and I haven’t personally done it, although I believe you can use AMP (Automatic Mixed Precision) training if you want to speed up the process. I typically just train in normal FP32 mode and run in FP16 during inference time.
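If you do want to try AMP, a rough sketch of what a mixed-precision training step looks like in PyTorch is below - `model`, `dataloader`, `criterion`, and `optimizer` are placeholders for your own training setup:

```python
# Sketch of mixed-precision (AMP) training in PyTorch.
# Assumes model and batches are already on the GPU; all names are placeholders.
import torch

scaler = torch.cuda.amp.GradScaler()

for images, targets in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():       # forward pass runs in mixed precision
        outputs = model(images)
        loss = criterion(outputs, targets)
    scaler.scale(loss).backward()         # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```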

That's great, thank you!

Just to confirm: if we are already converting an FP32 ONNX model into TRT, does that mean we are running in FP16 by default? Or is this something that needs to be enabled somewhere? If it is enabled by default, how can we go back to FP32 to measure the difference?

By default, TensorRT will run your model with FP32 - you have to set a flag when building your TensorRT engine to enable FP16, like here: https://github.com/dusty-nv/jetson-inference/blob/8e5bf1e6a96b9faab9107649a7173a52a2dd3857/c/tensorNet.cpp#L926

Or if you are using trtexec, there’s the --fp16 flag, and the TensorRT Python API has an equivalent builder flag to the one linked above.
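For example, a rough sketch of building an FP16 engine from an ONNX file with the TensorRT Python API (TensorRT 8.x style; the file names are placeholders):

```python
# Sketch: build a TensorRT engine from an ONNX model with FP16 enabled.
# "detector.onnx" and "detector_fp16.engine" are placeholder filenames.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("detector.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)      # enable FP16; omit this line to build in FP32
engine_bytes = builder.build_serialized_network(network, config)

with open("detector_fp16.engine", "wb") as f:
    f.write(engine_bytes)
```

To measure the difference, build a second engine with the `config.set_flag(trt.BuilderFlag.FP16)` line removed (or run trtexec with and without --fp16) and time both engines on the same inputs.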

