Hi @alex247, sure - personally, I always run models in FP16 on Jetson (with TensorRT) if an INT8 model/calibration isn't available. INT8 will give even higher performance than FP16 on Xavier/Orin, but it requires a calibration table and typically Quantization-Aware Training (QAT) for best results, whereas with FP16 you can just run any typical FP32 model through TensorRT without extra steps and still get good performance/accuracy. The TAO pre-trained models come with INT8 ready to go, though.
For FP16 inference, you should just be able to export your normal FP32 model to ONNX without needing to do anything to it. TensorRT will handle the FP16 conversion internally, including the input/output tensors.
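In case it helps, here's a rough sketch of what that looks like with the TensorRT 8.x Python API (the file paths are placeholders, not from your project). The only FP16-specific part is one builder flag; the ONNX export itself stays FP32:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_fp16_engine(onnx_path="model.onnx", engine_path="model_fp16.engine"):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the ordinary FP32 ONNX export - no changes needed to the model itself
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    # This flag is what enables FP16 kernels; without it the engine stays FP32
    config.set_flag(trt.BuilderFlag.FP16)
    # (INT8 would additionally need trt.BuilderFlag.INT8 plus a calibrator here)

    serialized_engine = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)

build_fp16_engine()
```

If you're using trtexec instead, the equivalent is just adding --fp16 when building the engine.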
That's up to you I suppose, but it's not required and I haven't personally done it. I believe you can use AMP (Automatic Mixed Precision) training if you want to speed up the process, but I typically just train in normal FP32 and run in FP16 at inference time.
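If you do want to try AMP, it's only a few extra lines in a standard PyTorch training loop - here's a minimal sketch using PyTorch's built-in AMP utilities (the model, optimizer, and data here are just dummy placeholders):

```python
import torch

# Placeholders - substitute your own model, data loader, and loss
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
loader = [(torch.randn(32, 128).cuda(), torch.randint(0, 10, (32,)).cuda())
          for _ in range(10)]

scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid FP16 underflow

for inputs, targets in loader:
    optimizer.zero_grad()
    # autocast runs eligible ops in FP16 while keeping master weights in FP32
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Either way, the checkpoint you export to ONNX is still FP32, so the TensorRT side works the same.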
Just to confirm: if we are already converting an FP32 ONNX model into TensorRT, does that mean we are running in FP16 by default? Or is this something that needs to be enabled somewhere? If it is enabled by default, how can we go back to FP32 to measure the difference?