FP16 does not decrease inference time on Jetson Nano

anaR · August 9, 2022, 10:18am

I would like to speed up inference on a Jetson Nano (for a YOLO-type model). With FP32 the average inference time is 196 ms, and when I reduce the precision to FP16 the inference time barely goes down (it saves 4 ms, which could just be run-to-run variation).
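To put the two figures above in proportion, here is a minimal sketch of the arithmetic. The 196 ms and 4 ms values come from this post; the function name is only illustrative.

```python
def relative_saving(baseline_ms, new_ms):
    """Fraction of the baseline latency saved by the new configuration."""
    return (baseline_ms - new_ms) / baseline_ms


fp32_ms = 196.0          # average FP32 latency reported in the post
fp16_ms = fp32_ms - 4.0  # FP16 saves about 4 ms according to the post

saving = relative_saving(fp32_ms, fp16_ms)
print(f"saving: {saving:.1%}, speedup: {fp32_ms / fp16_ms:.3f}x")
# prints: saving: 2.0%, speedup: 1.021x
```

A saving this small is hard to tell apart from run-to-run variation unless the timings are averaged over many runs.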

I have converted the model from .onnx to .trt by doing:

/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --shapes=1x1536x1536x1 --fp16

I have also tried --int8, and it seems the Jetson Nano does not have native INT8 support. I also could not find its GPU listed under hardware and precision.
Is this because the hardware specs show

128 NVIDIA CUDA® cores, 0.5 TFLOPs (FP16)?

spolisetty · August 9, 2022, 4:24pm

Hi,

We are moving this post to the Jetson Nano forum to get better help.

Thank you.

SivaRamaKrishnaNV · August 10, 2022, 4:33am

Dear @anaR,
Could you please share the model for reproducing the issue?

anaR · August 10, 2022, 11:31am

rndmodel.h5 (131.2 KB)
rndmodel.onnx (69.6 KB)
This is the model but with random weights.

SivaRamaKrishnaNV · August 16, 2022, 2:59pm

Dear @anaR,
I could reproduce the issue. In performance mode with JetPack 4.6.2, I see a ~10% improvement, not just 4 ms (~2% in your case). I am investigating the issue for more insight.
Could you please dump the layer timings in both cases (if you are not using JetPack 4.6.2) to see which layer is causing the issue?
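One possible way to collect the layer timings requested above is trtexec's --dumpProfile and --exportProfile options. The sketch below only builds the two command lines and prints them. The trtexec path and model.onnx come from the first post; the profile_fp32.json and profile_fp16.json file names are placeholders, and the flags should be checked against trtexec --help for the installed TensorRT version.

```python
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"  # path used in the first post


def profile_command(onnx_path, profile_json, fp16=False):
    """Build a trtexec argument list that also records per-layer timings."""
    args = [
        TRTEXEC,
        f"--onnx={onnx_path}",
        "--dumpProfile",                    # print per-layer timings to the log
        f"--exportProfile={profile_json}",  # also write them to a JSON file
    ]
    if fp16:
        args.append("--fp16")               # allow FP16 kernels
    return args


# Placeholder output names; one run per precision so the profiles can be compared.
for label, use_fp16 in (("fp32", False), ("fp16", True)):
    print(" ".join(profile_command("model.onnx", f"profile_{label}.json", use_fp16)))
```

Running the two printed commands on the Nano should give one per-layer profile per precision, which can then be compared layer by layer.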

SivaRamaKrishnaNV · August 22, 2022, 6:52am

Dear @anaR,
I can see that a couple of input reformatting calls are included in FP16 mode, which increases the overall execution time.
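To estimate how much time such reformatting calls take, a per-layer profile exported by trtexec (--exportProfile) could be summarized as sketched below. This assumes the profile is a JSON list whose layer entries carry "name" and "averageMs" keys and that reformat layers have "reformat" somewhere in their name; both are assumptions to verify against your own files. The sample entries are invented for illustration and are not measurements from this model.

```python
def reformat_overhead(profile):
    """Return (reformat_ms, total_ms) summed over per-layer average timings.

    `profile` is a list of dicts as loaded with json.load(); entries without
    a "name" key (for example a leading run-count record) are skipped.
    """
    layers = [entry for entry in profile if "name" in entry]
    total_ms = sum(entry["averageMs"] for entry in layers)
    reformat_ms = sum(
        entry["averageMs"]
        for entry in layers
        if "reformat" in entry["name"].lower()  # assumed naming convention
    )
    return reformat_ms, total_ms


# Invented sample data, only to show the expected shape of the input.
sample_profile = [
    {"count": 10},
    {"name": "Reformatting CopyNode for Input Tensor 0 to conv1", "averageMs": 1.5},
    {"name": "conv1", "averageMs": 8.5},
]

reformat_ms, total_ms = reformat_overhead(sample_profile)
print(f"reformat layers: {reformat_ms} ms of {total_ms} ms")
```

Comparing this figure between the FP32 and FP16 profiles would show whether the extra reformat layers offset the gain from faster FP16 kernels.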

anaR · August 23, 2022, 7:53am

@SivaRamaKrishnaNV, thank you for your reply.
By reformatting, are you referring to the input layer? I did not add any casting function to the layers.
Also, if I understood other questions on the forum correctly, inference time on the Jetson Nano cannot benefit from casting to INT8, which means a speed-up can only come from casting to FP16.
