FP16 does not decrease inference time on Jetson Nano

I would like to speed up inference on a Jetson Nano (for a YOLO-type model). With FP32 the average inference time is 196 ms, and when I reduce the precision to FP16 it barely goes down (it saves about 4 ms, which could just be run-to-run variation).

I have converted the model from .onnx to .trt by doing:

/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --shapes=1x1536x1536x1 --fp16

I have also tried --int8, and it seems the Jetson Nano does not have native INT8 support. I also could not find its GPU in the "hardware and precision" support table.
Is this because the hardware specs show

128 NVIDIA CUDA® cores, 0.5 TFLOPs (FP16)?

Hi,

We are moving this post to the Jetson Nano forum to get better help.

Thank you.

Dear @anaR,
Could you please share the model for reproducing the issue?

rndmodel.h5 (131.2 KB)
rndmodel.onnx (69.6 KB)
This is the model but with random weights.

Dear @anaR,
I could reproduce the issue. In performance mode with JetPack 4.6.2, I see a ~10% improvement, not just 4 ms (~2% in your case). I am investigating the issue for more insight.
Could you please dump the layer timings in both cases (if you are not using JetPack 4.6.2) to see which layer is causing the issue?
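
One possible way to collect the per-layer timings for both precisions is sketched below. It is a rough sketch, not a definitive procedure: the helper name `profile_precision`, the `fp32`/`fp16` tags, and the output file names are hypothetical, and the default trtexec path is assumed to be the usual JetPack location. Any extra flags (for example a `--shapes` argument for your input) are passed through unchanged.

```shell
#!/bin/sh
# Sketch: build an engine at a given precision and dump per-layer timings.
# TRTEXEC can be overridden; the default path is an assumption.
TRTEXEC="${TRTEXEC:-/usr/src/tensorrt/bin/trtexec}"

# profile_precision <onnx file> <tag> [extra trtexec flags...]
# <tag> is a hypothetical label ("fp32", "fp16") used only in file names.
profile_precision() {
    onnx="$1"; tag="$2"; shift 2
    # --dumpProfile prints per-layer timings; --separateProfileRun keeps the
    # profiler out of the main timing run so the averages stay comparable.
    "$TRTEXEC" --onnx="$onnx" --saveEngine="model_${tag}.trt" \
        --dumpProfile --separateProfileRun "$@" \
        > "profile_${tag}.log" 2>&1
}

# Usage on the Nano (optionally after enabling performance mode with
# "sudo nvpmodel -m 0" and "sudo jetson_clocks"):
#   profile_precision model.onnx fp32
#   profile_precision model.onnx fp16 --fp16
```

Comparing profile_fp32.log and profile_fp16.log side by side should show which layers get faster in FP16 and which do not.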

Dear @anaR,
I can see that FP16 mode includes a couple of input reformatting calls, which increase the overall execution time.
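
To check this on your side, you could filter a saved per-layer profile for reformatting entries. This is only a sketch: the helper name `list_reformat_layers` is hypothetical, and it assumes the profile was saved to a text file and that TensorRT labels these inserted layers with the word "Reformatting" in their names, which may differ between versions.

```shell
#!/bin/sh
# Sketch: list the reformatting layers found in a saved trtexec profile log.
# Assumption: the layer names contain "reformat" (matched case-insensitively).

# list_reformat_layers <profile log>
list_reformat_layers() {
    grep -i 'reformat' "$1"
}

# Usage:
#   list_reformat_layers profile_fp16.log
#   list_reformat_layers profile_fp32.log
```

Entries that appear only in the FP16 log would correspond to the extra format conversions added around the network inputs or between layers.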

@SivaRamaKrishnaNV, thank you for your reply.
By reformatting, are you referring to the input layer? I did not add any casting function to the layers.
Also, I hope I understood correctly from other questions on the forum: inference time on the Jetson Nano cannot benefit from casting to INT8, which means a speed-up can only come from casting to FP16.