I would like to speed up inference on a Jetson Nano (for a YOLO-type model). With fp32 the average inference time is 196 ms, and when I decrease the precision to fp16 it barely goes down (it saves only 4 ms, which could just be run-to-run variance).
I have converted the model from .onnx to .trt by doing:
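(For reference, a typical trtexec conversion on JetPack looks something like the following; the model paths and the trtexec install location are placeholders, not my exact command:)

```shell
# Build an FP32 engine from the ONNX model (placeholder paths)
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model_fp32.trt

# Build an FP16 engine; --fp16 lets TensorRT choose half-precision kernels
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model_fp16.trt --fp16
```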
I have also tried --int8, but it seems the Jetson Nano does not have native int8 support. I also could not find the Nano's GPU listed in the hardware-and-precision support matrix.
Is this because of what the hardware specs show?
Dear @anaR,
I could reproduce the issue. In performance mode with JetPack 4.6.2, I see a ~10% improvement, not just 4 ms (~2% in your case). I am investigating the issue for more insights.
Could you please dump the layer timings in both cases (if you are not using JetPack 4.6.2) so we can see which layer is causing the issue?
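One way to get per-layer timings is to profile the prebuilt engines with trtexec (the engine paths below are placeholders):

```shell
# Load each prebuilt engine and print per-layer timing information;
# --separateProfileRun keeps profiling overhead out of the throughput numbers
trtexec --loadEngine=model_fp32.trt --dumpProfile --separateProfileRun
trtexec --loadEngine=model_fp16.trt --dumpProfile --separateProfileRun
```

Comparing the two per-layer profiles should show which layers fail to speed up in fp16 (for example, layers spending time on precision reformatting).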
@SivaRamaKrishnaNV, thank you for your reply.
By reformatting, are you referring to the input layer? I did not add any casting function to the layers.
Also, if I understood other threads on the forum correctly, inference time on the Jetson Nano cannot benefit from casting to int8, which means a speed-up can only come from casting to float16.