Jetson Nano 16-bit vs 32-bit inference performance

Hello All,
I created two serialized inference engines from a Caffe object detection CNN using the TensorRT ‘trtexec’ tool. I chose maxBatch=1 and fp16 for the first model, and maxBatch=1 with 32-bit precision for the second.
When testing the two engines, I see no difference in latency or FPS.
Can someone explain why there is no difference in performance between the 16-bit model and the 32-bit one?
Thanks in advance.

#Jetpack 4.6.2
#TensorRT 8.2.1
#CUDA 10.2
#Python 3

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.


Have you maximized the device’s performance?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Could you share the trtexec output log with us?
It should contain the qps (queries per second) information.
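
If it helps, here is a minimal Python sketch for pulling the qps figure out of two saved trtexec logs and comparing them. It assumes the log contains a line shaped like "Throughput: <number> qps" in the performance summary; please check that against your own log, since the exact wording can differ between TensorRT versions. The names parse_qps and compare_logs are made up for this example and are not part of TensorRT.

```python
import re

# Assumption: the trtexec performance summary has a line such as
# "[I] Throughput: <number> qps". Adjust the pattern if your log differs.
THROUGHPUT_RE = re.compile(r"Throughput:\s*([0-9]+(?:\.[0-9]+)?)\s*qps")


def parse_qps(log_text):
    """Return the last throughput value (qps) found in the log text, or None."""
    matches = THROUGHPUT_RE.findall(log_text)
    return float(matches[-1]) if matches else None


def compare_logs(fp32_log, fp16_log):
    """Return the FP16/FP32 throughput ratio, or None if either value is missing."""
    fp32 = parse_qps(fp32_log)
    fp16 = parse_qps(fp16_log)
    if fp32 is None or fp16 is None:
        return None
    return fp16 / fp32
```

Read each log file into a string and pass both to compare_logs. A ratio close to 1.0 would match what you describe (no speed-up from fp16), while a ratio clearly above 1.0 would mean the fp16 engine is faster in trtexec and the bottleneck is more likely elsewhere in your test pipeline.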