Hello All,
I created two serialized inference engines from a Caffe object-detection CNN using the TensorRT `trtexec` tool: the first with maxBatch=1 and FP16, the second with maxBatch=1 and FP32 (32-bit).
When benchmarking the two engines, I see no difference in latency (FPS).
Can someone explain why there is no performance difference between the FP16 model and the FP32 one?
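For reference, here is roughly how I built and benchmarked the two engines. File names (`model.prototxt`, `model.caffemodel`, output blob name) are placeholders for my actual model; the flags are standard `trtexec` options in TensorRT 8.2:

```shell
# FP16 engine (allows half-precision kernels where TensorRT chooses them)
trtexec --deploy=model.prototxt --model=model.caffemodel \
        --output=detection_out --maxBatch=1 --fp16 \
        --saveEngine=model_fp16.engine --dumpProfile

# FP32 engine (default precision, no --fp16 flag)
trtexec --deploy=model.prototxt --model=model.caffemodel \
        --output=detection_out --maxBatch=1 \
        --saveEngine=model_fp32.engine --dumpProfile
```

`--dumpProfile` prints per-layer timings, which should make it possible to compare where the time goes in each engine; I assume that is a reasonable way to check whether the FP16 build is actually running layers in half precision.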
Thanks in advance
#JetPack 4.6.2
#TensorRT 8.2.1
#CUDA 10.2
#Python 3