Speed of FP32 vs FP16

Hi! I trained YOLO models and then converted them to FP32 and FP16 TensorRT engines to use with DeepStream. It seems that there is no speed-up (at least on a Jetson Nano). Yes, the engine (and the model) is smaller, but the speed is the same (with res=320,320 and interval=0 it’s about 18 FPS). What’s the reason for that? Is it because NMS only supports FP32 and slows everything down?

How can I speed up inference, apart from using the interval parameter in the DeepStream config (so that predictions are computed only every Nth frame)?
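For reference, the interval setting mentioned above lives in the nvinfer section of a deepstream-app config. A minimal sketch (section and property names follow the standard deepstream-app config format; the values are examples, not a recommendation):

```
[primary-gie]
enable=1
# Run inference only on every 2nd frame; intermediate frames
# reuse the last detections (or a tracker, if one is configured).
interval=1
```

Raising interval trades detection latency on new objects for throughput, so it helps most when a tracker is enabled to bridge the skipped frames.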

How did you convert them to FP32 and FP16? With the command tlt-export or tlt-converter?

Also, I suggest generating the TensorRT engine and then using trtexec to test FPS.
Refer to Accelerating Peoplnet with tlt for jetson nano - #13 by Morganh

Well, first with tlt-export on my desktop, and then with tlt-converter on the Jetson Nano.

Please note that

  1. After tlt-export, whether you specify fp32 or fp16, the resulting etlt model is the same; the precision only takes effect when the engine is built. (Difference in data type specified during tlt export and tlt convert - #5 by Morganh)
  2. Please use trtexec to test FPS. Before testing, generate the TensorRT engine on the Nano with the tlt-converter tool.
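Putting those two steps together, it might look roughly like this on the Nano (a sketch, not verified on your setup: the key, the input dimensions, the output node name, and the file names are placeholders for your own model):

```shell
# 1) Build an FP16 TensorRT engine from the .etlt model on the Nano itself.
#    The .etlt file is precision-agnostic; -t selects the engine precision.
#    -d and -o must match your model's input dims and output node(s).
tlt-converter -k $YOUR_NGC_KEY \
              -d 3,320,320 \
              -t fp16 \
              -e yolo_fp16.engine \
              yolo.etlt

# 2) Benchmark the raw engine with trtexec (shipped with TensorRT,
#    typically under /usr/src/tensorrt/bin on Jetson). The reported
#    throughput is the pure inference FPS, without DeepStream overhead.
/usr/src/tensorrt/bin/trtexec --loadEngine=yolo_fp16.engine
```

Comparing the trtexec numbers for the FP32 and FP16 engines shows whether the lack of speed-up comes from the engine itself or from the rest of the DeepStream pipeline.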