Speed of FP32 vs FP16

Hi! I trained YOLO models and then converted them to FP32 and FP16 engines in order to use them with DeepStream. It seems that there is no speed-up (at least on a Jetson Nano). Yes, the engine (and the model) is smaller, but the speed is the same (with res=320,320 and interval=0 it’s about 18 FPS). What’s the reason for that? Is it because NMS only supports FP32 and slows everything down?

How can I speed up inference, apart from using the interval parameter in the DeepStream config (which controls whether predictions are computed on every frame)?
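For context, a minimal sketch of where that interval knob lives in a deepstream-app config file (group and property names as in the DeepStream config docs; the value here is illustrative):

```ini
[primary-gie]
enable=1
# interval=N skips inference on N frames between inferred frames;
# interval=0 (as above) runs inference on every frame
interval=2
config-file=config_infer_primary_yolo.txt
```

With interval=2, detections on the skipped frames are typically carried over by the tracker, trading some accuracy for throughput.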

How did you convert them to FP32 and FP16? With the command tlt-export or tlt-converter?

Also, I suggest generating the TensorRT engine and then using trtexec to test FPS.
Refer to Accelerating PeopleNet with TLT for Jetson Nano.

Well, first with tlt-export on my desktop, and then with tlt-converter on the Jetson Nano.
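For reference, the two steps above roughly look like this (model names, paths, and the $KEY value are placeholders; exact flags depend on your TLT version, so check `tlt-export --help` and `tlt-converter -h`):

```shell
# On the desktop: export the trained .tlt model to a portable .etlt
tlt-export yolo -m yolo_model.tlt -k $KEY -o yolo_model.etlt --data_type fp16

# On the Jetson Nano: build a TensorRT engine from the .etlt
# -d = input dims (C,H,W), -t = engine precision, -e = output engine path
tlt-converter yolo_model.etlt -k $KEY -d 3,320,320 -t fp16 -e yolo_fp16.engine
```

The engine must be built on the Nano itself, since TensorRT engines are specific to the GPU and TensorRT version they were generated on.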

Please note that:

  1. After tlt-export, the .etlt model is the same whether you choose FP32 or FP16. (The data type is only a parameter specified during tlt-export and tlt-converter; the precision difference takes effect in the generated engine.)
  2. Please use trtexec to test FPS. Before testing, generate the TensorRT engine on the Nano with the tlt-converter tool.
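A minimal sketch of the trtexec step, assuming the engine was already built with tlt-converter (the engine filename and install path are examples; on Jetson, trtexec usually ships under /usr/src/tensorrt/bin):

```shell
# Benchmark a prebuilt engine; trtexec reports latency and throughput (qps)
/usr/src/tensorrt/bin/trtexec --loadEngine=yolo_fp16.engine
```

Comparing the reported throughput for the FP32 and FP16 engines isolates pure inference speed from the rest of the DeepStream pipeline (decode, NMS, OSD), which helps answer whether FP16 is actually faster on your board.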