FP16 doesn't bring improvement to inference

I have successfully converted my ssd_resnet10 Caffe-trained model to a TensorRT engine, and I see a 5X speed-up in inference time on my TX2 compared to a Core i7 CPU, which is amazing! I just have a couple of questions about FP16 and FP32 engine creation.

  1. Is it guaranteed that inference time will improve if FP16 is used instead of FP32? In my case I cannot see any difference: inference time is 12 ms for both. I use the following to enable FP16 when creating the engine:

```python
builder.fp16_mode = True
builder.strict_type_constraints = True
```
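One thing worth double-checking is how the 12 ms figure is measured: without warm-up runs and averaging, FP16/FP32 differences can disappear into noise. A minimal timing-harness sketch (pure Python; `run_inference` is a hypothetical placeholder for the actual engine execution call):

```python
import time
import statistics

def benchmark(run_inference, warmup=10, iters=100):
    """Time a callable: discard warm-up runs, then average the rest."""
    for _ in range(warmup):
        run_inference()          # warm-up: fill caches, trigger lazy init
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - t0) * 1000.0)  # milliseconds
    return statistics.mean(samples), statistics.stdev(samples)

# Dummy workload standing in for engine execution:
mean_ms, std_ms = benchmark(lambda: sum(range(10000)))
print(f"mean: {mean_ms:.3f} ms, stdev: {std_ms:.3f} ms")
```

Comparing mean and standard deviation for the FP16 and FP32 engines makes it clearer whether "both 12 ms" is a real tie or just coarse measurement.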
  2. I have tried all 4 `nvpmodel -m` modes, each followed by `./jetson_clocks`, and mode 3 gave the lowest inference time, but I have read in several articles that mode 0 should give the best inference time. Is it logical that I get 27 FPS in mode 3 but only 19-22 FPS in all the other modes?
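For reference, the command sequence used for each mode is the standard Jetson one (mode numbering and names depend on the L4T release; `-q` to query is assumed available on the TX2):

```shell
# Query the currently active power mode
sudo nvpmodel -q

# Switch to mode 0 (MAXN on the TX2), then lock clocks to maximum
sudo nvpmodel -m 0
sudo ./jetson_clocks
```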

I'm using a TX2 and my TensorRT version is