FP16 not even two times faster than using FP32 in TensorRT

edit_or · June 12, 2019, 2:48am

I converted a TensorFlow model to TensorRT engines in both FP16 and FP32 modes.

I tested with 10 images, and FP16 is not even two times faster than FP32. I expected FP16 to be at least two times faster.

FP16
msec: 0.171075
msec: 0.134830
msec: 0.129984
msec: 0.128638
msec: 0.118196
msec: 0.123429
msec: 0.134329
msec: 0.119506
msec: 0.117615
msec: 0.127687

FP32
msec: 0.199235
msec: 0.180985
msec: 0.153394
msec: 0.148267
msec: 0.151481
msec: 0.169578
msec: 0.159987
msec: 0.173443
msec: 0.159301
msec: 0.155503
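
To put a number on the gap, here is a minimal sketch that averages the timings listed above and computes the FP32/FP16 ratio. The variable names are mine, and dropping the first sample as warm-up is only an assumption (the first run is the slowest in both lists), not something I have verified.

```python
from statistics import mean

# Per-image timings copied from the logs above (units as reported there).
fp16 = [0.171075, 0.134830, 0.129984, 0.128638, 0.118196,
        0.123429, 0.134329, 0.119506, 0.117615, 0.127687]
fp32 = [0.199235, 0.180985, 0.153394, 0.148267, 0.151481,
        0.169578, 0.159987, 0.173443, 0.159301, 0.155503]

mean_fp16 = mean(fp16)
mean_fp32 = mean(fp32)

# Speedup of FP16 over FP32: how many times longer FP32 takes on average.
speedup = mean_fp32 / mean_fp16

# Assumption: the first sample in each list includes warm-up overhead,
# so also compute the ratio with that sample dropped.
speedup_warm = mean(fp32[1:]) / mean(fp16[1:])

print(f"mean FP16: {mean_fp16:.6f}")
print(f"mean FP32: {mean_fp32:.6f}")
print(f"FP16 speedup over FP32 (all samples): {speedup:.2f}x")
print(f"FP16 speedup over FP32 (first sample dropped): {speedup_warm:.2f}x")
```

With these numbers the speedup comes out at roughly 1.26x, or about 1.28x without the first sample, so well below the 2x I expected.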

Is that result acceptable? If not, what else do I need to look into?

On page 15 of this document, http://www.serc.iisc.ac.in/serc_web/wp-content/uploads/2018/01/TENSORRT.pdf, there is a 5x difference in images/sec between FP32 and FP16.

My GPU is Titan RTX.