TensorRT inference using the TF-TRT framework: FP32 vs FP16

Hi,
I started exploring TensorRT a few months back. I optimized a custom-trained Keras classification model using the TF-TRT Python framework in both FP32 and FP16 precision modes. After conversion, the model summary shows the dtype as ‘float32’ for both. Am I missing something, or is it that not all layers can be optimized to FP16?
Please find the attachment for your reference.
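
For context, the conversion followed the standard TF-TRT flow, roughly along these lines (a rough sketch; the SavedModel paths are placeholders and the exact parameter API varies slightly across TensorFlow versions):

    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Same conversion for both runs; only precision_mode changes (FP32 vs FP16).
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=trt.TrtPrecisionMode.FP16)

    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir="keras_saved_model",   # placeholder path
        conversion_params=params)
    converter.convert()
    converter.save("trt_fp16_saved_model")           # placeholder path

The resulting SavedModel still reports float32 inputs/outputs in its summary, which is what prompted the question above.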



Also, as per this document: Accelerating Inference in TensorFlow with TensorRT User Guide - NVIDIA Docs, TensorFlow memory configurations can be used for TensorRT inference. Even after applying those configurations, there is no improvement. What could be the reason?
I also tried reducing the workspace size (max_workspace_size_bytes) during conversion, but there was no memory optimization.
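
For reference, these are the kinds of settings that were tried (a rough sketch; the memory-growth call is the standard TensorFlow 2 API and the workspace value is just an example):

    import tensorflow as tf
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # TensorFlow memory configuration: allocate GPU memory on demand
    # instead of reserving most of it up front.
    for gpu in tf.config.experimental.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)

    # Smaller TensorRT workspace during conversion (example value, ~256 MB).
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=trt.TrtPrecisionMode.FP16,
        max_workspace_size_bytes=1 << 28)

Even with these settings, GPU memory usage during inference looked the same.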

Hi,

The computation runs in FP16, but the inputs and outputs stay in FP32, since that makes it easier for users to feed and read the data.
A format-conversion layer is added by default to handle the cast.
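
If it helps, you can confirm this from the converted SavedModel: the signature still reports float32, while each fused TRTEngineOp node records the precision it was built with. A rough sketch (the path is a placeholder; the precision_mode attribute name is taken from the TRTEngineOp definition):

    import tensorflow as tf

    saved_model = tf.saved_model.load("trt_fp16_saved_model")   # placeholder path
    func = saved_model.signatures["serving_default"]

    # The signature dtypes stay float32 for easy feeding/reading.
    print(func.structured_input_signature)
    print(func.structured_outputs)

    # The fused engine nodes record the precision they were built with.
    graph_def = func.graph.as_graph_def()
    nodes = list(graph_def.node)
    for f in graph_def.library.function:
        nodes.extend(f.node_def)
    for node in nodes:
        if node.op == "TRTEngineOp":
            print(node.name, node.attr["precision_mode"].s)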

Thanks.

Okay, thanks for the quick reply.
I still do not see any performance improvement. I will run a few more trials and get back to you if required.

Hi,

It looks like the acceleration you get with FP32 and FP16 is identical.

We are not sure whether there is some dependency or overhead in TF-TRT.
Are you able to try pure TensorRT (for example, trtexec) so we can check further?
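
For example, one possible path is to export the Keras model to ONNX and benchmark it with trtexec in both precisions (a rough sketch; it assumes tf2onnx is installed and the paths are placeholders):

    # Export the Keras SavedModel to ONNX
    python -m tf2onnx.convert --saved-model keras_saved_model --output model.onnx

    # Benchmark with pure TensorRT: FP32 baseline, then FP16
    trtexec --onnx=model.onnx
    trtexec --onnx=model.onnx --fp16

Comparing the throughput/latency reported by the two trtexec runs would tell us whether the FP16 speedup shows up outside TF-TRT.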

Thanks.

Hi,
I have not explored trtexec yet, as I was getting proper results and a clear improvement with the FP32-optimized models. I will give trtexec a try.

Hi,

Please give it a try and let us know the results.
Thanks.
